I love working with data and statistical programming, and I am constantly learning new skills and tooling in my spare time. My work has always been at the intersection of deep sector knowledge and data science.
I have experience working with a variety of stakeholders and collaborators, and developing data products for diverse audiences. I work mostly in R due to its huge number of libraries and emphasis on reproducible analysis.
I am a member of the rOpenSci Community and I am a certified Tidyverse and Shiny RStudio Instructor. Send me an email if you need training for your team, if you need an R package or dashboard for your organization, or if you have questions about my R packages.
Economiccomplexity: Computational Methods for Economic Complexity
Economic complexity applies network theory concepts to social science questions related to international trade and income inequality. With the bulk of the literature established in the last decade, the field of economic complexity is relatively new. Its approach starts from representing international trade data as a bipartite network that connects countries to the products that they export.
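As a rough illustration of the bipartite representation, here is a minimal Python sketch. It assumes an edge is kept when a country's Balassa index (revealed comparative advantage) for a product is at least 1, a cutoff that is common in the literature but is an assumption here. The export values and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy export values keyed by (country, product); not real data.
exports = {
    ("A", "wine"): 10.0,
    ("A", "copper"): 90.0,
    ("B", "wine"): 60.0,
    ("B", "cars"): 40.0,
    ("C", "cars"): 100.0,
}

def balassa_index(exports):
    """Return {(country, product): RCA}, where RCA is the Balassa index."""
    country_total = defaultdict(float)
    product_total = defaultdict(float)
    for (country, product), value in exports.items():
        country_total[country] += value
        product_total[product] += value
    world_total = sum(exports.values())
    # Share of the product in the country's exports, relative to its world share.
    return {
        (c, p): (v / country_total[c]) / (product_total[p] / world_total)
        for (c, p), v in exports.items()
    }

def bipartite_edges(exports, threshold=1.0):
    """Keep a country-product edge when RCA >= threshold (an assumed cutoff)."""
    rca = balassa_index(exports)
    return sorted(edge for edge, value in rca.items() if value >= threshold)

edges = bipartite_edges(exports)

# Diversity: number of products each country is connected to in the network.
diversity = defaultdict(int)
for country, _ in edges:
    diversity[country] += 1
```

With this toy data each country ends up connected to exactly one product, so `edges` holds three country-product pairs.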
Gravity: Estimation Methods for Gravity Models in R
Gravity models are used to explain bilateral flows related to the sizes of bilateral partners, a measure of distance between them and other influences on interaction costs. The underlying idea is rather simple. The greater the masses of two bodies and the smaller the distance between them, the stronger their attraction. For a state-of-the-art exposition about cross-sectional data see Wölwer, Breßlein, & Burgard (2018).
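The Newtonian analogy can be sketched in a few lines of Python. This assumes the simplest functional form, with unit exponents on both masses and on distance and a hypothetical constant `g`. Applied work estimates these exponents from data, so the functions below are an illustration, not an estimation method.

```python
import math

def gravity_flow(mass_i, mass_j, distance, g=1.0):
    """Predicted flow under the simplest gravity form: g * M_i * M_j / D_ij."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return g * mass_i * mass_j / distance

def log_gravity(mass_i, mass_j, distance, g=1.0):
    """The same relationship in logs, which is linear in its terms."""
    return math.log(g) + math.log(mass_i) + math.log(mass_j) - math.log(distance)

# Hypothetical masses (e.g. GDP) and distance, in arbitrary units.
flow = gravity_flow(mass_i=200.0, mass_j=50.0, distance=1000.0)
```

Doubling the distance halves the predicted flow, while doubling either mass doubles it.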
Community-driven translation of R for Data Science. I translated chapters 12, 13, 19, 20 and 21, redrew all the diagrams in Inkscape with Spanish text, and managed the Ciencia de Datos GitHub organization.
This book will help you easily build beautiful plots in Python using the powerful plotnine package, which has been adapted from the popular ggplot2 package in R. If you’d like to create highly customised plots, including replicating the styles of XKCD and FiveThirtyEight, this is your book.
This book will help you master R plots the easy way. We have spent a long time creating R plots with different tools (base, lattice and ggplot2) during different academic and working positions. If you want to create highly customised plots in R, including replicating the styles of XKCD, The Economist or FiveThirtyEight, this is your book.
Where is my work being used?
News Visualization at the University of Miami, Prof. Alberto Cairo.
Meier, Max. Green Growth in the Technology Space–Regional Diversification Pathways in Europe. MS thesis. 2020.
Pérez-Hernández, Carla Carolina, et al. “Mapping the Green Product-Space in Mexico: From Capabilities to Green Opportunities.” Sustainability 13.2 (2021): 945.
Digital Ocean contributed images
RStudio: A pre-configured image with R 3.6, RStudio Server 1.2 and Shiny Server 1.5. All dependencies are already resolved and the Tidyverse and Shiny come configured, so in three clicks and no more than two minutes you’ll have your server running and ready to fit models and more.
RStudio + Stan: A pre-configured image with R 3.6, RStudio Server 1.2, rstan 2.28 and tidyverse 1.3. It was created to follow this tutorial by Dr. Andrew Heiss. The goal of this image is to ease Bayesian modeling with Stan. This cutting-edge research method is computationally intensive, and scalable droplets help to run it efficiently.
RStudio + H2O: A pre-configured image with R 3.6, RStudio Server 1.2 and H2O 3.28. It was created to follow this tutorial by Dr. Erin LeDell, but it can be used more generally to train models on scalable droplets.
RStudio + PkgDev: A pre-configured image with R 3.6, RStudio Server 1.2 and common tools to build and test R packages (devtools, testthat, usethis, roxygen2, covr and git). It was created to follow this tutorial by Dr. Hadley Wickham, but it can be used to test any R package and iterate back and forth with GitHub Actions or other continuous integration services.
Jitsi Server: Jitsi is an open source app for videoconferencing and chat. Droplets created from this image allow videoconferencing between Windows, Mac, Linux, Android and iOS users, who only need to open a new browser tab on laptops/desktops or install the Jitsi app from the Play Store/App Store on mobile.
I’ve decided to host different light/medium size databases using PostgreSQL, MySQL and SQL Server backends (in strict descending order of preference!). Why three database backends? I think there are a ton of small edge cases when moving between database backends, so testing extensively against live databases is quite valuable. With this resource you can benchmark speed, compression, and DDL types. This project has received funding from DigitalOcean Open Source Sponsorships.
An API configured to auto-update each week, which takes official government data and makes it tidy so that the information is easier to use. The combination of cron and R used in this project provides an API that serves datasets in CSV format, which required modifying the plumber package. This project belongs to the Facultad de Matemáticas de la Universidad Católica de Chile.
Open Trade Statistics is an independent project that values reproducible research and provides tidy trade data. It provides data for the period 1962-2018 covering all countries that report to the United Nations, and it was created with the intention of lowering the barrier to working with international trade data. It includes a public API, a dashboard, and an R package for data retrieval. This project has received funding from DigitalOcean Open Source Sponsorships.
Validation of Local and Remote Data Tables
With Rich Iannone. Validate data in data frames, tibble objects, Spark DataFrames, and database tables (e.g., PostgreSQL and MySQL). Validation pipelines can be made using easily-readable, consecutive validation steps. Upon execution of the validation plan, several reporting options are available. User-defined thresholds for failure rates allow for the determination of appropriate reporting actions. Many other workflows are available including an information management workflow, where the aim is to record, collect, and generate useful information on data tables.
A Test Environment for Database Requests
With Jonathan Keane. Testing and documenting code that communicates with remote databases can be painful. Although the interaction with R is usually relatively simple (e.g. data(frames) passed to and from a database), because they rely on a separate service and the data there, testing them can be difficult to set up, unsustainable in a continuous integration environment, or impossible without replicating an entire production cluster. This package addresses that by allowing you to make recordings from your database interactions and then play them back while testing (or in other contexts) all without needing to spin up or have access to the database your code would typically connect to.
Economic Outlook for Chile
Imports data from the mindicador.cl API in data frame or time series format. The goal is to make certain economic data easier to use with different R packages, with journalists and professionals in mind who need information displayed in a clear and concise way.
Open Trade Statistics API Wrapper and Utility Program
Access the Open Trade Statistics API from R to download international trade data. It also provides functions and data to adjust flows for inflation, and includes official Harmonized System codes and names, with simplified names and color palettes to ease grouping of products and data visualization.
Maps of the Political and Administrative Divisions of Chile
With Ricardo Aravena (advisor). Terrestrial maps with simplified topologies. These maps lack geodesic precision, so DFL-83 of 1979 of the Republic of Chile applies and they are considered referential, with no legal validity. Antarctic territories are excluded, and under no circumstances do these maps imply a cession or occupation of sovereign territories by Chile in violation of international law. This package was intentionally documented in ASCII-only Spanish (without accents or the letter ñ) so that it works without problems on different platforms.
Efficient Fitting of Linear and Generalized Linear Models
With Constanza Prado (advisor), Yoto Yotov (committee), and Alexey Kravchenko (committee). Efficient fitting of linear and generalized linear models using just base R. As an alternative to lm() and glm(), this package provides elm() and eglm(), with a significant speedup when the number of observations is larger than the number of parameters to estimate, as it reduces the NxP model matrix to a PxP matrix. The best computational performance is obtained when R is linked against OpenBLAS, Intel MKL or another optimized BLAS library. This implementation aims to be compatible with the broom and sandwich packages for summary statistics and clustering by providing S3 methods.
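One way to see the NxP to PxP reduction is through the cross-products X'X and X'y, which can be accumulated one observation at a time and then solved as a small linear system. The Python sketch below illustrates that general idea for ordinary least squares only. It is an assumption-laden illustration and not necessarily how elm() or eglm() are implemented, and the function names and data are hypothetical.

```python
def cross_products(rows, y):
    """Accumulate X'X (P x P) and X'y (length P) one observation at a time."""
    p = len(rows[0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, yi in zip(rows, y):
        for a in range(p):
            xty[a] += x[a] * yi
            for b in range(p):
                xtx[a][b] += x[a] * x[b]
    return xtx, xty

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; columns of X are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta

# Hypothetical data generated from y = 1 + 2x (intercept column first).
rows = [[1.0, float(x)] for x in range(6)]
y = [1.0 + 2.0 * r[1] for r in rows]

xtx, xty = cross_products(rows, y)  # only a 2 x 2 matrix and a length-2 vector remain
beta = solve(xtx, xty)              # approximately [1.0, 2.0]
```

The solve step works on a PxP system regardless of how many observations were accumulated. The trade-off is that solving the normal equations is numerically less stable than a QR decomposition when the columns of X are nearly collinear.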
Estimation Methods with Probabilistic Stratified Sampling in CASEN Survey
Functions to compute descriptive and inferential statistics with the complex design of the CASEN Survey (Socio-Economic Characterization Survey). Includes datasets to harmonize commune codes that change across years and allows converting them to official SUBDERE codes. This package is intentionally documented in ASCII-only Spanish (without accents or the letter ñ) so that it works without problems on different platforms.
2017 Chilean Census Easy Access Database
Provides convenient access to more than 17 million records from the Chilean Census 2017 database. The datasets were imported from the official DVD provided by the Chilean National Bureau of Statistics (INE) by using the REDATAM converter created by Pablo De Grande. This package is intentionally documented in ASCII-only Spanish so that it works without problems on different platforms.
Provides Spanish translations of the datasets from the following R packages: nycflights13, Lahman, forcats, fueleconomy, datasets, tidyr. These datasets were used to translate R4DS.
Automate Package and Project Setup
Automate package and project setup tasks that are otherwise performed manually. This includes setting up unit testing, test coverage, continuous integration, Git, ‘GitHub’, licenses, ‘Rcpp’, ‘RStudio’ projects, and more.
Addition of the AGPL-3 licensing option, changes made in #870 and #919. It makes sense for Shiny apps to have the option to use the AGPL-3 license, as it enforces that you contribute modifications back to the community. See more details on AGPL-3 vs GPL-3 licensing on Stack Overflow and the MongoDB Blog.
An API Generator for R
Gives the ability to automatically generate and serve an HTTP API from R functions using the annotations in the R documentation around your functions.
Addition of CSV serialization, changes made in #520. This change allows serving CSV and/or JSON data from the same REST service. It was made in order to ease data extraction by physicians and people working in public health who use the API COVID-19 Chile.
A High-Performance Database of Shipment-Level CITES Trade Data
Provides convenient access to over 40 years and 20 million records of endangered wildlife trade data from the Convention on International Trade in Endangered Species of Wild Fauna and Flora, stored in a local, on-disk, out-of-memory ‘DuckDB’ database for bulk analysis.
Evidence-based policymaking? A Reproducibility Approach Towards Civil Society Organizations’ Contributions, Toronto Data Workshop, Feb 2021. With Nicolas Didier. How should policymakers evaluate the evidence? Science reproducibility could enlighten the discussion about the quality of the evidence by providing a structured approach towards the source’s validity.
Dittodb: Simplificando las pruebas con bases de datos, Conecta R 2021 San José, Feb 2021. (In Spanish) A general description of the dittodb package and a quick review of continuous integration, GitHub Actions and tests.
Analysing Trade and Trade Policy with the Structural Gravity Model, ARTNeT, United Nations ESCAP, Dec 2020. I seconded Dr. Yoto V. Yotov’s keynote with a coding session that covered structural gravity code in R, including remoteness and fixed effects estimates. The code can be found in the unescap-gravity-2020 repository.
Open Trade Statistics, R Users Group Buenos Aires, Dec 2019. (In Spanish) A general description of the tradestatistics package, exercises with dplyr/dbplyr and examples with shiny and highcharter.
Open Trade Statistics, R Ladies Lima, Oct 2019. (In Spanish) A general description of the tradestatistics package, exercises with dplyr/dbplyr and examples with shiny and highcharter.
Open Trade Statistics, Latin R 2019 Santiago, Sep 2019. (In English) A general description of the tradestatistics package, exercises with dplyr/dbplyr and examples with shiny and highcharter.
Creating an API REST with R, Latin R 2019 Santiago, Sep 2019. (In Spanish) How to use plumber to fit regressions with large datasets. Also covers using DigitalOcean to run computations on a scalable virtual machine.
Creating an API REST with R, R Users Group Santiago, Apr 2019. (In Spanish) How to use plumber to fit regressions with large datasets. Also covers using DigitalOcean to run computations on a scalable virtual machine.
Econometrics in R, satRday Santiago, Dec 2018. (In English) Estimation of gravity models using the gravity package. It also covers a quick review of linear models.
Demand Estimation in R, R Users Group Santiago, Sep 2018. (In English) A quick review of time series and usage of the forecast package. It also covers Taylor series.