Open source
C++ software
- Open REDATAM: Open-source software for extracting raw information from REDATAM databases. It was created
to recover information from REDATAM databases for statistical analysis using standard tools such as
SPSS, Stata, and R.
Selected R packages
- TabulaPDF: R bindings to the
Tabula Java library for PDF table extraction.
- Redatam: Read REDATAM binary data directly
in R. This is similar to the Haven package for SPSS/Stata datasets but for REDATAM formats.
- Cpp11tesseract: R bindings to the
Tesseract C++ library for optical character recognition (OCR).
- Freedomhouse: A tidy version of Freedom House
datasets with added ISO country codes and texts with the sub-item justification.
- Capybara: Efficient Fixed-Effects Estimation in R With a C++11 Backend.
- Cpp11armadillo: Provides function declarations
and inline function definitions that facilitate communication between R and the Armadillo C++ library for
linear algebra and scientific computing.
- Cpp11eigen: Provides function declarations
and inline function definitions that facilitate communication between R and the Eigen C++ library for
linear algebra and scientific computing.
- Cpp11janitor: Provides function declarations and
inline function definitions that facilitate cleaning strings in C++ code before passing them to R.
- Cpp11tesseract: Bindings to Tesseract, a
powerful optical character recognition (OCR) engine that supports over 100 languages. The engine is
highly configurable in order to tune the detection algorithms and obtain the best possible results.
- Cpp11poppler: Bindings to Poppler, a tool
for extracting text, fonts, attachments and metadata from a PDF file. It also supports high-quality
rendering of PDF documents into PNG, JPEG, or TIFF format, or into raw bitmap vectors for further
processing.
- Cpp11qpdf: Bindings to Qpdf, an open-source
PDF library that performs content-preserving transformations of PDF files, such as
splitting, combining, and compressing them.
- DESTA (R version): The Design of Trade Agreements
Database.
- USITC Gravity: Database Adapted From the
International Trade and Production Database for Estimation (ITPD-E) and Dynamic Gravity Dataset (DGD).
- Gravity: Estimation methods for gravity models.
- Tradepolicy: Replication of an advanced guide to trade policy analysis.
- Tradestatistics: Open trade statistics API
wrapper and utility program.
- Pointblank: Data validation and organization of
metadata for local and remote tables.
- Redatam: Import REDATAM formats into R via the
Open REDATAM C++ library.
CRAN
I maintain the WebTechnologies view and the official mirror for Chile.
LaTeX
I created the R package varsityblues, which allows you to write assignments, presentations, and theses in
RStudio following UofT formatting. This saves the user from copy-pasting tables or plots from R output into
Word/LaTeX documents. Instead, the user works with notebooks that can be exported to PDF, and the package
configures the LaTeX setup automatically.
This package is based on the LaTeX style files that were sent to me by the late Professor Kim C. Border.
Cloud computing
I created images for RStudio Server and RStudio Server + Kubernetes, which allow you to create virtual
machines on DigitalOcean and skip the time needed to install the Tidyverse and other packages. With these
images, it takes around 30 seconds to have a ready-to-go setup for data science. This combines well with
analogsea, as I explained in my blog.
In simple terms, this allows you to rent a supercomputer, for example with 48 cores and 200 GB of RAM, for a
reasonable price and with minimal waiting.