Reading REDATAM databases in R

R
Statistics
Linear models
Read REDATAM databases directly into R using the REDATAM Converter R package.
Author

Mauricio “Pachá” Vargas S.

Published

October 3, 2024

REDATAM

REDATAM (Retrieval of Data for Small Areas by Microcomputer) is a data storage and retrieval system created by ECLAC (the United Nations Economic Commission for Latin America and the Caribbean), and it is widely used by national statistics offices to store and process census and survey data.

However, running statistical analyses such as Poisson or Negative Binomial regression on REDATAM databases can be tricky. Their format is unique and is opened with an official point-and-click tool that computes counts and averages, but it lacks the additional features of a program like SPSS (another point-and-click tool), which lets you test hypotheses and use a wide range of statistical functions.
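Once census data is available as an R data frame, models like these take a single call each. The sketch below is illustrative only: the `areas` table and its columns are made-up toy data, not REDATAM output, and it assumes counts per small area with the area population used as an exposure offset. It uses base R's `glm()` for the Poisson model and `MASS::glm.nb()` for the Negative Binomial model.

```r
# For glm.nb(); MASS ships with R as a recommended package
library(MASS)

# Made-up toy data: number of people with some characteristic in each
# small area, the area's population, and whether the area is urban.
# Column names are hypothetical and do not come from any census.
areas <- data.frame(
  count      = c(5, 60, 15, 20, 55, 2, 90, 30),
  population = c(400, 900, 250, 1200, 700, 300, 1500, 600),
  urban      = factor(c("no", "yes", "no", "yes", "yes", "no", "yes", "no"))
)

# Poisson regression of counts on urban status; log(population) enters
# as an offset so the coefficients describe rates rather than raw counts
fit_pois <- glm(count ~ urban + offset(log(population)),
                family = poisson(link = "log"), data = areas)

# Negative Binomial regression relaxes the Poisson assumption that the
# variance equals the mean, which helps when counts are overdispersed
fit_nb <- glm.nb(count ~ urban + offset(log(population)), data = areas)

summary(fit_pois)
summary(fit_nb)
```

Comparing the two fits (for example with `AIC(fit_pois, fit_nb)`) is a common way to check whether the extra dispersion parameter is worth it.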

REDATAM Converter

The REDATAM Converter is an open-source tool designed to extract raw information from REDATAM databases, which are used to store census microdata. Whether you’re a statistician, researcher, or data analyst, this tool allows you to convert these databases into CSV files compatible with R, Python, Google Sheets, Microsoft Excel, and other data analysis tools.
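If you go the CSV route, the exported files still have to be loaded into R. The helper below is a minimal sketch, assuming the converter writes one CSV file per entity into a single folder; that layout and the `read_csv_folder()` name are hypothetical, not documented behavior of the converter. It reads every `.csv` file in a folder into a named list of data frames using only base R.

```r
# Hypothetical helper: read every CSV file in a folder into a named list
# of data frames, one element per file. The one-file-per-entity layout
# is an assumption, not the converter's documented output.
read_csv_folder <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv, stringsAsFactors = FALSE)
  # Name each element after its file, without the .csv extension
  names(tables) <- tools::file_path_sans_ext(basename(files))
  tables
}

# Usage with a throwaway folder holding two tiny made-up tables
demo_dir <- file.path(tempdir(), "redatam_csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, area = c("A", "B", "C")),
          file.path(demo_dir, "area.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:2, age = c(34, 71)),
          file.path(demo_dir, "person.csv"), row.names = FALSE)

tables <- read_csv_folder(demo_dir)
names(tables)
```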

Initially written in C# by Pablo de Grande, the REDATAM Converter has been fully rewritten in C++ for improved portability and efficiency. Now, with the release of an R package, the REDATAM Converter allows seamless integration of REDATAM databases directly into your R workflows.

REDATAM Converter R Package

The latest development in the REDATAM Converter is the release of an R package that lets users read REDATAM data directly into R. This can be a game-changer for researchers and analysts working in R, as it removes the need to first export the data to CSV files with our converter, which requires using the command line. Instead, you can load and work with the data directly in R.

Key Features of the R Package:

  • Directly reads REDATAM databases into R using read_redatam().
  • Works with both .dic and .dicx formats.
  • Integrates seamlessly with other R packages such as dplyr, allowing for easy data manipulation and analysis.
  • Uses the REDATAM Converter written in C++, which is fast and memory-efficient.
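To illustrate the dplyr integration, here is a small sketch of a typical census workflow: attaching household attributes to people and summarising by region. The two tables are made-up stand-ins for entity tables; their names, columns, and key (`household_id`) are hypothetical and will differ in a real REDATAM database. The code uses only standard dplyr verbs.

```r
library(dplyr)

# Made-up stand-ins for two entity tables; in practice these would come
# from read_redatam(). Table and column names here are hypothetical.
household <- tibble(
  household_id = 1:4,
  region = c("North", "North", "South", "South")
)
person <- tibble(
  person_id = 1:7,
  household_id = c(1L, 1L, 2L, 3L, 3L, 3L, 4L),
  age = c(34, 8, 71, 45, 40, 12, 29)
)

# Attach each person's region, then count people and average age by region
person_summary <- person |>
  left_join(household, by = "household_id") |>
  group_by(region) |>
  summarise(people = n(), mean_age = mean(age))

person_summary
```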

To install the R package, you can run the following command in R:

install.packages("remotes") # only needed if remotes is not installed yet
remotes::install_github("pachadotdev/redatam-converter", subdir = "rpkg")

Once installed, using the package is as simple as pointing it to your REDATAM dictionary. For example, to load data from the 2017 Chilean Census, you can unzip the file and run:

redatam::read_redatam("CP2017CHL/BaseOrg16/CPV2017-16.dicx")
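A REDATAM database usually holds several entities (for example, geographic levels, dwellings, households, and people), so it helps to get an overview before analysing anything. The sketch below assumes `read_redatam()` returns a named list with one data frame per entity; please check the package documentation for the actual return value. `describe_entities()` is a hypothetical helper, demonstrated here on made-up tables rather than real census data.

```r
# Hypothetical helper: tabulate the name, number of rows, and number of
# columns of every data frame in a named list. Assumes read_redatam()
# returns one data frame per REDATAM entity in a named list.
describe_entities <- function(entities) {
  stopifnot(is.list(entities), !is.null(names(entities)))
  data.frame(
    entity = names(entities),
    rows = vapply(entities, NROW, integer(1)),
    columns = vapply(entities, NCOL, integer(1)),
    row.names = NULL
  )
}

# Usage with made-up tables; with real data you would pass the object
# returned by redatam::read_redatam("CP2017CHL/BaseOrg16/CPV2017-16.dicx")
toy <- list(
  region = data.frame(region_id = 1:2),
  person = data.frame(person_id = 1:5, age = c(3, 25, 40, 61, 80))
)
describe_entities(toy)
```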

For more detailed examples, check out the vignette included in the package documentation.