Chapter 2 Packages and data

2.1 Packages

Required packages for this workshop:

library(haven) # read dta format (Stata)
library(janitor) # tidy column names
library(dplyr) # data manipulation with chained operations
library(sandwich) # robust covariance matrix estimators
library(lmtest) # econometric tests
library(broom) # tidy regression results

2.2 Data

We can read Stata files directly with read_dta() and tidy the column names with clean_names():

gravity <- clean_names(read_dta("data/gravity-data.dta"))
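To see what this line does without the workshop file, here is a minimal sketch that writes a small data frame to a temporary .dta file and reads it back. The toy data, the capitalised column names and the temporary path are assumptions made only for illustration; they are not part of the gravity dataset.

```r
library(haven)
library(janitor)

# Hypothetical toy data with capitalised column names
raw <- data.frame(
  Exporter = c("ARG", "BRA"),
  Importer = c("BRA", "ARG"),
  Trade = c(10, 5)
)

# Write to a temporary Stata file, then read it back
tmp <- tempfile(fileext = ".dta")
write_dta(raw, tmp)

# clean_names() converts the column names to lower snake case
toy <- clean_names(read_dta(tmp))
names(toy)
```

After this step the columns can be referred to by lower-case names, which is what the rest of the chapter relies on (exporter, importer, trade, and so on).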

Now we need to prepare interval data, keeping only every fourth year from 1986 to 2006:

gravity2 <- gravity %>% 
  filter(year %in% seq(1986, 2006, 4))
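The filter above can be checked on a small example. The sketch below uses a hypothetical toy table (one row per year, with a made-up trade column) to show which years seq(1986, 2006, 4) generates and which rows survive the filter.

```r
library(dplyr)

# Years kept by the filter: start at 1986 and step by 4 up to 2006
years <- seq(1986, 2006, 4)
years

# Hypothetical toy data: one observation per year
toy <- tibble(year = 1986:2006, trade = 1:21)

# Keep only the rows whose year is in the interval sequence
toy_interval <- toy %>%
  filter(year %in% years)

toy_interval$year
```

Six of the 21 years remain: 1986, 1990, 1994, 1998, 2002 and 2006.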

We also need to create some variables that are used later: the logs of trade and distance, total output by exporter and year, and total expenditure by importer and year:

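gravity2 <- gravity2 %>% 
  # logs of bilateral trade and distance
  mutate(
    log_trade = log(trade),
    log_dist = log(dist)
  ) %>% 
  # output: total exports of each exporter in each year
  group_by(exporter, year) %>% 
  mutate(
    output = sum(trade),
    log_output = log(output)
  ) %>% 
  # expenditure: total imports of each importer in each year
  group_by(importer, year) %>% 
  mutate(
    expenditure = sum(trade),
    log_expenditure = log(expenditure)
  ) %>% 
  # drop the grouping so later steps work on the whole table
  ungroup()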
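The grouped sums can be verified by hand on a few rows. This is a minimal sketch on a hypothetical four-row table (the countries A, B, C and the trade values are made up); it applies the same group_by() and mutate() pattern as above, without the log columns.

```r
library(dplyr)

# Hypothetical toy data: four bilateral flows in a single year
toy <- tibble(
  exporter = c("A", "A", "B", "B"),
  importer = c("B", "C", "A", "C"),
  year = c(1986, 1986, 1986, 1986),
  trade = c(10, 30, 5, 15)
)

toy2 <- toy %>%
  # sum of trade over all destinations of each exporter-year
  group_by(exporter, year) %>%
  mutate(output = sum(trade)) %>%
  # by default a new group_by() replaces the previous grouping
  group_by(importer, year) %>%
  mutate(expenditure = sum(trade)) %>%
  ungroup()

toy2
```

Here exporter A gets output 10 + 30 = 40 on both of its rows and exporter B gets 5 + 15 = 20, while importer C gets expenditure 30 + 15 = 45. Because mutate() is used instead of summarise(), the table keeps one row per bilateral flow.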

Before concluding the data preparation, we need to create pair ID and symmetric pair ID variables. IMPORTANT: We don’t need to create pair_id and symm_id the way it is done in Stata. The process is much simpler here (but other tasks will be harder!).

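gravity2 <- gravity2 %>% 
  mutate(
    # directional pair: exporter_importer
    pair = paste(exporter, importer, sep = "_"),
    # order the two codes alphabetically so that both directions match
    first = ifelse(exporter < importer, exporter, importer),
    second = ifelse(exporter < importer, importer, exporter),
    # symmetric pair: same value for A -> B and B -> A
    symm = paste(first, second, sep = "_")
  )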
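The difference between the two identifiers is easiest to see on a few rows. The sketch below applies the same steps to a hypothetical three-row table (the country codes are chosen only for illustration).

```r
library(dplyr)

# Hypothetical toy data: two directions of one pair plus another pair
toy <- tibble(
  exporter = c("ARG", "BRA", "ARG"),
  importer = c("BRA", "ARG", "CHL")
)

toy_ids <- toy %>%
  mutate(
    pair = paste(exporter, importer, sep = "_"),
    first = ifelse(exporter < importer, exporter, importer),
    second = ifelse(exporter < importer, importer, exporter),
    symm = paste(first, second, sep = "_")
  )

toy_ids %>% select(exporter, importer, pair, symm)
```

The directional variable pair distinguishes ARG_BRA from BRA_ARG, while symm is ARG_BRA for both rows, since the two codes are sorted before being pasted together.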