Econometrics in REstimation of Gravity ModelsPachásatRday SantiagoDec 15, 20181 / 46

Before we begin2 / 46

Acknowledgements

Thanks to:

Anna-Lenna Woelver and Jan Pablo Burgard fron Trier University (gravity 1st version)
Joshua Kunst from PUC Chile (gravity re-writting using rlang)
Vilma Romero (made this cool template available)

3 / 46

Contents of the talkLifegoals (or how to use R well)
A bit of statistical/economic theory
Our new R package gravity
4 / 46

Where to reach me

Twitter and Github: pachamaltese

Email: m vargas at dcc dot uchile dot cl

Phone: +1 XXX XXX XX XX or +56 X X XXX XX XX

5 / 46

Lifegoals

No matter if you are an expert or beginner user:

Don't use setwd()
Writing setwd("~/github/satrday-talk/file.csv) goes against reproducibility
Use RStudio projects instead

6 / 46

Lifegoals

No matter if you are an expert or beginner user:

Don't use rm(list = ls())
Restart the session instead

7 / 46

Lifegoals

If Jenny Bryan discovers you doing all of the previous, she will come into your office and set your computer on fire

8 / 46

LifegoalsPlease use Git (i.e. Github, Gitlab, Bitbucket, etc)
Commit to save your progress
If you screw up your code you can always go back
9 / 46

Lifegoals

If you work with others, using packrat is desirable
But it is really important that you write well (descriptive names + stylesheets)

# Don't do this
x=read.csv("data_2_use_this.csv");x=x[,-1];a=x$x2
# Do this
latam <- read_csv("latam_region_trade_data.csv") %>% select(-country_name)
iso_codes <- select(latam, country_iso)

ifyoudontwritewellitshardtobeunderstood

10 / 46

Lifegoals

Finally, finish your sandwich or Solenya will come with a laser beam

11 / 46

Questions to the audience

Do you understand (a bit) linear algebra?

Have you ever fitted a regression (in R) before today?

12 / 46

Linear regression

$\renewcommand{\vec}[1]{\boldsymbol{#1}} \newcommand{\R}{\mathbb{R}}$

Let $\vec{y} \in \R^n$ be the outcome and $X \in \R^{n\times p}$ be the design matrix in the context of a general model with intercept: $\vec{y} = X\vec{\beta} + \vec{e}$

Being:

$\begin{equation*} \underset{n\times 1}{\vec{y}} = \begin{pmatrix}y_0 \cr y_1 \cr \vdots \cr y_n\end{pmatrix} \text{ and } \underset{n\times p}{X} = \begin{pmatrix}1 & x_{11} & & x_{1p} \cr 1 & x_{21} & & x_{2p} \cr & \ddots & \cr 1 & x_{n1} & & x_{np}\end{pmatrix} = (\vec{1} \: \vec{x}_1 \: \ldots \: \vec{x}_p) \end{equation*}$

13 / 46

Linear regression

In linear models the aim is to minimize the error term by chosing $\hat{\vec{\beta}}$ . One possibility is to minimize the squared error by solving this optimization problem:

$\begin{equation} \label{min} \displaystyle \min_{\vec{\beta}} S = \|\vec{y} - X\vec{\beta}\|^2 \end{equation}$

Books such as Baltagi (2011) discuss how to solve this, and different equivalent approaches result in this optimal estimator:

$\begin{equation} \label{beta} \hat{\vec{\beta}} = (X^tX)^{-1} X^t\vec{y} \end{equation}$

With one independent variable and intercept, this is $y_i = \beta_0 + \beta_1 x_{i1} + e_i$ , $\vec{\beta}$ can be written as:

$\begin{equation} \label{beta2} \hat{\beta}_1 = cor(\vec{y},\vec{x}) \cdot \frac{sd(\vec{y})}{sd(\vec{x})} \text{ and } \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{\vec{x}} \end{equation}$

14 / 46

Coding example with mtcars dataset

Consider the model: $mpg_i = \beta_1 wt_i + \beta_2 cyl_i + e_i$

This is how to write that model in R notation:

lm(mpg ~ wt + cyl, data = mtcars)

Or written in matrix form:

y <- mtcars$mpg
x0 <- rep(1, length(y))
x1 <- mtcars$wt
x2 <- mtcars$cyl
X <- cbind(x0,x1,x2)

15 / 46

Coding example with mtcars dataset

It's the same to use lm or to perform a matrix multiplication:

fit <- lm(y ~ x1 + x2)
coefficients(fit)

## (Intercept)          x1          x2 
##   39.686261   -3.190972   -1.507795

# versus
beta <- solve(t(X)%*%X) %*% (t(X)%*%y)
beta

##         [,1]
## x0 39.686261
## x1 -3.190972
## x2 -1.507795

16 / 46

Coding example with Galton dataset

Now let's check the correlation procedure:

if (!require(pacman)) install.packages("pacman")
pacman::p_load(HistData)
y <- Galton$child
x <- Galton$parent
beta1 <- cor(y, x) *  sd(y) / sd(x)
beta0 <- mean(y) - beta1 * mean(x)
c(beta0, beta1)

## [1] 23.9415302  0.6462906

17 / 46

Simple gravity model18 / 46

Simple gravity model

The main reference for this section is Woelwer, Burgard, Kunst, and Vargas (2018) and the references therein.

Gravity models in their traditional form are inspired by Newton law of gravitation:

$\begin{equation*} F_{ij}=G\frac{M_{i}M_{j}}{D^{2}_{ij}}. \end{equation*}$

The force $F$ between two bodies $i$ and $j$ with $i \neq j$ is proportional to the masses $M$ of these bodies and inversely proportional to the square of their geographical distance $D$ . $G$ is a constant and as such of no major concern.

19 / 46

Simple gravity model

The underlying idea of a traditional gravity model, shown for international trade, is equally simple:

$\begin{equation*} X_{ij}=G\frac{Y_{i}^{\beta_{1}}Y_{j}^{\beta_{2}}}{D_{ij}^{\beta_{3}}}. \end{equation*}$

The trade flow $X$ is explained by $Y_{i}$ and $Y_{j}$ that are the masses of the exporting and importing country (e.g. the GDP) and $D_{ij}$ that is the distance between the countries.

This is also used to study urban policies and migration flows!

20 / 46

Simple gravity model

Dummy variables such as common borders $contig$ or regional trade agreements $rta$ can be added to the model. Let $t_{ij}$ be the transaction cost defined as:

$\begin{equation*} t_{ij}= D_{ij} \exp(contig_{ij} + rta_{ij}) \end{equation*}$

So that the model with friction becomes:

$\begin{equation*} X_{ij}=G\frac{Y_{i}^{\beta_{1}}Y_{j}^{\beta_{2}}}{t_{ij}^{\beta_{3}}}. \end{equation*}$

A logarithmic operator can be applied to form a log-linear model and use a standard estimation methods such as OLS:

$\begin{equation*} \log X_{ij}=\beta_{0}\log G +\beta_{1}\log Y_{i}+\beta_{2}\log Y_{j}+\beta_{3}\log D_{ij}+\beta_{4}contig_{ij}+\beta_{5}rta_{ij} \end{equation*}$

21 / 46

Trade barriers model22 / 46

Trade barriers model

Basically the model proposes that the exports $X_{ij}$ from $i$ to $j$ are determined by the supply factors in $i$ , $Y_{i}$ , and the demand factors in $j$ , $Y_{j}$ , as well as the transaction costs $t_{ij}$ .
Next to information on bilateral partners $i$ and $j$ , information on the rest of the world is included in the gravity model with $Y=\sum_{i} Y_{i}= \sum_{j} Y_{j}$ that represents the worldwide sum of incomes (e.g. the world's GDP).

23 / 46

Trade barriers model

A key assumption is to take a fixed value $\sigma > 1$ in order to account for the preference for a variation of goods (e.g. in this model goods can be replaced for other similar goods).
The Multilateral Resistance terms are included via the terms $P$ , Inward Multilateral Resistance, and $\Pi$ , Outward Multilateral Resistance.
The Inward Multilateral Resistance $P_i$ is a function of the transaction costs of $i$ to all trade partners $j$ .
The Outward Multilateral Resistance $\Pi_{j}$ is a function of the transaction costs of $j$ to all trade partners $i$ and their demand.
The Multilateral Resistance terms dependent on each other. Hence, the estimation of structural gravity models becomes complex.

24 / 46

Trade barriers model

The econometric literature proposes the Multilateral Resistance model defined by the equations:

$\begin{equation*} X_{ij}=\frac{Y_{i}Y_{j}}{Y}\frac{t_{ij}^{1-\sigma}}{P_{j}^{1-\sigma}\Pi_{i}^{1-\sigma}} \end{equation*}$ with $\begin{equation*} P_{i}^{1-\sigma}=\sum_{j}\frac{t_{ij}^{1-\sigma}}{\Pi_{j}^{1-\sigma}}\frac{Y_{j}}{Y};\:\Pi_{j}^{1-\sigma}=\sum_{i}\frac{t_{ij}^{1-\sigma}}{P_{i}^{1-\sigma}}\frac{Y_{i}}{Y} \end{equation*}$

25 / 46

Model estimation26 / 46

Model estimation

To estimate gravity equations you need a square dataset including bilateral flows defined by the argument dependent_variable, a distance measure defined by the argument distance that is the key regressor, and other potential influences (e.g. contiguity and common currency) given as a vector in additional_regressors are required.
Some estimation methods require ISO codes or similar of type character variables to compute particular country effects. Make sure the origin and destination codes are of type "character".

27 / 46

Model estimation

The rule of thumb for regressors or independent variables consists in:
- All dummy variables should be of type numeric (0/1).
- If an independent variable is defined as a ratio, it should be logged.
The user should perform some data cleaning beforehand to remove observations that contain entries that can distort estimates, notwithstanding the functions provided within gravity package will remove zero flows and distances.

28 / 46

Examples29 / 46

Double Demeaning

Double Demeaning subtracts importer and exporter averages on the left and right hand side of the respective gravity equation, and all unilateral influences including the Multilateral Resistance terms vanish.
Therefore, no unilateral variables may be added as independent variables for the estimation.

30 / 46

Double Demeaning

Our ddm function first logs the dependent variable and the distance variable.

Afterwards, the dependent and independent variables are transformed in the following way (exemplary shown for trade flows, $X_{ij}$ ): $\begin{equation*} (\log X_{ij})_{\text{DDM}} = (\log X_{ij}) - (\log X_{ij})_{\text{Origin Mean}} - (\log X_{ij})_{\text{Destination Mean}} + (\log X_{ij})_{\text{Mean}}. \end{equation*}$

One subtracts the mean value for the origin country and the mean value for the destination country and adds the overall mean value to the logged trade flows.

This procedure is repeated for all dependent and independent variables. The transformed variables are then used for the estimation.

31 / 46

Double Demeaning

An example of how to apply the function ddm to an example dataset in gravity and the resulting output is shown in the following:

pacman::p_load(gravity)
fit <- ddm(
    dependent_variable = "flow",
    distance = "distw",
    additional_regressors = c("rta", "comcur", "contig"),
    code_origin = "iso_o",
    code_destination = "iso_d",
    data = gravity_no_zeros
  )

32 / 46

Double Demeaning

The package returns lm or glm objects instead of summaries. Doing that allows to use our functions in conjunction with broom or other packages, for example:

pacman::p_load(dplyr, broom)
tidy(fit)

## # A tibble: 4 x 5
##   term         estimate std.error statistic  p.value
##   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
## 1 dist_log_ddm   -1.60     0.0331    -48.4  0.      
## 2 rta_ddm         0.797    0.0700     11.4  6.54e-30
## 3 comcur_ddm      0.174    0.146       1.19 2.34e- 1
## 4 contig_ddm      1.00     0.120       8.36 6.62e-17

33 / 46

Double Demeaning

glance(fit) %>% select(matches("squared"))

## # A tibble: 1 x 2
##   r.squared adj.r.squared
##       <dbl>         <dbl>
## 1     0.254         0.254

34 / 46

Double Demeaning

How to do it without the function?

d <- gravity_no_zeros %>%
  mutate(
    dist_log = log(distw), # Transforming data, logging distances
    y_log = log(flow) # Transforming data, logging flows
  )

35 / 46

Double Demeaning

# Substracting the means
d <- d %>%
  mutate(
    y_log_ddm = y_log,
    dist_log_ddm = dist_log
  ) %>%
  group_by(iso_o, add = FALSE) %>%
  mutate(
    ym1 = mean(y_log_ddm, na.rm = TRUE),
    dm1 = mean(dist_log_ddm, na.rm = TRUE)
  ) %>% 
  group_by(iso_d, add = FALSE) %>%
  mutate(
    ym2 = mean(y_log_ddm, na.rm = TRUE),
    dm2 = mean(dist_log_ddm, na.rm = TRUE)
  )

36 / 46

Double Demeaning

# Substracting the means
d <- d %>%
  group_by(iso_o, add = FALSE) %>%
  mutate(
    y_log_ddm = y_log_ddm - ym1,
    dist_log_ddm = dist_log_ddm - dm1
  ) %>%
  group_by(iso_d, add = FALSE) %>%
  mutate(
    y_log_ddm = y_log_ddm - ym2,
    dist_log_ddm = dist_log_ddm - dm2
  )

37 / 46

Double Demeaning

# Substracting the means
d <- d %>%
  ungroup() %>%
  mutate(
    y_log_ddm = y_log_ddm + mean(y_log, na.rm = TRUE),
    dist_log_ddm = dist_log_ddm + mean(dist_log, na.rm = TRUE)
  )

38 / 46

Double Demeaning

# Substracting the means for the other independent variables
pacman::p_load(tidyr)
d2 <- d %>%
  select(iso_o, iso_d, rta, comcur, contig) %>%
  gather(key, value, -iso_o, -iso_d) %>%
  mutate(key = paste0(key, "_ddm")) %>%
  group_by(iso_o, key, add = FALSE) %>%
  mutate(ddm = value - mean(value, na.rm = TRUE)) %>%
  group_by(iso_d, key, add = FALSE) %>%
  mutate(ddm = ddm - mean(value, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(value = ddm + mean(value, na.rm = TRUE)) %>%
  select(iso_o, iso_d, key, value) %>%
  spread(key, value)

39 / 46

Double Demeaning

# Model
d <- left_join(d, d2, by = c("iso_o", "iso_d")) %>%
  select(y_log_ddm, ends_with("_ddm"))
vars <- paste(c("dist_log_ddm", paste0(c("rta", "comcur", "contig"), "_ddm"), 0), collapse = " + ")
form <- stats::as.formula(paste("y_log_ddm", "~", vars))
model_ddm <- stats::lm(form, data = d)
tidy(model_ddm)

## # A tibble: 4 x 5
##   term         estimate std.error statistic  p.value
##   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
## 1 dist_log_ddm   -1.60     0.0331    -48.4  0.      
## 2 rta_ddm         0.797    0.0700     11.4  6.54e-30
## 3 comcur_ddm      0.174    0.146       1.19 2.34e- 1
## 4 contig_ddm      1.00     0.120       8.36 6.62e-17

40 / 46

Double Demeaning

Inside ddm() the argument robust=TRUE is equivalent to replace lm by rlm in the last part:

model_ddm <- MASS::rlm(form, data = d)
tidy(model_ddm)

## # A tibble: 4 x 4
##   term         estimate std.error statistic
##   <chr>           <dbl>     <dbl>     <dbl>
## 1 dist_log_ddm   -1.60     0.0311    -51.7 
## 2 rta_ddm         0.656    0.0657      9.99
## 3 comcur_ddm      0.507    0.137       3.70
## 4 contig_ddm      1.12     0.112       9.93

41 / 46

Double Demeaning

fit2 <- ddm(
    dependent_variable = "flow",
    distance = "distw",
    additional_regressors = c("rta", "comcur", "contig"),
    code_origin = "iso_o",
    code_destination = "iso_d",
    data = gravity_no_zeros,
    robust = TRUE
  )
tidy(fit2)

## # A tibble: 4 x 4
##   term         estimate std.error statistic
##   <chr>           <dbl>     <dbl>     <dbl>
## 1 dist_log_ddm   -1.60     0.0311    -51.7 
## 2 rta_ddm         0.656    0.0657      9.99
## 3 comcur_ddm      0.507    0.137       3.70
## 4 contig_ddm      1.12     0.112       9.93

42 / 46

Code and documentation

github.com/pachamaltese/gravity

pacha.hk/gravity

43 / 46

Questions?44 / 46

References

Baltagi, B. H. (2011). Econometrics. Springer Texts in Business and Economics 978-3-642-20059-5. Springer. DOI: 10.1007/978-1-4614-3169-5.

Woelwer, A. L, J. P. Burgard, J. Kunst, et al. (2018). "Gravity: Estimation Methods for Gravity Models in R". In: Journal of Open Source Software 31.3, p. 1038. DOI: 10.21105/joss.01038.

45 / 46

This work is licensed as
Creative Commons Attribution-NonCommercial 4.0 International

To view a copy of this license visit https://creativecommons.org/licenses/by-nc/4.0/

↑, ←, Pg Up, k	Go to previous slide
↓, →, Pg Dn, Space, j	Go to next slide
Home	Go to first slide
End	Go to last slide
Number + Return	Go to specific slide
b / m / f	Toggle blackout / mirrored / fullscreen mode
c	Clone slideshow
p	Toggle presenter mode
t	Restart the presentation timer
?, h	Toggle this help

Econometrics in R

Estimation of Gravity Models

Pachá

satRday Santiago

Dec 15, 2018

Before we begin

Acknowledgements

Contents of the talk

Where to reach me

Lifegoals

Lifegoals

Lifegoals

Lifegoals

Lifegoals

Lifegoals

Questions to the audience

Linear regression

Linear regression

Coding example with mtcars dataset

Coding example with mtcars dataset

Coding example with Galton dataset

Simple gravity model

Simple gravity model

Simple gravity model

Simple gravity model

Trade barriers model

Trade barriers model

Trade barriers model

Trade barriers model

Model estimation

Model estimation

Model estimation

Examples

Double Demeaning

Double Demeaning

Double Demeaning

Double Demeaning

Double Demeaning

Double Demeaning

Double Demeaning

Double Demeaning

Double Demeaning

Double Demeaning

Double Demeaning

Double Demeaning

Double Demeaning

Code and documentation

Questions?

References

This work is licensed as Creative Commons Attribution-NonCommercial 4.0 International To view a copy of this license visit https://creativecommons.org/licenses/by-nc/4.0/

Before we begin

Help

This work is licensed as
Creative Commons Attribution-NonCommercial 4.0 International

To view a copy of this license visit https://creativecommons.org/licenses/by-nc/4.0/