Processing math: 0%
+ - 0:00:00
Notes for current slide
Notes for next slide

Econometrics in R

Estimation of Gravity Models

Pachá

satRday Santiago

Dec 15, 2018

1 / 46

Before we begin

2 / 46

Acknowledgements

Thanks to:

  • Anna-Lenna Woelver and Jan Pablo Burgard fron Trier University (gravity 1st version)
  • Joshua Kunst from PUC Chile (gravity re-writting using rlang)
  • Vilma Romero (made this cool template available)
3 / 46

Contents of the talk

  • Lifegoals (or how to use R well)
  • A bit of statistical/economic theory
  • Our new R package gravity
4 / 46

Where to reach me

Twitter and Github: pachamaltese

Email: m vargas at dcc dot uchile dot cl

Phone: +1 XXX XXX XX XX or +56 X X XXX XX XX

5 / 46

Lifegoals

No matter if you are an expert or beginner user:

  • Don't use setwd()
  • Writing setwd("~/github/satrday-talk/file.csv) goes against reproducibility
  • Use RStudio projects instead
6 / 46

Lifegoals

No matter if you are an expert or beginner user:

  • Don't use rm(list = ls())
  • Restart the session instead
7 / 46

Lifegoals

If Jenny Bryan discovers you doing all of the previous, she will come into your office and set your computer on fire

8 / 46

Lifegoals

  • Please use Git (i.e. Github, Gitlab, Bitbucket, etc)
  • Commit to save your progress
  • If you screw up your code you can always go back
9 / 46

Lifegoals

  • If you work with others, using packrat is desirable
  • But it is really important that you write well (descriptive names + stylesheets)
# Don't do this
x=read.csv("data_2_use_this.csv");x=x[,-1];a=x$x2
# Do this
latam <- read_csv("latam_region_trade_data.csv") %>% select(-country_name)
iso_codes <- select(latam, country_iso)

ifyoudontwritewellitshardtobeunderstood

10 / 46

Lifegoals

Finally, finish your sandwich or Solenya will come with a laser beam

11 / 46

Questions to the audience

Do you understand (a bit) linear algebra?

Have you ever fitted a regression (in R) before today?

12 / 46

Linear regression

\renewcommand{\vec}[1]{\boldsymbol{#1}} \newcommand{\R}{\mathbb{R}}

Let \vec{y} \in \R^n be the outcome and X \in \R^{n\times p} be the design matrix in the context of a general model with intercept: \vec{y} = X\vec{\beta} + \vec{e}

Being:

\begin{equation*} \underset{n\times 1}{\vec{y}} = \begin{pmatrix}y_0 \cr y_1 \cr \vdots \cr y_n\end{pmatrix} \text{ and } \underset{n\times p}{X} = \begin{pmatrix}1 & x_{11} & & x_{1p} \cr 1 & x_{21} & & x_{2p} \cr & \ddots & \cr 1 & x_{n1} & & x_{np}\end{pmatrix} = (\vec{1} \: \vec{x}_1 \: \ldots \: \vec{x}_p) \end{equation*}

13 / 46

Linear regression

In linear models the aim is to minimize the error term by chosing \hat{\vec{\beta}}. One possibility is to minimize the squared error by solving this optimization problem:

\begin{equation} \label{min} \displaystyle \min_{\vec{\beta}} S = \|\vec{y} - X\vec{\beta}\|^2 \end{equation}

Books such as Baltagi (2011) discuss how to solve this, and different equivalent approaches result in this optimal estimator:

\begin{equation} \label{beta} \hat{\vec{\beta}} = (X^tX)^{-1} X^t\vec{y} \end{equation}

With one independent variable and intercept, this is y_i = \beta_0 + \beta_1 x_{i1} + e_i, \vec{\beta} can be written as:

\begin{equation} \label{beta2} \hat{\beta}_1 = cor(\vec{y},\vec{x}) \cdot \frac{sd(\vec{y})}{sd(\vec{x})} \text{ and } \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{\vec{x}} \end{equation}

14 / 46

Coding example with mtcars dataset

Consider the model: mpg_i = \beta_1 wt_i + \beta_2 cyl_i + e_i

This is how to write that model in R notation:

lm(mpg ~ wt + cyl, data = mtcars)

Or written in matrix form:

y <- mtcars$mpg
x0 <- rep(1, length(y))
x1 <- mtcars$wt
x2 <- mtcars$cyl
X <- cbind(x0,x1,x2)
15 / 46

Coding example with mtcars dataset

It's the same to use lm or to perform a matrix multiplication:

fit <- lm(y ~ x1 + x2)
coefficients(fit)
## (Intercept) x1 x2
## 39.686261 -3.190972 -1.507795
# versus
beta <- solve(t(X)%*%X) %*% (t(X)%*%y)
beta
## [,1]
## x0 39.686261
## x1 -3.190972
## x2 -1.507795
16 / 46

Coding example with Galton dataset

Now let's check the correlation procedure:

if (!require(pacman)) install.packages("pacman")
pacman::p_load(HistData)
y <- Galton$child
x <- Galton$parent
beta1 <- cor(y, x) * sd(y) / sd(x)
beta0 <- mean(y) - beta1 * mean(x)
c(beta0, beta1)
## [1] 23.9415302 0.6462906

17 / 46

Simple gravity model

18 / 46

Simple gravity model

The main reference for this section is Woelwer, Burgard, Kunst, and Vargas (2018) and the references therein.

Gravity models in their traditional form are inspired by Newton law of gravitation:

\begin{equation*} F_{ij}=G\frac{M_{i}M_{j}}{D^{2}_{ij}}. \end{equation*}

The force F between two bodies i and j with i \neq j is proportional to the masses M of these bodies and inversely proportional to the square of their geographical distance D. G is a constant and as such of no major concern.

19 / 46

Simple gravity model

The underlying idea of a traditional gravity model, shown for international trade, is equally simple:

\begin{equation*} X_{ij}=G\frac{Y_{i}^{\beta_{1}}Y_{j}^{\beta_{2}}}{D_{ij}^{\beta_{3}}}. \end{equation*}

The trade flow X is explained by Y_{i} and Y_{j} that are the masses of the exporting and importing country (e.g. the GDP) and D_{ij} that is the distance between the countries.

This is also used to study urban policies and migration flows!

20 / 46

Simple gravity model

Dummy variables such as common borders contig or regional trade agreements rta can be added to the model. Let t_{ij} be the transaction cost defined as:

\begin{equation*} t_{ij}= D_{ij} \exp(contig_{ij} + rta_{ij}) \end{equation*}

So that the model with friction becomes:

\begin{equation*} X_{ij}=G\frac{Y_{i}^{\beta_{1}}Y_{j}^{\beta_{2}}}{t_{ij}^{\beta_{3}}}. \end{equation*}

A logarithmic operator can be applied to form a log-linear model and use a standard estimation methods such as OLS:

\begin{equation*} \log X_{ij}=\beta_{0}\log G +\beta_{1}\log Y_{i}+\beta_{2}\log Y_{j}+\beta_{3}\log D_{ij}+\beta_{4}contig_{ij}+\beta_{5}rta_{ij} \end{equation*}

21 / 46

Trade barriers model

22 / 46

Trade barriers model

  • Basically the model proposes that the exports X_{ij} from i to j are determined by the supply factors in i, Y_{i}, and the demand factors in j, Y_{j}, as well as the transaction costs t_{ij}.

  • Next to information on bilateral partners i and j, information on the rest of the world is included in the gravity model with Y=\sum_{i} Y_{i}= \sum_{j} Y_{j} that represents the worldwide sum of incomes (e.g. the world's GDP).

23 / 46

Trade barriers model

  • A key assumption is to take a fixed value \sigma > 1 in order to account for the preference for a variation of goods (e.g. in this model goods can be replaced for other similar goods).

  • The Multilateral Resistance terms are included via the terms P, Inward Multilateral Resistance, and \Pi, Outward Multilateral Resistance.

  • The Inward Multilateral Resistance P_i is a function of the transaction costs of i to all trade partners j.

  • The Outward Multilateral Resistance \Pi_{j} is a function of the transaction costs of j to all trade partners i and their demand.

  • The Multilateral Resistance terms dependent on each other. Hence, the estimation of structural gravity models becomes complex.

24 / 46

Trade barriers model

The econometric literature proposes the Multilateral Resistance model defined by the equations:

\begin{equation*} X_{ij}=\frac{Y_{i}Y_{j}}{Y}\frac{t_{ij}^{1-\sigma}}{P_{j}^{1-\sigma}\Pi_{i}^{1-\sigma}} \end{equation*} with \begin{equation*} P_{i}^{1-\sigma}=\sum_{j}\frac{t_{ij}^{1-\sigma}}{\Pi_{j}^{1-\sigma}}\frac{Y_{j}}{Y};\:\Pi_{j}^{1-\sigma}=\sum_{i}\frac{t_{ij}^{1-\sigma}}{P_{i}^{1-\sigma}}\frac{Y_{i}}{Y} \end{equation*}

25 / 46

Model estimation

26 / 46

Model estimation

  • To estimate gravity equations you need a square dataset including bilateral flows defined by the argument dependent_variable, a distance measure defined by the argument distance that is the key regressor, and other potential influences (e.g. contiguity and common currency) given as a vector in additional_regressors are required.

  • Some estimation methods require ISO codes or similar of type character variables to compute particular country effects. Make sure the origin and destination codes are of type "character".

27 / 46

Model estimation

  • The rule of thumb for regressors or independent variables consists in:

    • All dummy variables should be of type numeric (0/1).
    • If an independent variable is defined as a ratio, it should be logged.
  • The user should perform some data cleaning beforehand to remove observations that contain entries that can distort estimates, notwithstanding the functions provided within gravity package will remove zero flows and distances.

28 / 46

Examples

29 / 46

Double Demeaning

  • Double Demeaning subtracts importer and exporter averages on the left and right hand side of the respective gravity equation, and all unilateral influences including the Multilateral Resistance terms vanish.

  • Therefore, no unilateral variables may be added as independent variables for the estimation.

30 / 46

Double Demeaning

Our ddm function first logs the dependent variable and the distance variable.

Afterwards, the dependent and independent variables are transformed in the following way (exemplary shown for trade flows, X_{ij}): \begin{equation*} (\log X_{ij})_{\text{DDM}} = (\log X_{ij}) - (\log X_{ij})_{\text{Origin Mean}} - (\log X_{ij})_{\text{Destination Mean}} + (\log X_{ij})_{\text{Mean}}. \end{equation*}

One subtracts the mean value for the origin country and the mean value for the destination country and adds the overall mean value to the logged trade flows.

This procedure is repeated for all dependent and independent variables. The transformed variables are then used for the estimation.

31 / 46

Double Demeaning

An example of how to apply the function ddm to an example dataset in gravity and the resulting output is shown in the following:

pacman::p_load(gravity)
fit <- ddm(
dependent_variable = "flow",
distance = "distw",
additional_regressors = c("rta", "comcur", "contig"),
code_origin = "iso_o",
code_destination = "iso_d",
data = gravity_no_zeros
)
32 / 46

Double Demeaning

The package returns lm or glm objects instead of summaries. Doing that allows to use our functions in conjunction with broom or other packages, for example:

pacman::p_load(dplyr, broom)
tidy(fit)
## # A tibble: 4 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 dist_log_ddm -1.60 0.0331 -48.4 0.
## 2 rta_ddm 0.797 0.0700 11.4 6.54e-30
## 3 comcur_ddm 0.174 0.146 1.19 2.34e- 1
## 4 contig_ddm 1.00 0.120 8.36 6.62e-17
33 / 46

Double Demeaning

glance(fit) %>% select(matches("squared"))
## # A tibble: 1 x 2
## r.squared adj.r.squared
## <dbl> <dbl>
## 1 0.254 0.254
34 / 46

Double Demeaning

How to do it without the function?

d <- gravity_no_zeros %>%
mutate(
dist_log = log(distw), # Transforming data, logging distances
y_log = log(flow) # Transforming data, logging flows
)
35 / 46

Double Demeaning

# Substracting the means
d <- d %>%
mutate(
y_log_ddm = y_log,
dist_log_ddm = dist_log
) %>%
group_by(iso_o, add = FALSE) %>%
mutate(
ym1 = mean(y_log_ddm, na.rm = TRUE),
dm1 = mean(dist_log_ddm, na.rm = TRUE)
) %>%
group_by(iso_d, add = FALSE) %>%
mutate(
ym2 = mean(y_log_ddm, na.rm = TRUE),
dm2 = mean(dist_log_ddm, na.rm = TRUE)
)
36 / 46

Double Demeaning

# Substracting the means
d <- d %>%
group_by(iso_o, add = FALSE) %>%
mutate(
y_log_ddm = y_log_ddm - ym1,
dist_log_ddm = dist_log_ddm - dm1
) %>%
group_by(iso_d, add = FALSE) %>%
mutate(
y_log_ddm = y_log_ddm - ym2,
dist_log_ddm = dist_log_ddm - dm2
)
37 / 46

Double Demeaning

# Substracting the means
d <- d %>%
ungroup() %>%
mutate(
y_log_ddm = y_log_ddm + mean(y_log, na.rm = TRUE),
dist_log_ddm = dist_log_ddm + mean(dist_log, na.rm = TRUE)
)
38 / 46

Double Demeaning

# Substracting the means for the other independent variables
pacman::p_load(tidyr)
d2 <- d %>%
select(iso_o, iso_d, rta, comcur, contig) %>%
gather(key, value, -iso_o, -iso_d) %>%
mutate(key = paste0(key, "_ddm")) %>%
group_by(iso_o, key, add = FALSE) %>%
mutate(ddm = value - mean(value, na.rm = TRUE)) %>%
group_by(iso_d, key, add = FALSE) %>%
mutate(ddm = ddm - mean(value, na.rm = TRUE)) %>%
ungroup() %>%
mutate(value = ddm + mean(value, na.rm = TRUE)) %>%
select(iso_o, iso_d, key, value) %>%
spread(key, value)
39 / 46

Double Demeaning

# Model
d <- left_join(d, d2, by = c("iso_o", "iso_d")) %>%
select(y_log_ddm, ends_with("_ddm"))
vars <- paste(c("dist_log_ddm", paste0(c("rta", "comcur", "contig"), "_ddm"), 0), collapse = " + ")
form <- stats::as.formula(paste("y_log_ddm", "~", vars))
model_ddm <- stats::lm(form, data = d)
tidy(model_ddm)
## # A tibble: 4 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 dist_log_ddm -1.60 0.0331 -48.4 0.
## 2 rta_ddm 0.797 0.0700 11.4 6.54e-30
## 3 comcur_ddm 0.174 0.146 1.19 2.34e- 1
## 4 contig_ddm 1.00 0.120 8.36 6.62e-17
40 / 46

Double Demeaning

Inside ddm() the argument robust=TRUE is equivalent to replace lm by rlm in the last part:

model_ddm <- MASS::rlm(form, data = d)
tidy(model_ddm)
## # A tibble: 4 x 4
## term estimate std.error statistic
## <chr> <dbl> <dbl> <dbl>
## 1 dist_log_ddm -1.60 0.0311 -51.7
## 2 rta_ddm 0.656 0.0657 9.99
## 3 comcur_ddm 0.507 0.137 3.70
## 4 contig_ddm 1.12 0.112 9.93
41 / 46

Double Demeaning

fit2 <- ddm(
dependent_variable = "flow",
distance = "distw",
additional_regressors = c("rta", "comcur", "contig"),
code_origin = "iso_o",
code_destination = "iso_d",
data = gravity_no_zeros,
robust = TRUE
)
tidy(fit2)
## # A tibble: 4 x 4
## term estimate std.error statistic
## <chr> <dbl> <dbl> <dbl>
## 1 dist_log_ddm -1.60 0.0311 -51.7
## 2 rta_ddm 0.656 0.0657 9.99
## 3 comcur_ddm 0.507 0.137 3.70
## 4 contig_ddm 1.12 0.112 9.93
42 / 46

Code and documentation

github.com/pachamaltese/gravity

pacha.hk/gravity

43 / 46

Questions?

44 / 46

References

Baltagi, B. H. (2011). Econometrics. Springer Texts in Business and Economics 978-3-642-20059-5. Springer. DOI: 10.1007/978-1-4614-3169-5.

Woelwer, A. L, J. P. Burgard, J. Kunst, et al. (2018). "Gravity: Estimation Methods for Gravity Models in R". In: Journal of Open Source Software 31.3, p. 1038. DOI: 10.21105/joss.01038.

45 / 46

This work is licensed as

Creative Commons Attribution-NonCommercial 4.0 International

To view a copy of this license visit https://creativecommons.org/licenses/by-nc/4.0/

46 / 46

Before we begin

2 / 46
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow