# dataset
library(tradepolicy)
# data transformation
library(dplyr)
library(tidyr)
# regression
library(capybara)
# delta method
library(msm)

2 Partial equilibrium trade policy analysis with structural gravity
2.1 Traditional gravity estimates
2.1.1 Preparing the data
If the reader has never used R before, please check chapters 1 to 25 from Wickham and Grolemund (2016).
If the reader has only fitted a few regressions in R, without much practice on transforming and cleaning data before, please check chapters 5 and 18 from Wickham and Grolemund (2016).
Please see the note from page 42 in Yotov et al. (2016). It is a really important note, which tells us that we need to:
- Filter observations for a range of years (1986, 1990, 1994, 1998, 2002 and 2006)
- Transform some variables to logarithm scale (trade and dist) and create new variables from those in the original dataset
- Remove cases where both the exporter and the importer are the same
- Drop observations where the trade flow is zero
Unlike Yotov et al. (2016), here we shall use a single dataset for all the applications and subset its columns depending on what we need. This decision kept the tradepolicy R package as light as possible.
Before conducting any data filtering or regression, we need to load the required packages.
Step 1, including subsetting columns for this application, is straightforward.
ch1_application1 <- agtpa_applications |>
select(exporter, importer, pair_id, year, trade, dist, cntg, lang, clny) |>
filter(year %in% seq(1986, 2006, 4))

Step 2 can be divided into parts, starting with the log transformation of trade and distance.
ch1_application1 <- ch1_application1 |>
mutate(
log_trade = log(trade),
log_dist = log(dist)
)

Continuing step 2, we can now create the variables \(Y_{i,t}\) and \(E_{i,t}\) that appear in the OLS model equation in the book.
ch1_application1 <- ch1_application1 |>
# Create Yit
group_by(exporter, year) |>
mutate(
y = sum(trade),
log_y = log(y)
) |>
# Create Eit
group_by(importer, year) |>
mutate(
e = sum(trade),
log_e = log(e)
)

The OLS model with remoteness indexes needs both an exporter and an importer index, which we can create with grouped operations. We divide this into sub-steps: replicate the computation of total exports, then the remoteness index for exporters, and finally the total imports with the corresponding remoteness index for importers.
ch1_application1 <- ch1_application1 |>
# Replicate total_e
group_by(exporter, year) |>
mutate(total_e = sum(e)) |>
group_by(year) |>
mutate(total_e = max(total_e)) |>
# Replicate rem_exp
group_by(exporter, year) |>
mutate(
remoteness_exp = sum(dist * total_e / e),
log_remoteness_exp = log(remoteness_exp)
) |>
# Replicate total_y
group_by(importer, year) |>
mutate(total_y = sum(y)) |>
group_by(year) |>
mutate(total_y = max(total_y)) |>
# Replicate rem_imp
group_by(importer, year) |>
mutate(
remoteness_imp = sum(dist / (y / total_y)),
log_remoteness_imp = log(remoteness_imp)
) |>
ungroup()

To create the variables for the OLS model with fixed effects, we follow box #1 on page 44 from Yotov et al. (2016). We combine the exporter and importer variables with the year to create the fixed effects variables.
ch1_application1 <- ch1_application1 |>
# This merges the columns exporter/importer with year
mutate(
exp_year = paste0(exporter, year),
imp_year = paste0(importer, year)
)

The addition of exporter- and importer-time fixed effects concludes step 2, and now we need to perform step 3. We leave this step commented out here and apply the exporter-importer filter inside each model call instead; this choice relates to an error I found in the reported \(R^2\) for the PPML model, which I discuss later.
# ch1_application1 <- ch1_application1 |>
# filter(exporter != importer)Some cases require conducting step 4, and we will be explicit about it when needed.
2.1.2 OLS estimation ignoring multilateral resistance terms
The general equation for this model is \[ \begin{align} \log X_{ij,t} =& \:\beta_0 + \beta_1 \log(DIST_{i,j}) + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} + \beta_4 CLNY_{i,j} + \beta_5 \log Y_{i,t} +\\ \text{ }& \:\beta_6 \log E_{j,t} + \varepsilon_{ij,t}. \end{align} \]
Please see page 41 in Yotov et al. (2016) for full detail of each variable.
Fitting this model is straightforward, and here we need to apply step 4 from the previous section to drop the cases where trade is zero.
fit_ols <- felm(
log_trade ~ log_dist + cntg + lang + clny + log_y + log_e,
data = ch1_application1 |>
filter(trade > 0) |>
filter(exporter != importer) # commented out step 3
)
fit_ols

Formula: log_trade ~ log_dist + cntg + lang + clny + log_y + log_e
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|-------------|----------|------------|----------|-----------|
| (Intercept) | -11.2831 | 0.1517 | -74.3617 | 0.0000 ** |
| log_dist | -1.0016 | 0.0142 | -70.7409 | 0.0000 ** |
| cntg | 0.5738 | 0.0744 | 7.7096 | 0.0000 ** |
| lang | 0.8015 | 0.0337 | 23.7511 | 0.0000 ** |
| clny | 0.7349 | 0.0704 | 10.4403 | 0.0000 ** |
| log_y | 1.1902 | 0.0054 | 220.3201 | 0.0000 ** |
| log_e | 0.9076 | 0.0056 | 162.7269 | 0.0000 ** |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
R-squared : 0.7585
Adj. R-squared: 0.7585
Number of observations: Full 25689; Missing 0; Perfect classification 0
Note that the \(R^2\) and adjusted \(R^2\) look the same because of rounding.
fit_ols$r_squared

[1] 0.7585251

fit_ols$adj_r_squared

[1] 0.7584687
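As a quick check, the adjusted \(R^2\) can be recovered from the reported \(R^2\) with the usual formula \(1 - (1 - R^2)(n - 1)/(n - k - 1)\), where \(n\) is the number of observations and \(k\) the number of slopes. A minimal sketch using the values reported above:

```r
# Recover the adjusted R-squared from the reported R-squared,
# with n = 25689 observations and k = 6 slopes (plus the intercept)
r2 <- 0.7585251
n <- 25689
k <- 6
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
round(adj_r2, 7)
# [1] 0.7584687
```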
The employed function, felm(), provides faster fitting for models with fixed effects and, by default, does not carry a copy of its training data. This is not the case in base R, where glm() objects include this data, increasing the model’s size; this does not affect the model’s predictions and can be changed as the user needs (Zumel 2014).
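For instance, base R lets us drop those embedded copies at fitting time. A minimal sketch with simulated data (the variable names are made up for illustration), using the model and y arguments of glm() to control what the object stores:

```r
# Simulated data to compare object sizes with and without the stored model frame
set.seed(123)
d <- data.frame(x = rnorm(1e4), y = rpois(1e4, 2))

m_full <- glm(y ~ x, family = poisson(), data = d)
m_slim <- glm(y ~ x, family = poisson(), data = d, model = FALSE, y = FALSE)

# Same coefficients, smaller object
all.equal(coef(m_full), coef(m_slim))
as.numeric(object.size(m_full)) > as.numeric(object.size(m_slim))
```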
The model is almost ready. We only need to stick to the methodology from Yotov et al. (2016) and cluster the standard errors by country pair (see the note on page 42, it is imperative).
The capybara package processes formulas as:
response ~ slopes | fixed_effects | cluster

The model with clustered standard errors is as follows.
fit_ols <- felm(
log_trade ~ log_dist + cntg + lang + clny + log_y + log_e | 0 | pair_id,
data = ch1_application1 |>
filter(trade > 0) |>
filter(exporter != importer) # commented out step 3
)
fit_ols

Formula: log_trade ~ log_dist + cntg + lang + clny + log_y + log_e | 0 |
pair_id
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|-------------|----------|------------|----------|-----------|
| (Intercept) | -11.2831 | 0.2958 | -38.1452 | 0.0000 ** |
| log_dist | -1.0016 | 0.0273 | -36.6395 | 0.0000 ** |
| cntg | 0.5738 | 0.1847 | 3.1069 | 0.0019 ** |
| lang | 0.8015 | 0.0821 | 9.7640 | 0.0000 ** |
| clny | 0.7349 | 0.1442 | 5.0969 | 0.0000 ** |
| log_y | 1.1902 | 0.0095 | 125.8863 | 0.0000 ** |
| log_e | 0.9076 | 0.0099 | 91.5953 | 0.0000 ** |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
R-squared : 0.7585
Adj. R-squared: 0.7585
Number of observations: Full 25689; Missing 0; Perfect classification 0
To conduct the RESET test we need to add the squared fitted values to the training data.
ch1_application1_ols <- ch1_application1 |>
filter(
trade > 0,
exporter != importer
)
ch1_application1_ols <- ch1_application1_ols |>
mutate(trade_hat_sq = (predict(fit_ols, newdata = ch1_application1_ols))^2)
reset_ols <- felm(
log_trade ~ log_dist + cntg + lang + clny + log_y + log_e + trade_hat_sq | 0 | pair_id,
data = ch1_application1_ols
)
round(reset_ols$coef_table["trade_hat_sq", "Pr(>|z|)"], 4)

[1] 0
2.1.3 OLS estimation controlling for multilateral resistance terms with remote indexes
The remoteness model adds variables to the OLS model. The general equation for this model is \[ \begin{align} \log X_{ij,t} =& \:\beta_0 + \beta_1 \log(DIST_{i,j}) + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} + \beta_4 CLNY_{i,j} + \beta_5 \log Y_{i,t} +\\ \text{ }& \beta_6 \log E_{j,t} + \beta_7 \log(REM\_EXP_{i,t}) + \beta_8 \log(REM\_IMP_{j,t}) + \varepsilon_{ij,t}. \end{align} \]
In the equation above \(REM\_EXP\) and \(REM\_IMP\) are defined as \[ \begin{align} \log(REM\_EXP_{i,t}) &= \log \left( \sum_j \frac{DIST_{i,j}}{E_{j,t} / Y_t} \right) \text{ and }\\ \log(REM\_IMP_{j,t}) &= \log \left( \sum_i \frac{DIST_{i,j}}{Y_{i,t} / Y_t} \right). \end{align} \]
Please see page 43 in Yotov et al. (2016) for full detail of each variable.
Our approach follows box #1 on page 43 from Yotov et al. (2016). Fitting the regression is straightforward. It is just about adding more regressors to what we did in the last section, and we can create a list with a summary for the model.
fit_ols_remoteness <- felm(
log_trade ~ log_dist + cntg + lang + clny + log_y + log_e +
log_remoteness_exp + log_remoteness_imp | 0 | pair_id,
data = ch1_application1 |>
filter(trade > 0) |>
filter(exporter != importer) # commented out step 3
)
fit_ols_remoteness

Formula: log_trade ~ log_dist + cntg + lang + clny + log_y + log_e + log_remoteness_exp +
log_remoteness_imp | 0 | pair_id
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|--------------------|----------|------------|----------|-----------|
| (Intercept) | -35.2185 | 1.9859 | -17.7341 | 0.0000 ** |
| log_dist | -1.1848 | 0.0313 | -37.8977 | 0.0000 ** |
| cntg | 0.2466 | 0.1769 | 1.3939 | 0.1633 |
| lang | 0.7394 | 0.0784 | 9.4304 | 0.0000 ** |
| clny | 0.8425 | 0.1502 | 5.6081 | 0.0000 ** |
| log_y | 1.1643 | 0.0095 | 122.8377 | 0.0000 ** |
| log_e | 0.9026 | 0.0099 | 91.1162 | 0.0000 ** |
| log_remoteness_exp | 0.9718 | 0.0682 | 14.2528 | 0.0000 ** |
| log_remoteness_imp | 0.2737 | 0.0598 | 4.5789 | 0.0000 ** |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
R-squared : 0.765
Adj. R-squared: 0.765
Number of observations: Full 25689; Missing 0; Perfect classification 0
For the RESET test, we need to add the squared fitted values to the training data as before.
ch1_application1_ols_remoteness <- ch1_application1 |>
ungroup() |>
filter(
trade > 0,
exporter != importer
)
ch1_application1_ols_remoteness <- ch1_application1_ols_remoteness |>
mutate(trade_hat_sq = (predict(fit_ols_remoteness, newdata = ch1_application1_ols_remoteness))^2)
reset_ols_remoteness <- felm(
log_trade ~ log_dist + cntg + lang + clny + log_y + log_e +
log_remoteness_exp + log_remoteness_imp + trade_hat_sq | 0 | pair_id,
data = ch1_application1_ols_remoteness
)
round(reset_ols_remoteness$coef_table["trade_hat_sq", "Pr(>|z|)"], 4)

[1] 2e-04
2.1.4 OLS estimation controlling for multilateral resistance terms with fixed effects
The general equation for this model is \[ \begin{align} \log X_{ij,t} =& \: \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} +\\ \text{ }& \:\beta_4 CLNY_{i,j} + \pi_{i,t} + \chi_{j,t} + \varepsilon_{ij,t}. \end{align} \]
The terms added with respect to the OLS model are \(\pi_{i,t}\) and \(\chi_{j,t}\), which account for exporter-time and importer-time fixed effects, respectively. See page 44 in Yotov et al. (2016) for full detail of each variable.
We can quickly generate a list as we did with the previous models. The only difference from the previous models is that the variables to the right of the first “|” operator are the fixed effects, which the capybara package treats differently for faster model fitting.
Please notice that the summaries intentionally do not show fixed effects, because there are cases where we have thousands of fixed effects.
fit_fe <- felm(
log_trade ~ log_dist + cntg + lang + clny | exp_year + imp_year | pair_id,
data = ch1_application1 |>
filter(trade > 0) |>
filter(exporter != importer) # commented out step 3
)
fit_fe

Formula: log_trade ~ log_dist + cntg + lang + clny | exp_year + imp_year |
pair_id
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|----------|----------|------------|----------|-----------|
| log_dist | -1.2156 | 0.0376 | -32.3685 | 0.0000 ** |
| cntg | 0.2232 | 0.1995 | 1.1187 | 0.2633 |
| lang | 0.6609 | 0.0807 | 8.1860 | 0.0000 ** |
| clny | 0.6705 | 0.1470 | 4.5614 | 0.0000 ** |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
R-squared : 0.8432
Adj. R-squared: 0.838
Number of observations: Full 25689; Missing 0; Perfect classification 0
Note that the felm() function does not print the fixed effects. You can access them with fit_fe$fixed_effects and fit_fe$fe_levels as needed.
For the RESET test, we need to add the squared fitted values to the training data as before.
ch1_application1_fe <- ch1_application1 |>
ungroup() |>
filter(
trade > 0,
exporter != importer
)
ch1_application1_fe <- ch1_application1_fe |>
mutate(trade_hat_sq = (predict(fit_fe, newdata = ch1_application1_fe))^2)
reset_fe <- felm(
log_trade ~ log_dist + cntg + lang + clny + trade_hat_sq | exp_year + imp_year | pair_id,
data = ch1_application1_fe
)
round(reset_fe$coef_table["trade_hat_sq", "Pr(>|z|)"], 4)

[1] 0
2.1.5 PPML estimation controlling for multilateral resistance terms with fixed effects
The general equation for this model is \[ \begin{align} X_{ij,t} =& \:\exp\left[\beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} +\right.\\ \text{ }& \:\left.\beta_3 LANG_{i,j} + \beta_4 CLNY_{i,j} + \pi_{i,t} + \chi_{j,t}\right] \times \varepsilon_{ij,t}. \end{align} \]
The reason to compute this model, despite its lower speed compared to OLS, is that PPML is the only estimator perfectly consistent with the theoretical gravity model. When estimating with PPML, the fixed effects correspond exactly to their theoretical counterparts.
The data for this model is the same as for the fixed effects model, and one option in R is to use the fepoisson() function.
fit_ppml <- fepoisson(
trade ~ log_dist + cntg + lang + clny | exp_year + imp_year | pair_id,
data = ch1_application1 |>
filter(exporter != importer) # commented out step 3
)
fit_ppml

Formula: trade ~ log_dist + cntg + lang + clny | exp_year + imp_year |
pair_id
Family: Poisson
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|----------|----------|------------|----------|-----------|
| log_dist | -0.8409 | 0.0317 | -26.5633 | 0.0000 ** |
| cntg | 0.4374 | 0.0832 | 5.2603 | 0.0000 ** |
| lang | 0.2475 | 0.0765 | 3.2333 | 0.0012 ** |
| clny | -0.2225 | 0.1162 | -1.9140 | 0.0556 + |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
Pseudo R-squared: 0.9461
Number of observations: Full 28152; Missing 0; Perfect classification 0
Number of Fisher Scoring iterations: 12
Up to this part, everything matches table 1 in Yotov et al. (2016), except for the \(R^2\) of the PPML model, which I could only replicate as an out-of-sample \(R^2\), computed from fitted values that include domestic flows. I believe there is a mistake in the book regarding this value.
trade <- pull(ch1_application1, trade)
trade_hat <- predict(fit_ppml, newdata = ch1_application1)
cor(trade, trade_hat)^2

[1] 0.6143844
For the RESET test, we need to add the squared fitted link values to the training data as before.
ch1_application1_ppml <- ch1_application1 |>
ungroup() |>
filter(exporter != importer)
ch1_application1_ppml <- ch1_application1_ppml |>
mutate(trade_hat_sq = (predict(fit_ppml, type = "link", newdata = ch1_application1_ppml))^2)
reset_ppml <- fepoisson(
trade ~ log_dist + cntg + lang + clny + trade_hat_sq | exp_year + imp_year | pair_id,
data = ch1_application1_ppml
)
round(reset_ppml$coef_table["trade_hat_sq", "Pr(>|z|)"], 4)

[1] 0.6416
2.1.6 Presenting all models together
Capybara provides the summary_table() function to present multiple models together. We can use it to show all the models fitted in this section (note that it is a general-use function and does not include the RESET test p-values).
summary_table(fit_ols, fit_ols_remoteness, fit_fe, fit_ppml)

| Variable             | (1)                 | (2)                 | (3)                | (4)                |
|----------------------|---------------------|---------------------|--------------------|--------------------|
| (Intercept) | -11.283** | -35.219** | | |
| | (0.296) | (1.986) | | |
| log_dist | -1.002** | -1.185** | -1.216** | -0.841** |
| | (0.027) | (0.031) | (0.038) | (0.032) |
| cntg | 0.574** | 0.247 | 0.223 | 0.437** |
| | (0.185) | (0.177) | (0.199) | (0.083) |
| lang | 0.802** | 0.739** | 0.661** | 0.247** |
| | (0.082) | (0.078) | (0.081) | (0.077) |
| clny | 0.735** | 0.842** | 0.670** | -0.222+ |
| | (0.144) | (0.150) | (0.147) | (0.116) |
| log_y | 1.190** | 1.164** | | |
| | (0.009) | (0.009) | | |
| log_e | 0.908** | 0.903** | | |
| | (0.010) | (0.010) | | |
| log_remoteness_exp | | 0.972** | | |
| | | (0.068) | | |
| log_remoteness_imp | | 0.274** | | |
| | | (0.060) | | |
| | | | | |
| Fixed effects | | | | |
| exp_year | No | No | Yes | Yes |
| imp_year | No | No | Yes | Yes |
| | | | | |
| N | 25,689 | 25,689 | 25,689 | 28,152 |
| Pseudo R-squared | 0.759 | 0.765 | 0.843 | 0.946 |
Standard errors in parenthesis
Significance levels: ** p < 0.01; * p < 0.05; + p < 0.10
The difference with Yotov et al. (2016) is that the 3rd and 4th columns do not include the “grand mean” (intercept): since those models include exporter-time and importer-time fixed effects, the intercept is absorbed by them and is no longer needed, while the slopes and their standard errors match the book.
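A tiny simulated example illustrates the absorption: with a full set of group dummies, dropping the intercept leaves the slope unchanged. The data and names below are made up for illustration.

```r
# Simulated data: three groups with different means plus a common slope
set.seed(42)
f <- factor(rep(c("a", "b", "c"), each = 20))
x <- rnorm(60)
y <- as.numeric(f) + 0.5 * x + rnorm(60)

with_intercept <- lm(y ~ x + f)
without_intercept <- lm(y ~ x + f - 1) # intercept absorbed by the dummies

# The slope on x is identical in both fits
coef(with_intercept)["x"]
coef(without_intercept)["x"]
```

Both fits span the same column space, so only the parameterization of the group means changes, not the slope.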
2.2 The “distance puzzle” resolved
2.2.1 Preparing the data
Please see the note from page 47 in Yotov et al. (2016). We need to proceed with similar steps as in the previous section.
The distance puzzle proposes the gravity specification \[ \begin{align} X_{ij,t} =& \:\exp\left[\pi_{i,t} + \chi_{j,t} + \sum_T \beta_T \log(DIST)_{i,j} \times I(t = T) + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j}\right]\times\\ \text{ }& \:\exp\left[\beta_4 CLNY_{i,j} + \beta_5 \log(DIST\_INTRA_{i,i})\right] \times \varepsilon_{ij,t}. \end{align} \]
The difference with respect to the last section is that we now need to separate the distance variable into multiple columns that account for discrete time effects; the \(\beta_T\) terms of the equation reflect this. Perhaps the easiest option is to transform the year into a text column and then use the pivot_wider() function.
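As a minimal illustration of this reshaping idea, with a made-up two-row table, pivot_wider() spreads the year labels into zero-filled columns:

```r
library(tidyr)

# Hypothetical two-row table: one distance value per pair-year
toy <- data.frame(
  pair = c("A-B", "C-D"),
  year = c("log_dist_1986", "log_dist_1990"),
  log_dist = c(6.2, 7.1)
)

pivot_wider(toy, names_from = year, values_from = log_dist, values_fill = 0)
# A-B gets log_dist_1986 = 6.2 and log_dist_1990 = 0; C-D the opposite
```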
For the OLS model, we need to remove the cases where the exporter is the same as the importer and the cases where trade is zero. For the PPML models, we instead keep those rows and mark them with a same-country column, which is also required to transform the distance variables as shown in box #1 on page 48 from Yotov et al. (2016).
In order to avoid creating two very similar datasets, we shall create one dataset to cover both OLS and PPML.
ch1_application2 <- agtpa_applications |>
select(exporter, importer, pair_id, year, trade, dist, cntg, lang, clny) |>
# this filter covers both OLS and PPML
filter(year %in% seq(1986, 2006, 4)) |>
mutate(
# variables for both OLS and PPML
exp_year = paste0(exporter, year),
imp_year = paste0(importer, year),
year = paste0("log_dist_", year),
log_trade = log(trade),
log_dist = log(dist),
smctry = ifelse(importer != exporter, 0, 1),
# PPML specific variables
log_dist_intra = log_dist * smctry,
intra_pair = ifelse(exporter == importer, exporter, "inter")
) |>
pivot_wider(names_from = year, values_from = log_dist, values_fill = 0) |>
mutate(across(log_dist_1986:log_dist_2006, function(x) x * (1 - smctry)))

The across() function is a shortcut to avoid repetition, as in the following equivalent example, shown for reference and not evaluated.
ch1_application2 |>
mutate(
log_dist_1986 = log_dist_1986 * (1 - smctry),
log_dist_1990 = log_dist_1990 * (1 - smctry),
# repeat log_dist_T many_times for T = 1994, 1998, ...
log_dist_2006 = log_dist_2006 * (1 - smctry)
)

Note that the OLS model requires filtering when we specify it, because we skipped dropping the cases where trade is zero and where the importer and the exporter are the same. Because the solution to the “distance puzzle” implies different transformations and filters for the OLS and PPML cases, one possibility is to filter directly in the model calls.
2.2.2 OLS solution for the “distance puzzle”
The gravity specification, which includes \(\pi_{i,t} + \chi_{i,t}\), means that we need to do something very similar to what we did in the last section.
With the data from above, the model specification is straightforward.
fit_ols_distance_puzzle <- felm(
log_trade ~ log_dist_1986 + log_dist_1990 + log_dist_1994 +
log_dist_1998 + log_dist_2002 + log_dist_2006 + cntg + lang + clny | exp_year +
imp_year | pair_id,
data = filter(ch1_application2, importer != exporter, trade > 0)
)
fit_ols_distance_puzzle

Formula: log_trade ~ log_dist_1986 + log_dist_1990 + log_dist_1994 + log_dist_1998 +
log_dist_2002 + log_dist_2006 + cntg + lang + clny | exp_year +
imp_year | pair_id
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|---------------|----------|------------|----------|-----------|
| log_dist_1986 | -1.1685 | 0.0429 | -27.2219 | 0.0000 ** |
| log_dist_1990 | -1.1551 | 0.0416 | -27.7495 | 0.0000 ** |
| log_dist_1994 | -1.2106 | 0.0449 | -26.9459 | 0.0000 ** |
| log_dist_1998 | -1.2481 | 0.0421 | -29.6654 | 0.0000 ** |
| log_dist_2002 | -1.2413 | 0.0434 | -28.6114 | 0.0000 ** |
| log_dist_2006 | -1.2613 | 0.0430 | -29.3333 | 0.0000 ** |
| cntg | 0.2230 | 0.1993 | 1.1187 | 0.2633 |
| lang | 0.6611 | 0.0807 | 8.1899 | 0.0000 ** |
| clny | 0.6702 | 0.1469 | 4.5616 | 0.0000 ** |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
R-squared : 0.8433
Adj. R-squared: 0.838
Number of observations: Full 25689; Missing 0; Perfect classification 0
For the inference, we need to use the Delta method on the estimated coefficients.
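Concretely, for the percentage change \(g(\beta) = 100(\beta_{2006} - \beta_{1986})/\beta_{1986}\), the delta method approximates the variance using the gradient of \(g\) and the clustered covariance matrix \(\hat{V}\) of the two coefficients: \[ \begin{align} \text{Var}\left[g(\hat{\beta})\right] \approx& \: \nabla g(\hat{\beta})^\top \hat{V} \, \nabla g(\hat{\beta}), \quad \text{with}\\ \nabla g(\beta) =& \: 100 \left( -\frac{\beta_{2006}}{\beta_{1986}^2}, \: \frac{1}{\beta_{1986}} \right)^\top. \end{align} \] The deltamethod() function from the msm package evaluates exactly this approximation from the formula for \(g\).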
beta_log_dist <- grep("log_dist", rownames(fit_ols_distance_puzzle$coef_table), value = TRUE)
# alphabetical min/max select the first (1986) and last (2006) coefficient names
beta_log_dist <- c(min(beta_log_dist), max(beta_log_dist))
# change = 100 * (beta2 - beta1) / beta1
betas <- fit_ols_distance_puzzle$coef_table[, 1][beta_log_dist]
beta_pct_chg <- as.numeric(100 * (betas[2] - betas[1]) / betas[1])
beta_vcov_cluster <- fit_ols_distance_puzzle$vcov[
which(grepl(paste(beta_log_dist, collapse = "|"), rownames(fit_ols_distance_puzzle$vcov))),
which(grepl(paste(beta_log_dist, collapse = "|"), rownames(fit_ols_distance_puzzle$vcov)))
]
beta_std_err <- deltamethod(
~ 100 * (x2 - x1) / x1,
c(betas[1], betas[2]), beta_vcov_cluster
)
beta_tstat <- beta_pct_chg / beta_std_err
beta_pval <- pnorm(-abs(beta_tstat)) + (1 - pnorm(abs(beta_tstat)))
round(c(
"pct_chg_log_dist" = beta_pct_chg,
"std_err" = beta_std_err,
"p-value" = beta_pval
), 4)

pct_chg_log_dist std_err p-value
7.9502 3.6976 0.0316
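As a sanity check, we can reproduce the percentage change from the rounded coefficients in the table above; it differs from the reported 7.9502 only in the third decimal because the computation above uses the unrounded estimates.

```r
# Rounded estimates from the table above
beta_1986 <- -1.1685
beta_2006 <- -1.2613

round(100 * (beta_2006 - beta_1986) / beta_1986, 4)
# [1] 7.9418
```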
2.2.3 PPML solution for the “distance puzzle”
This model is very similar to the one specified in the PPML section from the last section. We can directly fit the model.
fit_ppml_distance_puzzle <- fepoisson(
trade ~ log_dist_1986 + log_dist_1990 + log_dist_1994 +
log_dist_1998 + log_dist_2002 + log_dist_2006 + cntg + lang + clny |
exp_year + imp_year | pair_id,
data = filter(ch1_application2, importer != exporter)
)
fit_ppml_distance_puzzle

Formula: trade ~ log_dist_1986 + log_dist_1990 + log_dist_1994 + log_dist_1998 +
log_dist_2002 + log_dist_2006 + cntg + lang + clny | exp_year +
imp_year | pair_id
Family: Poisson
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|---------------|----------|------------|----------|-----------|
| log_dist_1986 | -0.8593 | 0.0370 | -23.1952 | 0.0000 ** |
| log_dist_1990 | -0.8341 | 0.0377 | -22.1362 | 0.0000 ** |
| log_dist_1994 | -0.8348 | 0.0351 | -23.7915 | 0.0000 ** |
| log_dist_1998 | -0.8469 | 0.0354 | -23.9491 | 0.0000 ** |
| log_dist_2002 | -0.8484 | 0.0316 | -26.8072 | 0.0000 ** |
| log_dist_2006 | -0.8356 | 0.0312 | -26.7420 | 0.0000 ** |
| cntg | 0.4372 | 0.0832 | 5.2574 | 0.0000 ** |
| lang | 0.2475 | 0.0765 | 3.2335 | 0.0012 ** |
| clny | -0.2223 | 0.1163 | -1.9111 | 0.0560 + |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
Pseudo R-squared: 0.9462
Number of observations: Full 28152; Missing 0; Perfect classification 0
Number of Fisher Scoring iterations: 12
For the inference, we need to use the Delta method on the estimated coefficients as we did in the OLS case.
beta_log_dist <- grep("log_dist", rownames(fit_ppml_distance_puzzle$coef_table), value = TRUE)
beta_log_dist <- c(min(beta_log_dist), max(beta_log_dist))
# change = 100 * (beta2 - beta1) / beta1
betas <- fit_ppml_distance_puzzle$coef_table[, 1][beta_log_dist]
beta_pct_chg <- as.numeric(100 * (betas[2] - betas[1]) / betas[1])
beta_vcov_cluster <- fit_ppml_distance_puzzle$vcov[
which(grepl(paste(beta_log_dist, collapse = "|"), rownames(fit_ppml_distance_puzzle$vcov))),
which(grepl(paste(beta_log_dist, collapse = "|"), rownames(fit_ppml_distance_puzzle$vcov)))
]
beta_std_err <- deltamethod(
~ 100 * (x2 - x1) / x1,
c(betas[1], betas[2]), beta_vcov_cluster
)
beta_tstat <- beta_pct_chg / beta_std_err
beta_pval <- pnorm(-abs(beta_tstat)) + (1 - pnorm(abs(beta_tstat)))
round(c(
"pct_chg_log_dist" = beta_pct_chg,
"std_err" = beta_std_err,
"p-value" = beta_pval
), 4)

pct_chg_log_dist std_err p-value
-2.7503 3.0039 0.3599
2.2.4 Internal distance solution for the “distance puzzle”
This model requires us to add the internal distance variable to the PPML model and not filter the rows where the exporter and the importer are the same.
fit_ppml_intra_distance_puzzle <- fepoisson(
trade ~ log_dist_1986 + log_dist_1990 + log_dist_1994 +
log_dist_1998 + log_dist_2002 + log_dist_2006 + cntg + lang + clny +
log_dist_intra | exp_year + imp_year | pair_id,
data = ch1_application2
)

Separation detected: 2463 observation(s) with perfect prediction were excluded from estimation.
fit_ppml_intra_distance_puzzle

Formula: trade ~ log_dist_1986 + log_dist_1990 + log_dist_1994 + log_dist_1998 +
log_dist_2002 + log_dist_2006 + cntg + lang + clny + log_dist_intra |
exp_year + imp_year | pair_id
Family: Poisson
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|----------------|----------|------------|----------|-----------|
| log_dist_1986 | -0.9758 | 0.0721 | -13.5373 | 0.0000 ** |
| log_dist_1990 | -0.9383 | 0.0731 | -12.8343 | 0.0000 ** |
| log_dist_1994 | -0.9137 | 0.0720 | -12.6877 | 0.0000 ** |
| log_dist_1998 | -0.8855 | 0.0710 | -12.4675 | 0.0000 ** |
| log_dist_2002 | -0.8826 | 0.0706 | -12.5003 | 0.0000 ** |
| log_dist_2006 | -0.8710 | 0.0713 | -12.2182 | 0.0000 ** |
| cntg | 0.3746 | 0.1396 | 2.6837 | 0.0073 ** |
| lang | 0.3343 | 0.1681 | 1.9885 | 0.0468 * |
| clny | 0.0182 | 0.1563 | 0.1164 | 0.9073 |
| log_dist_intra | -0.4867 | 0.1007 | -4.8334 | 0.0000 ** |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
Pseudo R-squared: 0.9954
Number of observations: Full 28566; Separated 2463; Perfect classification 0
Number of Fisher Scoring iterations: 12
For the inference, we need to use the Delta method on the estimated coefficients as we did in the OLS case (the regular expression now needs the year digits so that it does not match log_dist_intra).
beta_log_dist <- grep("log_dist_[0-9]{4}", rownames(fit_ppml_intra_distance_puzzle$coef_table), value = TRUE)
beta_log_dist <- c(min(beta_log_dist), max(beta_log_dist))
# change = 100 * (beta2 - beta1) / beta1
betas <- fit_ppml_intra_distance_puzzle$coef_table[, 1][beta_log_dist]
beta_pct_chg <- as.numeric(100 * (betas[2] - betas[1]) / betas[1])
beta_vcov_cluster <- fit_ppml_intra_distance_puzzle$vcov[
which(grepl(paste(beta_log_dist, collapse = "|"), rownames(fit_ppml_intra_distance_puzzle$vcov))),
which(grepl(paste(beta_log_dist, collapse = "|"), rownames(fit_ppml_intra_distance_puzzle$vcov)))
]
beta_std_err <- deltamethod(
~ 100 * (x2 - x1) / x1,
c(betas[1], betas[2]), beta_vcov_cluster
)
beta_tstat <- beta_pct_chg / beta_std_err
beta_pval <- pnorm(-abs(beta_tstat)) + (1 - pnorm(abs(beta_tstat)))
round(c(
"pct_chg_log_dist" = beta_pct_chg,
"std_err" = beta_std_err,
"p-value" = beta_pval
), 4)

pct_chg_log_dist std_err p-value
-10.7390 1.0407 0.0000
2.2.5 Internal distance and home bias solution for the “distance puzzle”
This model requires us to add the same-country variable to the internal distance model and repeat the rest of the steps from the last section.
fit_ppml_home_distance_puzzle <- fepoisson(
trade ~ log_dist_1986 + log_dist_1990 + log_dist_1994 +
log_dist_1998 + log_dist_2002 + log_dist_2006 + cntg + lang + clny +
log_dist_intra + smctry | exp_year + imp_year | pair_id,
data = ch1_application2
)

Separation detected: 2463 observation(s) with perfect prediction were excluded from estimation.
fit_ppml_home_distance_puzzle

Formula: trade ~ log_dist_1986 + log_dist_1990 + log_dist_1994 + log_dist_1998 +
log_dist_2002 + log_dist_2006 + cntg + lang + clny + log_dist_intra +
smctry | exp_year + imp_year | pair_id
Family: Poisson
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|----------------|----------|------------|----------|-----------|
| log_dist_1986 | -0.8538 | 0.0626 | -13.6391 | 0.0000 ** |
| log_dist_1990 | -0.8178 | 0.0632 | -12.9369 | 0.0000 ** |
| log_dist_1994 | -0.7951 | 0.0634 | -12.5383 | 0.0000 ** |
| log_dist_1998 | -0.7692 | 0.0629 | -12.2240 | 0.0000 ** |
| log_dist_2002 | -0.7659 | 0.0625 | -12.2453 | 0.0000 ** |
| log_dist_2006 | -0.7537 | 0.0621 | -12.1329 | 0.0000 ** |
| cntg | 0.5753 | 0.1549 | 3.7144 | 0.0002 ** |
| lang | 0.3496 | 0.1371 | 2.5508 | 0.0107 * |
| clny | 0.0258 | 0.1250 | 0.2065 | 0.8364 |
| log_dist_intra | -0.6000 | 0.1093 | -5.4915 | 0.0000 ** |
| smctry | 1.6794 | 0.5738 | 2.9267 | 0.0034 ** |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
Pseudo R-squared: 0.9962
Number of observations: Full 28566; Separated 2463; Perfect classification 0
Number of Fisher Scoring iterations: 12
For the inference, we need to use the Delta method on the estimated coefficients as we did in the OLS case (the regular expression now needs the year digits so that it does not match log_dist_intra).
beta_log_dist <- grep("log_dist_[0-9]{4}", rownames(fit_ppml_home_distance_puzzle$coef_table), value = TRUE)
beta_log_dist <- c(min(beta_log_dist), max(beta_log_dist))
# change = 100 * (beta2 - beta1) / beta1
betas <- fit_ppml_home_distance_puzzle$coef_table[, 1][beta_log_dist]
beta_pct_chg <- as.numeric(100 * (betas[2] - betas[1]) / betas[1])
beta_vcov_cluster <- fit_ppml_home_distance_puzzle$vcov[
which(grepl(paste(beta_log_dist, collapse = "|"), rownames(fit_ppml_home_distance_puzzle$vcov))),
which(grepl(paste(beta_log_dist, collapse = "|"), rownames(fit_ppml_home_distance_puzzle$vcov)))
]
beta_std_err <- deltamethod(
~ 100 * (x2 - x1) / x1,
c(betas[1], betas[2]), beta_vcov_cluster
)
beta_tstat <- beta_pct_chg / beta_std_err
beta_pval <- pnorm(-abs(beta_tstat)) + (1 - pnorm(abs(beta_tstat)))
round(c(
"pct_chg_log_dist" = beta_pct_chg,
"std_err" = beta_std_err,
"p-value" = beta_pval
), 4)

pct_chg_log_dist std_err p-value
-11.7266 1.1593 0.0000
2.2.6 Fixed effects solution for the “distance puzzle”
This model requires us to remove the internal distance and same-country variables from the last model and include the intra-pair variable to account for the intra-national fixed effects.
fit_fe_distance_puzzle <- fepoisson(
trade ~ log_dist_1986 + log_dist_1990 + log_dist_1994 +
log_dist_1998 + log_dist_2002 + log_dist_2006 + cntg + lang + clny +
intra_pair | exp_year + imp_year | pair_id,
data = ch1_application2
)

Separation detected: 2463 observation(s) with perfect prediction were excluded from estimation.
fit_fe_distance_puzzle

Formula: trade ~ log_dist_1986 + log_dist_1990 + log_dist_1994 + log_dist_1998 +
log_dist_2002 + log_dist_2006 + cntg + lang + clny + intra_pair |
exp_year + imp_year | pair_id
Family: Poisson
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|-----------------|----------|------------|----------|-----------|
| log_dist_1986 | -0.9097 | 0.0324 | -28.0812 | 0.0000 ** |
| log_dist_1990 | -0.8795 | 0.0322 | -27.2935 | 0.0000 ** |
| log_dist_1994 | -0.8605 | 0.0320 | -26.9097 | 0.0000 ** |
| log_dist_1998 | -0.8336 | 0.0318 | -26.2159 | 0.0000 ** |
| log_dist_2002 | -0.8299 | 0.0322 | -25.8096 | 0.0000 ** |
| log_dist_2006 | -0.8113 | 0.0322 | -25.2261 | 0.0000 ** |
| cntg | 0.4424 | 0.0817 | 5.4134 | 0.0000 ** |
| lang | 0.2395 | 0.0759 | 3.1535 | 0.0016 ** |
| clny | -0.2209 | 0.1168 | -1.8907 | 0.0587 + |
| intra_pairAUS | -1.0421 | 0.5064 | -2.0581 | 0.0396 * |
| intra_pairAUT | 0.0651 | 0.5298 | 0.1229 | 0.9022 |
| intra_pairBEL | 0.4176 | 0.5092 | 0.8202 | 0.4121 |
| intra_pairBGR | 2.4451 | 0.5561 | 4.3969 | 0.0000 ** |
| intra_pairBOL | 2.5958 | 0.5563 | 4.6661 | 0.0000 ** |
| intra_pairBRA | -0.2690 | 0.4173 | -0.6446 | 0.5192 |
| intra_pairCAN | -1.5788 | 0.4862 | -3.2475 | 0.0012 ** |
| intra_pairCHE | 0.9185 | 0.5001 | 1.8366 | 0.0663 + |
| intra_pairCHL | -0.1144 | 0.5609 | -0.2040 | 0.8384 |
| intra_pairCHN | -1.5021 | 0.4809 | -3.1235 | 0.0018 ** |
| intra_pairCMR | 3.0169 | 0.8031 | 3.7564 | 0.0002 ** |
| intra_pairCOL | 1.6914 | 0.5049 | 3.3499 | 0.0008 ** |
| intra_pairCRI | 1.5153 | 0.4892 | 3.0972 | 0.0020 ** |
| intra_pairCYP | 2.8809 | 0.7197 | 4.0031 | 0.0001 ** |
| intra_pairDEU | -1.6548 | 0.4822 | -3.4320 | 0.0006 ** |
| intra_pairDNK | 0.1458 | 0.5431 | 0.2684 | 0.7884 |
| intra_pairECU | 2.1514 | 0.5974 | 3.6010 | 0.0003 ** |
| intra_pairEGY | 1.9733 | 0.5086 | 3.8799 | 0.0001 ** |
| intra_pairESP | -0.7382 | 0.5110 | -1.4448 | 0.1485 |
| intra_pairFIN | 0.1319 | 0.5216 | 0.2528 | 0.8004 |
| intra_pairFRA | -0.9555 | 0.4794 | -1.9932 | 0.0462 * |
| intra_pairGBR | -1.4155 | 0.4793 | -2.9534 | 0.0031 ** |
| intra_pairGRC | 1.0808 | 0.5475 | 1.9741 | 0.0484 * |
| intra_pairHKG | -2.3687 | 0.5020 | -4.7187 | 0.0000 ** |
| intra_pairHUN | 0.5810 | 0.6158 | 0.9435 | 0.3454 |
| intra_pairIDN | -0.2155 | 0.5106 | -0.4220 | 0.6730 |
| intra_pairIND | 0.5261 | 0.5148 | 1.0220 | 0.3068 |
| intra_pairinter | 1.3074 | 0.5634 | 2.3208 | 0.0203 * |
| intra_pairIRL | -0.6181 | 0.4882 | -1.2662 | 0.2055 |
| intra_pairIRN | 2.5082 | 0.6113 | 4.1030 | 0.0000 ** |
| intra_pairISL | 3.1815 | 0.6510 | 4.8873 | 0.0000 ** |
| intra_pairISR | 0.2578 | 0.5597 | 0.4606 | 0.6451 |
| intra_pairITA | -1.0176 | 0.4937 | -2.0613 | 0.0393 * |
| intra_pairJOR | 3.8019 | 0.6144 | 6.1876 | 0.0000 ** |
| intra_pairJPN | -1.7727 | 0.5027 | -3.5262 | 0.0004 ** |
| intra_pairKEN | 3.7344 | 0.6183 | 6.0393 | 0.0000 ** |
| intra_pairKOR | -0.8748 | 0.5230 | -1.6727 | 0.0944 + |
| intra_pairKWT | 1.5908 | 0.5586 | 2.8478 | 0.0044 ** |
| intra_pairLKA | 1.7387 | 0.5486 | 3.1695 | 0.0015 ** |
| intra_pairMAC | 2.8838 | 0.9032 | 3.1929 | 0.0014 ** |
| intra_pairMAR | 1.7495 | 0.7440 | 2.3516 | 0.0187 * |
| intra_pairMEX | -1.5872 | 0.5140 | -3.0877 | 0.0020 ** |
| intra_pairMLT | 1.8203 | 0.5695 | 3.1963 | 0.0014 ** |
| intra_pairMMR | 4.9071 | 0.5127 | 9.5710 | 0.0000 ** |
| intra_pairMUS | 1.0561 | 0.7603 | 1.3891 | 0.1648 |
| intra_pairMWI | 3.5614 | 0.7640 | 4.6618 | 0.0000 ** |
| intra_pairMYS | -1.3170 | 0.5494 | -2.3970 | 0.0165 * |
| intra_pairNER | 2.5385 | 1.1890 | 2.1350 | 0.0328 * |
| intra_pairNGA | 3.5935 | 0.5061 | 7.0997 | 0.0000 ** |
| intra_pairNLD | -0.8907 | 0.5038 | -1.7678 | 0.0771 + |
| intra_pairNOR | 0.5510 | 0.5318 | 1.0361 | 0.3002 |
| intra_pairNPL | 4.0083 | 0.8449 | 4.7440 | 0.0000 ** |
| intra_pairPAN | 1.2533 | 0.9016 | 1.3901 | 0.1645 |
| intra_pairPHL | -0.5619 | 0.5245 | -1.0713 | 0.2840 |
| intra_pairPOL | 1.0596 | 0.5603 | 1.8911 | 0.0586 + |
| intra_pairPRT | 0.1405 | 0.5963 | 0.2355 | 0.8138 |
| intra_pairQAT | 2.0199 | 0.5318 | 3.7986 | 0.0001 ** |
| intra_pairROM | 1.5870 | 0.5776 | 2.7476 | 0.0060 ** |
| intra_pairSEN | 3.7343 | 0.9577 | 3.8992 | 0.0001 ** |
| intra_pairSGP | -2.5063 | 0.4838 | -5.1810 | 0.0000 ** |
| intra_pairSWE | -0.8687 | 0.5035 | -1.7256 | 0.0844 + |
| intra_pairTHA | -0.9992 | 0.5216 | -1.9157 | 0.0554 + |
| intra_pairTTO | 2.4403 | 0.5225 | 4.6708 | 0.0000 ** |
| intra_pairTUN | 2.0154 | 0.7403 | 2.7225 | 0.0065 ** |
| intra_pairTUR | 0.4985 | 0.5325 | 0.9360 | 0.3493 |
| intra_pairTZA | 2.1754 | 0.7069 | 3.0773 | 0.0021 ** |
| intra_pairURY | 1.6180 | 0.7107 | 2.2768 | 0.0228 * |
| intra_pairUSA | -3.4132 | 0.4801 | -7.1088 | 0.0000 ** |
| intra_pairZAF | -0.3985 | 0.5688 | -0.7006 | 0.4835 |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
Pseudo R-squared: 0.9993
Number of observations: Full 28566; Separated 2463; Perfect classification 0
Number of Fisher Scoring iterations: 13
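The "Separation detected" message above refers to observations whose covariate pattern perfectly predicts zero trade; keeping them would push the corresponding Poisson coefficients towards minus infinity. A minimal base-R sketch of the phenomenon, with made-up data:

```r
# Toy data: the dummy d equals 1 only where the count is always zero,
# so d perfectly predicts y = 0 (quasi-complete separation)
y <- c(0, 0, 3, 5, 2, 4)
d <- c(1, 1, 0, 0, 0, 0)

# glm() does not drop these rows; the estimate for d diverges
# towards minus infinity (a very large negative number when the
# iterations stop)
fit <- suppressWarnings(glm(y ~ d, family = poisson()))
coef(fit)["d"]
```

capybara instead detects and excludes such observations before estimation, which is why the output reports "Full 28566; Separated 2463".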
For the inference, we need to use the Delta method on the estimated coefficients as we did in the OLS case.
beta_log_dist <- grep("log_dist_[0-9]{4}", rownames(fit_fe_distance_puzzle$coef_table), value = TRUE)
beta_log_dist <- c(min(beta_log_dist), max(beta_log_dist))
# change = 100 * (beta2 - beta1) / beta1
betas <- fit_fe_distance_puzzle$coef_table[, 1][beta_log_dist]
beta_pct_chg <- as.numeric(100 * (betas[2] - betas[1]) / betas[1])
beta_vcov_cluster <- fit_fe_distance_puzzle$vcov[
which(grepl(paste(beta_log_dist, collapse = "|"), rownames(fit_fe_distance_puzzle$vcov))),
which(grepl(paste(beta_log_dist, collapse = "|"), rownames(fit_fe_distance_puzzle$vcov)))
]
beta_std_err <- deltamethod(
~ 100 * (x2 - x1) / x1,
c(betas[1], betas[2]), beta_vcov_cluster
)
beta_tstat <- beta_pct_chg / beta_std_err
beta_pval <- pnorm(-abs(beta_tstat)) + (1 - pnorm(abs(beta_tstat)))
round(c(
"pct_chg_log_dist" = beta_pct_chg,
"std_err" = beta_std_err,
"p-value" = beta_pval
), 4)
pct_chg_log_dist          std_err          p-value
        -10.8156           0.7695           0.0000
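As a sanity check on what deltamethod() computes, the standard error of the percent change \(g(\beta_1, \beta_2) = 100 (\beta_2 - \beta_1) / \beta_1\) can be obtained directly from the gradient formula \(se = \sqrt{\nabla g' V \nabla g}\). A base-R sketch with illustrative numbers (not the estimates above):

```r
# Illustrative coefficients and covariance matrix (made up)
b <- c(b1 = -0.91, b2 = -0.81)
V <- matrix(c(0.0010, 0.0008,
              0.0008, 0.0011), nrow = 2)

# g(b1, b2) = 100 * (b2 - b1) / b1
g <- 100 * (b[[2]] - b[[1]]) / b[[1]]

# Gradient of g: dg/db1 = -100 * b2 / b1^2, dg/db2 = 100 / b1
grad <- c(-100 * b[[2]] / b[[1]]^2, 100 / b[[1]])

# Delta-method standard error: sqrt(grad' V grad)
se <- sqrt(as.numeric(t(grad) %*% V %*% grad))
c(pct_chg = g, std_err = se)
```

msm::deltamethod(~ 100 * (x2 - x1) / x1, b, V) returns the same standard error; it simply differentiates the formula symbolically instead of requiring the gradient by hand.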
2.2.7 Presenting all models together
As in the previous section, we can use the summary_table() function to show all the models fitted in this section.
summary_table(
fit_ols_distance_puzzle,
fit_ppml_distance_puzzle,
fit_ppml_intra_distance_puzzle,
fit_ppml_home_distance_puzzle,
fit_fe_distance_puzzle
)
| Variable | (1) | (2) | (3) | (4) | (5) |
|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
| log_dist_1986 | -1.168** | -0.859** | -0.976** | -0.854** | -0.910** |
| | (0.043) | (0.037) | (0.072) | (0.063) | (0.032) |
| log_dist_1990 | -1.155** | -0.834** | -0.938** | -0.818** | -0.880** |
| | (0.042) | (0.038) | (0.073) | (0.063) | (0.032) |
| log_dist_1994 | -1.211** | -0.835** | -0.914** | -0.795** | -0.860** |
| | (0.045) | (0.035) | (0.072) | (0.063) | (0.032) |
| log_dist_1998 | -1.248** | -0.847** | -0.885** | -0.769** | -0.834** |
| | (0.042) | (0.035) | (0.071) | (0.063) | (0.032) |
| log_dist_2002 | -1.241** | -0.848** | -0.883** | -0.766** | -0.830** |
| | (0.043) | (0.032) | (0.071) | (0.063) | (0.032) |
| log_dist_2006 | -1.261** | -0.836** | -0.871** | -0.754** | -0.811** |
| | (0.043) | (0.031) | (0.071) | (0.062) | (0.032) |
| cntg | 0.223 | 0.437** | 0.375** | 0.575** | 0.442** |
| | (0.199) | (0.083) | (0.140) | (0.155) | (0.082) |
| lang | 0.661** | 0.248** | 0.334* | 0.350* | 0.239** |
| | (0.081) | (0.077) | (0.168) | (0.137) | (0.076) |
| clny | 0.670** | -0.222+ | 0.018 | 0.026 | -0.221+ |
| | (0.147) | (0.116) | (0.156) | (0.125) | (0.117) |
| log_dist_intra | | | -0.487** | -0.600** | |
| | | | (0.101) | (0.109) | |
| smctry | | | | 1.679** | |
| | | | | (0.574) | |
| intra_pairAUS | | | | | -1.042* |
| | | | | | (0.506) |
| intra_pairAUT | | | | | 0.065 |
| | | | | | (0.530) |
| intra_pairBEL | | | | | 0.418 |
| | | | | | (0.509) |
| intra_pairBGR | | | | | 2.445** |
| | | | | | (0.556) |
| intra_pairBOL | | | | | 2.596** |
| | | | | | (0.556) |
| intra_pairBRA | | | | | -0.269 |
| | | | | | (0.417) |
| intra_pairCAN | | | | | -1.579** |
| | | | | | (0.486) |
| intra_pairCHE | | | | | 0.918+ |
| | | | | | (0.500) |
| intra_pairCHL | | | | | -0.114 |
| | | | | | (0.561) |
| intra_pairCHN | | | | | -1.502** |
| | | | | | (0.481) |
| intra_pairCMR | | | | | 3.017** |
| | | | | | (0.803) |
| intra_pairCOL | | | | | 1.691** |
| | | | | | (0.505) |
| intra_pairCRI | | | | | 1.515** |
| | | | | | (0.489) |
| intra_pairCYP | | | | | 2.881** |
| | | | | | (0.720) |
| intra_pairDEU | | | | | -1.655** |
| | | | | | (0.482) |
| intra_pairDNK | | | | | 0.146 |
| | | | | | (0.543) |
| intra_pairECU | | | | | 2.151** |
| | | | | | (0.597) |
| intra_pairEGY | | | | | 1.973** |
| | | | | | (0.509) |
| intra_pairESP | | | | | -0.738 |
| | | | | | (0.511) |
| intra_pairFIN | | | | | 0.132 |
| | | | | | (0.522) |
| intra_pairFRA | | | | | -0.955* |
| | | | | | (0.479) |
| intra_pairGBR | | | | | -1.415** |
| | | | | | (0.479) |
| intra_pairGRC | | | | | 1.081* |
| | | | | | (0.547) |
| intra_pairHKG | | | | | -2.369** |
| | | | | | (0.502) |
| intra_pairHUN | | | | | 0.581 |
| | | | | | (0.616) |
| intra_pairIDN | | | | | -0.215 |
| | | | | | (0.511) |
| intra_pairIND | | | | | 0.526 |
| | | | | | (0.515) |
| intra_pairinter | | | | | 1.307* |
| | | | | | (0.563) |
| intra_pairIRL | | | | | -0.618 |
| | | | | | (0.488) |
| intra_pairIRN | | | | | 2.508** |
| | | | | | (0.611) |
| intra_pairISL | | | | | 3.182** |
| | | | | | (0.651) |
| intra_pairISR | | | | | 0.258 |
| | | | | | (0.560) |
| intra_pairITA | | | | | -1.018* |
| | | | | | (0.494) |
| intra_pairJOR | | | | | 3.802** |
| | | | | | (0.614) |
| intra_pairJPN | | | | | -1.773** |
| | | | | | (0.503) |
| intra_pairKEN | | | | | 3.734** |
| | | | | | (0.618) |
| intra_pairKOR | | | | | -0.875+ |
| | | | | | (0.523) |
| intra_pairKWT | | | | | 1.591** |
| | | | | | (0.559) |
| intra_pairLKA | | | | | 1.739** |
| | | | | | (0.549) |
| intra_pairMAC | | | | | 2.884** |
| | | | | | (0.903) |
| intra_pairMAR | | | | | 1.750* |
| | | | | | (0.744) |
| intra_pairMEX | | | | | -1.587** |
| | | | | | (0.514) |
| intra_pairMLT | | | | | 1.820** |
| | | | | | (0.569) |
| intra_pairMMR | | | | | 4.907** |
| | | | | | (0.513) |
| intra_pairMUS | | | | | 1.056 |
| | | | | | (0.760) |
| intra_pairMWI | | | | | 3.561** |
| | | | | | (0.764) |
| intra_pairMYS | | | | | -1.317* |
| | | | | | (0.549) |
| intra_pairNER | | | | | 2.538* |
| | | | | | (1.189) |
| intra_pairNGA | | | | | 3.593** |
| | | | | | (0.506) |
| intra_pairNLD | | | | | -0.891+ |
| | | | | | (0.504) |
| intra_pairNOR | | | | | 0.551 |
| | | | | | (0.532) |
| intra_pairNPL | | | | | 4.008** |
| | | | | | (0.845) |
| intra_pairPAN | | | | | 1.253 |
| | | | | | (0.902) |
| intra_pairPHL | | | | | -0.562 |
| | | | | | (0.525) |
| intra_pairPOL | | | | | 1.060+ |
| | | | | | (0.560) |
| intra_pairPRT | | | | | 0.140 |
| | | | | | (0.596) |
| intra_pairQAT | | | | | 2.020** |
| | | | | | (0.532) |
| intra_pairROM | | | | | 1.587** |
| | | | | | (0.578) |
| intra_pairSEN | | | | | 3.734** |
| | | | | | (0.958) |
| intra_pairSGP | | | | | -2.506** |
| | | | | | (0.484) |
| intra_pairSWE | | | | | -0.869+ |
| | | | | | (0.503) |
| intra_pairTHA | | | | | -0.999+ |
| | | | | | (0.522) |
| intra_pairTTO | | | | | 2.440** |
| | | | | | (0.522) |
| intra_pairTUN | | | | | 2.015** |
| | | | | | (0.740) |
| intra_pairTUR | | | | | 0.498 |
| | | | | | (0.533) |
| intra_pairTZA | | | | | 2.175** |
| | | | | | (0.707) |
| intra_pairURY | | | | | 1.618* |
| | | | | | (0.711) |
| intra_pairUSA | | | | | -3.413** |
| | | | | | (0.480) |
| intra_pairZAF | | | | | -0.399 |
| | | | | | (0.569) |
| | | | | | |
| Fixed effects | | | | | |
| exp_year | Yes | Yes | Yes | Yes | Yes |
| imp_year | Yes | Yes | Yes | Yes | Yes |
| | | | | | |
| N | 25,689 | 28,152 | 28,566 | 28,566 | 28,566 |
| Pseudo R-squared | 0.843 | 0.946 | 0.995 | 0.996 | 0.999 |
Standard errors in parentheses
Significance levels: ** p < 0.01; * p < 0.05; + p < 0.10
2.3 Regional trade agreements effects
2.3.1 Preparing the data
This model specification includes the standard gravity covariates together with exporter-time and importer-time fixed effects, as in the equation
\[ \begin{align} X_{ij,t} =& \:\exp\left[\pi_{i,t} + \chi_{j,t} + \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} +\right.\\ \text{ }& \:\left.\beta_4 CLNY_{i,j} + \beta_5 RTA_{ij,t}\right] \times \varepsilon_{ij,t}. \end{align} \]
In comparison to the previous examples, we need to create additional variables to include fixed effects that account for the observations where the exporter and the importer are the same. These variables are the internal border (intl_brdr), the internal dyad (pair_id_2) and year-specific international borders (intl_border_1986, ..., intl_border_2006).
The direct way of obtaining the desired variables is similar to what we did in the previous sections.
ch1_application3 <- agtpa_applications |>
filter(year %in% seq(1986, 2006, 4)) |>
mutate(
exp_year = paste0(exporter, year),
imp_year = paste0(importer, year),
year = paste0("intl_border_", year),
log_trade = log(trade),
log_dist = log(dist),
intl_brdr = ifelse(exporter == importer, pair_id, "inter"),
intl_brdr_2 = ifelse(exporter == importer, 0, 1),
pair_id_2 = ifelse(exporter == importer, "0-intra", pair_id)
) |>
pivot_wider(names_from = year, values_from = intl_brdr_2, values_fill = 0)
Notice that we used "0-intra" and not just "intra" because the rest of the values in the internal dyad variable are the numbers 1, ..., N, and R will treat "0-intra" as the reference level since it comes first when the unique values are sorted alphabetically. Also, observe the order of the resulting table: the pivoting puts "0-intra" in the first row for the first exporter-importer dyad. This distinction is what produces the expected behaviour in the next chapter.
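The reference-level point can be verified on a toy factor: character values are sorted as strings, so "0-intra" precedes "1", "10" and "2" and becomes the omitted base level. A minimal base-R sketch:

```r
# pair_id_2 mixes "0-intra" with numeric dyad ids stored as strings
x <- factor(c("10", "2", "0-intra", "1"))

# Alphabetical ordering puts "0-intra" first
levels(x)
#> [1] "0-intra" "1" "10" "2"

# so model.matrix() absorbs it as the reference category
colnames(model.matrix(~ x))
#> [1] "(Intercept)" "x1" "x10" "x2"
```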
In addition, we need to create the variable containing the trade sum to filter the cases where the sum by dyad is zero.
ch1_application3 <- ch1_application3 |>
group_by(pair_id) |>
mutate(sum_trade = sum(trade)) |>
ungroup()
2.3.2 OLS standard RTA estimates with international trade only
The gravity specification, which includes \(\pi_{i,t} + \chi_{j,t}\), means that we need to do something very similar to what we did in the last section.
With the data from above, the model specification is straightforward.
fit_ols_rta <- felm(
log_trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year | pair_id,
data = filter(ch1_application3, trade > 0, importer != exporter)
)
fit_ols_rta
Formula: log_trade ~ log_dist + cntg + lang + clny + rta | exp_year +
imp_year | pair_id
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|----------|----------|------------|----------|-----------|
| log_dist | -1.2159 | 0.0384 | -31.6974 | 0.0000 ** |
| cntg | 0.2230 | 0.1996 | 1.1173 | 0.2638 |
| lang | 0.6607 | 0.0808 | 8.1785 | 0.0000 ** |
| clny | 0.6705 | 0.1470 | 4.5614 | 0.0000 ** |
| rta | -0.0044 | 0.0531 | -0.0827 | 0.9341 |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
R-squared : 0.8432
Adj. R-squared: 0.838
Number of observations: Full 25689; Missing 0; Perfect classification 0
2.3.3 PPML standard RTA estimates with international trade only
The model specification is very similar to the OLS one; we only need to change the fitting function from felm() to fepoisson().
fit_ppml_rta <- fepoisson(
trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year | pair_id,
data = filter(ch1_application3, importer != exporter)
)
fit_ppml_rta
Formula: trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year |
pair_id
Family: Poisson
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|----------|----------|------------|----------|-----------|
| log_dist | -0.8216 | 0.0310 | -26.5189 | 0.0000 ** |
| cntg | 0.4155 | 0.0828 | 5.0189 | 0.0000 ** |
| lang | 0.2499 | 0.0767 | 3.2591 | 0.0011 ** |
| clny | -0.2054 | 0.1143 | -1.7972 | 0.0723 + |
| rta | 0.1907 | 0.0658 | 2.8986 | 0.0037 ** |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
Pseudo R-squared: 0.946
Number of observations: Full 28152; Missing 0; Perfect classification 0
Number of Fisher Scoring iterations: 12
2.3.4 Addressing potential domestic trade diversion
The model specification is almost the same as the PPML one. We only need to add the international border variable and use the entire dataset instead of removing rows where the importer and the exporter are the same.
fit_intra_rta <- fepoisson(
trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year +
intl_brdr | pair_id,
data = ch1_application3
)
fit_intra_rta
Formula: trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year +
intl_brdr | pair_id
Family: Poisson
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|----------|----------|------------|----------|-----------|
| log_dist | -0.8003 | 0.0303 | -26.4446 | 0.0000 ** |
| cntg | 0.3932 | 0.0789 | 4.9853 | 0.0000 ** |
| lang | 0.2437 | 0.0771 | 3.1615 | 0.0016 ** |
| clny | -0.1822 | 0.1134 | -1.6061 | 0.1083 |
| rta | 0.4085 | 0.0688 | 5.9355 | 0.0000 ** |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
Pseudo R-squared: 0.9984
Number of observations: Full 28566; Missing 0; Perfect classification 0
Number of Fisher Scoring iterations: 13
2.3.5 Addressing potential endogeneity of RTAs
The model specification includes the RTA variable and the exporter-time, importer-time and internal dyad fixed effects to account for domestic trade.
fit_endg_rta <- fepoisson(
trade ~ rta | exp_year + imp_year + pair_id_2 | pair_id,
data = filter(ch1_application3, sum_trade > 0)
)
fit_endg_rta
Formula: trade ~ rta | exp_year + imp_year + pair_id_2 | pair_id
Family: Poisson
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|-----|----------|------------|---------|-----------|
| rta | 0.5572 | 0.1022 | 5.4500 | 0.0000 ** |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
Pseudo R-squared: 0.9988
Number of observations: Full 28482; Missing 0; Perfect classification 0
Number of Fisher Scoring iterations: 20
2.3.6 Testing for potential “reverse causality” between trade and RTAs
We need to modify the previous model to include the RTA variable led by four years (rta_lead4) and keep the observations where the trade sum is larger than zero.
fit_lead_rta <- fepoisson(
trade ~ rta + rta_lead4 | exp_year + imp_year + pair_id_2 | pair_id,
data = filter(ch1_application3, sum_trade > 0)
)
fit_lead_rta
Formula: trade ~ rta + rta_lead4 | exp_year + imp_year + pair_id_2 | pair_id
Family: Poisson
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|-----------|----------|------------|---------|-----------|
| rta | 0.5200 | 0.0859 | 6.0553 | 0.0000 ** |
| rta_lead4 | 0.0774 | 0.0921 | 0.8405 | 0.4006 |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
Pseudo R-squared: 0.9988
Number of observations: Full 28482; Missing 0; Perfect classification 0
Number of Fisher Scoring iterations: 20
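The rta_lead4 variable already ships with agtpa_applications, but if it had to be built by hand, a four-year lead within each dyad could be sketched as follows (toy data, made-up values; since the panel is spaced at four-year intervals, a one-row lead equals a four-year lead):

```r
# Toy panel for one dyad, observed every four years (made-up values)
toy <- data.frame(
  pair_id = 1,
  year = c(1986, 1990, 1994, 1998),
  rta  = c(0, 0, 1, 1)
)

# Sort within dyad-year and shift rta back by one row per dyad
toy <- toy[order(toy$pair_id, toy$year), ]
toy$rta_lead4 <- ave(toy$rta, toy$pair_id,
                     FUN = function(x) c(x[-1], NA))
toy$rta_lead4
# -> 0 1 1 NA
```

With dplyr, the equivalent would be mutate(rta_lead4 = lead(rta)) after group_by(pair_id) and arrange(year, .by_group = TRUE).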
beta_rta <- fit_lead_rta$coef_table[, 1]
beta_sum <- sum(beta_rta)
beta_form <- paste(paste0("x", seq_along(beta_rta)), collapse = "+")
beta_form <- paste0("~", beta_form)
beta_std_err <- deltamethod(as.formula(beta_form), beta_rta, fit_lead_rta$vcov)
beta_tstat <- beta_sum / beta_std_err
beta_pval <- pnorm(-abs(beta_tstat)) + (1 - pnorm(abs(beta_tstat)))
round(c(
"sum_coef_rta" = beta_sum,
"std_err" = beta_std_err,
"p-value" = beta_pval
), 4)
sum_coef_rta      std_err      p-value
      0.5975       0.1380       0.0000
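For the linear function \(g = \beta_{rta} + \beta_{lead}\), the delta method reduces to an exact formula: the gradient is \((1, 1)\), so the variance of the sum is the sum of all entries of the covariance block and \(se = \sqrt{\sum V}\). A base-R sketch with illustrative numbers (not the estimates above):

```r
# Illustrative coefficients and covariance block (made up)
b <- c(rta = 0.52, rta_lead4 = 0.08)
V <- matrix(c( 0.0074, -0.0020,
              -0.0020,  0.0085), nrow = 2)

# Gradient of x1 + x2 is (1, 1), so grad' V grad = sum(V)
se <- sqrt(sum(V))
c(sum_coef = sum(b), std_err = se)
```

deltamethod(~ x1 + x2, b, V) gives the same value; the symbolic route only pays off for non-linear functions such as the percent change used for the distance puzzle.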
2.3.7 Addressing potential non-linear and phasing-in effects of RTAs
Instead of the lead used in the previous model, we now include four-, eight- and twelve-year lags of the RTA variable to capture phasing-in effects.
fit_phsng_rta <- fepoisson(
trade ~ rta + rta_lag4 + rta_lag8 + rta_lag12 | exp_year + imp_year + pair_id_2 | pair_id,
data = filter(ch1_application3, sum_trade > 0)
)
fit_phsng_rta
Formula: trade ~ rta + rta_lag4 + rta_lag8 + rta_lag12 | exp_year + imp_year +
pair_id_2 | pair_id
Family: Poisson
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|-----------|----------|------------|---------|-----------|
| rta | 0.2914 | 0.0892 | 3.2680 | 0.0011 ** |
| rta_lag4 | 0.4135 | 0.0672 | 6.1498 | 0.0000 ** |
| rta_lag8 | 0.1687 | 0.0431 | 3.9117 | 0.0001 ** |
| rta_lag12 | 0.1189 | 0.0301 | 3.9552 | 0.0001 ** |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
Pseudo R-squared: 0.999
Number of observations: Full 28482; Missing 0; Perfect classification 0
Number of Fisher Scoring iterations: 20
beta_rta <- fit_phsng_rta$coef_table[, 1]
beta_sum <- sum(beta_rta)
beta_form <- paste(paste0("x", seq_along(beta_rta)), collapse = "+")
beta_form <- paste0("~", beta_form)
beta_std_err <- deltamethod(as.formula(beta_form), beta_rta, fit_phsng_rta$vcov)
beta_tstat <- beta_sum / beta_std_err
beta_pval <- pnorm(-abs(beta_tstat)) + (1 - pnorm(abs(beta_tstat)))
round(c(
"sum_coef_rta" = beta_sum,
"std_err" = beta_std_err,
"p-value" = beta_pval
), 4)
sum_coef_rta      std_err      p-value
      0.9926       0.0943       0.0000
2.3.8 Addressing globalization effects
Building on the previous model, we also include the year-specific international border variables alongside the lagged RTA terms.
fit_glbzn_rta <- fepoisson(
trade ~ rta + rta_lag4 + rta_lag8 + rta_lag12 + intl_border_1986 +
intl_border_1990 + intl_border_1994 + intl_border_1998 + intl_border_2002 |
exp_year + imp_year + pair_id_2 | pair_id,
data = filter(ch1_application3, sum_trade > 0)
)
fit_glbzn_rta
Formula: trade ~ rta + rta_lag4 + rta_lag8 + rta_lag12 + intl_border_1986 +
intl_border_1990 + intl_border_1994 + intl_border_1998 +
intl_border_2002 | exp_year + imp_year + pair_id_2 | pair_id
Family: Poisson
Estimates:
| | Estimate | Std. Error | z value | Pr(>|z|) |
|------------------|----------|------------|----------|-----------|
| rta | 0.1157 | 0.0867 | 1.3342 | 0.1822 |
| rta_lag4 | 0.2877 | 0.0617 | 4.6666 | 0.0000 ** |
| rta_lag8 | 0.0693 | 0.0482 | 1.4382 | 0.1504 |
| rta_lag12 | 0.0024 | 0.0291 | 0.0809 | 0.9355 |
| intl_border_1986 | -0.7056 | 0.0478 | -14.7642 | 0.0000 ** |
| intl_border_1990 | -0.4803 | 0.0429 | -11.1830 | 0.0000 ** |
| intl_border_1994 | -0.3665 | 0.0335 | -10.9550 | 0.0000 ** |
| intl_border_1998 | -0.1581 | 0.0233 | -6.7950 | 0.0000 ** |
| intl_border_2002 | -0.1409 | 0.0169 | -8.3453 | 0.0000 ** |
Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10
Pseudo R-squared: 0.9996
Number of observations: Full 28482; Missing 0; Perfect classification 0
Number of Fisher Scoring iterations: 20
beta_rta <- fit_glbzn_rta$coef_table[, 1][grep("rta", rownames(fit_glbzn_rta$coef_table))]
beta_sum <- sum(beta_rta)
beta_form <- paste(paste0("x", seq_along(beta_rta)), collapse = "+")
beta_form <- paste0("~", beta_form)
beta_std_err <- deltamethod(as.formula(beta_form), beta_rta, fit_glbzn_rta$vcov[names(beta_rta), names(beta_rta)])
beta_tstat <- beta_sum / beta_std_err
beta_pval <- pnorm(-abs(beta_tstat)) + (1 - pnorm(abs(beta_tstat)))
round(c(
"sum_coef_rta" = beta_sum,
"std_err" = beta_std_err,
"p-value" = beta_pval
), 4)
sum_coef_rta      std_err      p-value
      0.4750       0.1095       0.0000
2.3.9 Presenting all models together
As in the previous sections, we can use the summary_table() function to show all the models fitted in this section.
summary_table(
fit_ols_rta,
fit_ppml_rta,
fit_intra_rta,
fit_endg_rta,
fit_lead_rta,
fit_phsng_rta,
fit_glbzn_rta
)
| Variable | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
|--------------------|--------------------|--------------------|--------------------|-------------------|-------------------|-------------------|--------------------|
| log_dist | -1.216** | -0.822** | -0.800** | | | | |
| | (0.038) | (0.031) | (0.030) | | | | |
| cntg | 0.223 | 0.416** | 0.393** | | | | |
| | (0.200) | (0.083) | (0.079) | | | | |
| lang | 0.661** | 0.250** | 0.244** | | | | |
| | (0.081) | (0.077) | (0.077) | | | | |
| clny | 0.670** | -0.205+ | -0.182 | | | | |
| | (0.147) | (0.114) | (0.113) | | | | |
| rta | -0.004 | 0.191** | 0.409** | 0.557** | 0.520** | 0.291** | 0.116 |
| | (0.053) | (0.066) | (0.069) | (0.102) | (0.086) | (0.089) | (0.087) |
| rta_lead4 | | | | | 0.077 | | |
| | | | | | (0.092) | | |
| rta_lag4 | | | | | | 0.414** | 0.288** |
| | | | | | | (0.067) | (0.062) |
| rta_lag8 | | | | | | 0.169** | 0.069 |
| | | | | | | (0.043) | (0.048) |
| rta_lag12 | | | | | | 0.119** | 0.002 |
| | | | | | | (0.030) | (0.029) |
| intl_border_1986 | | | | | | | -0.706** |
| | | | | | | | (0.048) |
| intl_border_1990 | | | | | | | -0.480** |
| | | | | | | | (0.043) |
| intl_border_1994 | | | | | | | -0.367** |
| | | | | | | | (0.033) |
| intl_border_1998 | | | | | | | -0.158** |
| | | | | | | | (0.023) |
| intl_border_2002 | | | | | | | -0.141** |
| | | | | | | | (0.017) |
| | | | | | | | |
| Fixed effects | | | | | | | |
| exp_year | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| imp_year | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| intl_brdr | No | No | Yes | No | No | No | No |
| pair_id_2 | No | No | No | Yes | Yes | Yes | Yes |
| | | | | | | | |
| N | 25,689 | 28,152 | 28,566 | 28,482 | 28,482 | 28,482 | 28,482 |
| Pseudo R-squared | 0.843 | 0.946 | 0.998 | 0.999 | 0.999 | 0.999 | 1.000 |
Standard errors in parentheses
Significance levels: ** p < 0.01; * p < 0.05; + p < 0.10