Poisson Pseudo-Maximum Likelihood (PPML) Model with Cluster-Robust Standard Errors

We will estimate a Poisson Pseudo-Maximum Likelihood (PPML) model using the data available in this package, with the aim of replicating the PPML results from Table 3 in Yotov et al. (2016).

This requires including exporter-time and importer-time fixed effects and clustering the standard errors by exporter-importer pair.

The PPML specification corresponds to: $\begin{align} X_{ij,t} =& \:\exp\left[\beta_1 \log(DIST_{i,j}) + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} +\right.\\ & \:\left.\beta_4 CLNY_{i,j} + \beta_5 RTA_{ij,t} + \pi_{i,t} + \chi_{j,t}\right] \times \varepsilon_{ij,t}, \end{align}$ where $\pi_{i,t}$ are the exporter-time fixed effects and $\chi_{j,t}$ are the importer-time fixed effects.

We use dplyr to obtain the log of the distance. This model excludes domestic flows, so we also use dplyr to subset the data.

Required packages:

library(capybara)
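
The preparation described above (dropping domestic flows and taking the log of distance) can be sketched on a small toy data frame. This is only an illustration: the data frame `toy_panel` and the column names `exporter`, `importer`, and `dist` are assumptions made for the example and may not match the columns of `trade_panel`.

```r
library(dplyr)

# Toy stand-in for a bilateral trade panel; all names here are hypothetical
toy_panel <- tibble(
  exporter = c("ARG", "ARG", "BRA", "BRA"),
  importer = c("ARG", "BRA", "ARG", "BRA"),
  dist     = c(500, 2400, 2400, 900)
)

toy_prepared <- toy_panel %>%
  filter(exporter != importer) %>% # drop domestic flows (exporter equals importer)
  mutate(log_dist = log(dist))     # natural log of bilateral distance

toy_prepared
```

Only the two international rows remain, each with a new `log_dist` column.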

We can use the fepoisson() function to obtain the estimated coefficients. The fixed effects go after the | in the formula, here | exp_year + imp_year.

Model estimation:

fit <- fepoisson(
  trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year,
  data = trade_panel
)

summary(fit)
#> Formula: trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year
#> 
#> Family: Poisson
#> 
#> Estimates:
#> 
#> |          | Estimate | Std. Error | z value    | Pr(>|z|)   |
#> |----------|----------|------------|------------|------------|
#> | log_dist |  -0.8216 |     0.0004 | -2194.0448 | 0.0000 *** |
#> | cntg     |   0.4155 |     0.0009 |   476.0613 | 0.0000 *** |
#> | lang     |   0.2499 |     0.0008 |   296.8884 | 0.0000 *** |
#> | clny     |  -0.2054 |     0.0010 |  -206.3476 | 0.0000 *** |
#> | rta      |   0.1907 |     0.0010 |   191.0964 | 0.0000 *** |
#> 
#> Significance codes: *** 99.9%; ** 99%; * 95%; . 90%
#> 
#> Pseudo R-squared: 0.587 
#> 
#> Number of observations: Full 28152; Missing 0; Perfect classification 0 
#> 
#> Number of Fisher Scoring iterations: 11
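
As a reading aid for the estimates above, the following base R sketch converts them into the quantities usually reported for a log-link model. It assumes the standard interpretation: the coefficient on a logged regressor is an elasticity, and the coefficient on an indicator variable implies a percent change of 100 × (exp(β) − 1) in expected trade. The numbers are copied from the printed summary, so they carry its rounding.

```r
# Point estimates copied from the summary output (rounded to four decimals)
beta <- c(
  log_dist = -0.8216, cntg = 0.4155, lang = 0.2499,
  clny = -0.2054, rta = 0.1907
)

# log_dist enters in logs, so its coefficient is read as an elasticity
elasticity_dist <- beta[["log_dist"]]

# Indicator variables: percent change in expected trade when the dummy equals 1
pct_effect <- 100 * (exp(beta[c("cntg", "lang", "clny", "rta")]) - 1)

round(pct_effect, 1)
```

The conversion matters most for larger coefficients, where exp(β) − 1 and β diverge noticeably.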

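The coefficients are almost identical to those in Table 3 from Yotov et al. (2016), which were obtained with Stata. The small differences are attributable to the different fitting algorithms used by each program. Capybara uses the demeaning algorithm proposed by Stammann (2018).

Model estimation with standard errors clustered by exporter-importer pair (the cluster variable is added as a third part of the formula, | pair):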

fit <- fepoisson(
  trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year | pair,
  data = trade_panel
)

summary(fit, type = "clustered")
#> Formula: trade ~ log_dist + cntg + lang + clny + rta | exp_year + imp_year | 
#>     pair
#> 
#> Family: Poisson
#> 
#> Estimates:
#> 
#> |          | Estimate | Std. Error | z value | Pr(>|z|)   |
#> |----------|----------|------------|---------|------------|
#> | log_dist |  -0.8216 |     0.1567 | -5.2437 | 0.0000 *** |
#> | cntg     |   0.4155 |     0.4568 |  0.9097 | 0.3630     |
#> | lang     |   0.2499 |     0.3997 |  0.6252 | 0.5319     |
#> | clny     |  -0.2054 |     0.3287 | -0.6250 | 0.5320     |
#> | rta      |   0.1907 |     0.7657 |  0.2491 | 0.8033     |
#> 
#> Significance codes: *** 99.9%; ** 99%; * 95%; . 90%
#> 
#> Pseudo R-squared: 0.587 
#> 
#> Number of observations: Full 28152; Missing 0; Perfect classification 0 
#> 
#> Number of Fisher Scoring iterations: 11

The result is similar, and the numerical difference comes from the variance-covariance matrix estimation method. Capybara's clustering algorithm is based on Cameron, Gelbach, and Miller (2011).
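
To see how the clustered standard errors drive the inference, the sketch below recomputes the z values and p-values from the printed estimates and clustered standard errors. It assumes the usual Wald construction (z = estimate / standard error, with a two-sided normal p-value), which is consistent with the `Pr(>|z|)` column. Small discrepancies from the table are expected because the inputs are rounded to four decimals.

```r
# Estimates and clustered standard errors copied from the summary output
est <- c(
  log_dist = -0.8216, cntg = 0.4155, lang = 0.2499,
  clny = -0.2054, rta = 0.1907
)
se_clustered <- c(
  log_dist = 0.1567, cntg = 0.4568, lang = 0.3997,
  clny = 0.3287, rta = 0.7657
)

z <- est / se_clustered  # Wald z statistic
p <- 2 * pnorm(-abs(z))  # two-sided p-value under a standard normal

round(cbind(estimate = est, std_error = se_clustered, z = z, p = p), 4)
```

The point estimates are unchanged by clustering. Only the standard errors, and therefore the z values and p-values, differ from the first fit.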

References

Cameron, A Colin, Jonah B Gelbach, and Douglas L Miller. 2011. “Robust Inference with Multiway Clustering.” Journal of Business & Economic Statistics 29 (2): 238–49.

Stammann, Amrei. 2018. “Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional K-Way Fixed Effects.” arXiv. https://doi.org/10.48550/arXiv.1707.01815.

Yotov, Yoto V, Roberta Piermartini, Mario Larch, and others. 2016. An Advanced Guide to Trade Policy Analysis: The Structural Gravity Model. WTO iLibrary.