
feglm can be used to fit generalized linear models with many high-dimensional fixed effects. The estimation procedure is based on unconditional maximum likelihood and can be interpreted as a “weighted demeaning” approach.

Remark: The term fixed effect is used in the econometrician's sense of having an intercept for each level of each category.
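To make the "weighted demeaning" idea concrete, the sketch below removes weighted group means from a single variable for one fixed effects category, using base R only. It is an illustration, not feglm's internal code: the names x, w, and k are hypothetical toy data, and the weights stand in for the working weights of one iteration of the fitting algorithm. With more than one category, this kind of demeaning would typically be repeated across categories until the result stabilizes.

```r
# Hypothetical toy data: a regressor, working weights, and one
# fixed effects category with two levels
x <- c(1, 2, 3, 10, 20, 30)
w <- c(1, 1, 2, 1, 3, 1)
k <- factor(c("a", "a", "a", "b", "b", "b"))

# Weighted mean of x within each level of k
wmean <- tapply(w * x, k, sum) / tapply(w, k, sum)

# Subtract from each observation the weighted mean of its own level
x_demeaned <- x - unname(wmean[as.character(k)])

# After demeaning, the weighted sum within every level is zero,
# which is what "concentrating out" the level intercepts amounts to
tapply(w * x_demeaned, k, sum)
```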

Usage

feglm(
  formula = NULL,
  data = NULL,
  family = gaussian(),
  weights = NULL,
  beta_start = NULL,
  eta_start = NULL,
  control = NULL
)

Arguments

formula

an object of class "formula": a symbolic description of the model to be fitted. formula must be of type y ~ x | k, where the second part of the formula refers to factors to be concentrated out. It is also possible to pass clustering variables to feglm as y ~ x | k | c.

data

an object of class "data.frame" containing the variables in the model. The expected input is a dataset with the variables specified in formula and a number of rows at least equal to the number of variables in the model.

family

a description of the error distribution and link function to be used in the model. As with glm.fit, this has to be the result of a call to a family function. Default is gaussian(). See family for details of family functions.

weights

an optional string with the name of the 'prior weights' variable in data.

beta_start

an optional vector of starting values for the structural parameters in the linear predictor. Default is \(\boldsymbol{\beta} = \mathbf{0}\).

eta_start

an optional vector of starting values for the linear predictor.

control

a named list of parameters for controlling the fitting process. See feglm_control for details.

Value

A named list of class "feglm". The list contains the following fifteen elements:

coefficients

a named vector of the estimated coefficients

eta

a vector of the linear predictor

weights

a vector of the weights used in the estimation

hessian

a matrix with the numerical second derivatives

deviance

the deviance of the model

null_deviance

the null deviance of the model

conv

a logical indicating whether the model converged

iter

the number of iterations needed to converge

nobs

a named vector with the number of observations used in the estimation, together with the number of observations dropped because of missing values and the number dropped because they were perfectly predicted

lvls_k

a named vector with the number of levels in each fixed effects category

nms_fe

a list with the names of the fixed effects variables

formula

the formula used in the model

data

the data used in the model after dropping non-contributing observations

family

the family used in the model

control

the control list used in the model

Details

If feglm does not converge, this is often a sign of linear dependence between one or more regressors and a fixed effects category. In this case, you should carefully inspect your model specification.
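The advice above can be sketched as two small checks. Both helpers are hypothetical and not part of the package: check_convergence() only reads the conv and iter elements documented under Value, and varies_within() tests for one common source of linear dependence, namely a regressor that is constant within every level of a fixed effects category.

```r
# Hypothetical helper: warn if a fitted object reports non-convergence.
# Relies only on the `conv` and `iter` elements of the returned list.
check_convergence <- function(fit) {
  if (!isTRUE(fit$conv)) {
    warning(
      "feglm did not converge after ", fit$iter, " iterations; ",
      "check for regressors that are collinear with a fixed effects category."
    )
    return(invisible(FALSE))
  }
  invisible(TRUE)
}

# Hypothetical helper: TRUE if x varies within at least one level of k.
# A regressor that is constant within every level of k is perfectly
# collinear with the intercepts for that category.
varies_within <- function(x, k) {
  any(tapply(x, k, function(v) length(unique(v)) > 1L), na.rm = TRUE)
}

k <- c("a", "a", "b", "b")
varies_within(c(1, 1, 2, 2), k) # FALSE: constant within each level of k
varies_within(c(1, 2, 2, 2), k) # TRUE: varies within level "a"
```

With a fitted model, check_convergence(mod) would be called right after feglm, and varies_within() would be applied to each regressor and fixed effects variable in the data before refitting.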

References

Gaure, S. (2013). "OLS with Multiple High Dimensional Category Variables". Computational Statistics and Data Analysis, 66.

Marschner, I. (2011). "glm2: Fitting generalized linear models with convergence problems". The R Journal, 3(2).

Stammann, A., F. Heiss, and D. McFadden (2016). "Estimating Fixed Effects Logit Models with Large Panel Data". Working paper.

Stammann, A. (2018). "Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-Way Fixed Effects". ArXiv e-prints.

Examples

# subset trade flows to avoid fitting time warnings during check
set.seed(123)
trade_2006 <- trade_panel[trade_panel$year == 2006, ]
trade_2006 <- trade_2006[sample(nrow(trade_2006), 500), ]

mod <- feglm(
  trade ~ log_dist + lang + cntg + clny | exp_year + imp_year,
  trade_2006,
  family = poisson(link = "log")
)

summary(mod)
#> Formula: trade ~ log_dist + lang + cntg + clny | exp_year + imp_year
#> <environment: 0x56808af01958>
#> 
#> Family: Poisson
#> 
#> Estimates:
#> 
#> |          | Estimate | Std. Error | z value   | Pr(>|z|)   |
#> |----------|----------|------------|-----------|------------|
#> | log_dist |  -0.7937 |     0.0049 | -162.0661 | 0.0000 *** |
#> | lang     |   0.0491 |     0.0103 |    4.7808 | 0.0000 *** |
#> | cntg     |   0.6913 |     0.0113 |   61.3261 | 0.0000 *** |
#> | clny     |  -0.0239 |     0.0109 |   -2.1871 | 0.0287 *   |
#> 
#> Significance codes: *** 99.9%; ** 99%; * 95%; . 90%
#> 
#> Pseudo R-squared: 0.6274 
#> 
#> Number of observations: Full 500; Missing 0; Perfect classification 0 
#> 
#> Number of Fisher Scoring iterations: 12 

mod <- feglm(
  trade ~ log_dist + lang + cntg + clny | exp_year + imp_year | pair,
  trade_panel,
  family = poisson(link = "log")
)

summary(mod, type = "clustered")
#> Formula: trade ~ log_dist + lang + cntg + clny | exp_year + imp_year | 
#>     pair
#> <environment: 0x56808af01958>
#> 
#> Family: Poisson
#> 
#> Estimates:
#> 
#> |          | Estimate | Std. Error | z value | Pr(>|z|)   |
#> |----------|----------|------------|---------|------------|
#> | log_dist |  -0.8409 |     0.1572 | -5.3486 | 0.0000 *** |
#> | lang     |   0.2475 |     0.3985 |  0.6211 | 0.5345     |
#> | cntg     |   0.4374 |     0.4985 |  0.8776 | 0.3802     |
#> | clny     |  -0.2225 |     0.3386 | -0.6570 | 0.5112     |
#> 
#> Significance codes: *** 99.9%; ** 99%; * 95%; . 90%
#> 
#> Pseudo R-squared: 0.586 
#> 
#> Number of observations: Full 28152; Missing 0; Perfect classification 0 
#> 
#> Number of Fisher Scoring iterations: 11