feglm {capybara} R Documentation

GLM fitting with high-dimensional k-way fixed effects

Description

feglm fits generalized linear models with many high-dimensional fixed effects. Here, a fixed effect means a separate intercept for each level of each fixed effects category.
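To make the "one intercept per level" idea concrete, the following is a minimal base R sketch. It uses lm with explicit dummy variables on the mtcars data purely as an illustration; it is not how feglm estimates the model internally.

```r
# Base R illustration only: fixed effects written out as explicit dummies.
# Dropping the global intercept (0 + ...) makes R estimate one intercept
# per level of cyl, alongside the slope for wt.
fit <- lm(mpg ~ 0 + wt + factor(cyl), data = mtcars)

# One slope (wt) plus one intercept for each of the three cyl levels
coef(fit)
```

With only three levels the dummy approach is cheap. It becomes impractical when a category has many thousands of levels, which is the high-dimensional case described above.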

Usage

feglm(
  formula = NULL,
  data = NULL,
  family = gaussian(),
  weights = NULL,
  beta_start = NULL,
  eta_start = NULL,
  offset = NULL,
  control = NULL
)

Arguments

formula an object of class "formula": a symbolic description of the model to be fitted. formula must be of the form response ~ slopes | fixed_effects | cluster, where the cluster part may be omitted (see Examples).
data an object of class "data.frame" containing the variables in the model. The expected input is a dataset with the variables specified in formula and a number of rows at least equal to the number of variables in the model.
family a description of the error distribution and link function to be used in the model. As in glm.fit, this has to be the result of a call to a family function. Default is gaussian(). See family for details of family functions.
weights an optional string with the name of the prior weights variable in data.
beta_start an optional vector of starting values for the structural parameters in the linear predictor. Default is beta = 0 (a vector of zeros).
eta_start an optional vector of starting values for the linear predictor.
offset an optional formula or numeric vector specifying an a priori known component to be included in the linear predictor. If a formula, it should be of the form ~ log(variable).
control a named list of parameters for controlling the fitting process. See fit_control for details.

Details

If feglm does not converge, this is often a sign of linear dependence between one or more regressors and a fixed effects category. In this case, carefully inspect your model specification.
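One way to check for this kind of linear dependence before fitting is to inspect the rank of the design matrix with the fixed effects written out as dummies. The sketch below uses base R only. The variable cyl_sq and the data frame mtcars_dep are hypothetical names introduced for illustration: cyl_sq is constant within each level of cyl, so it is perfectly collinear with the cyl fixed effects.

```r
# Hypothetical regressor that is constant within each cyl level,
# hence linearly dependent on the cyl dummies.
mtcars_dep <- transform(mtcars, cyl_sq = cyl^2)

# Design matrix with the fixed effects expanded into dummy columns
X <- model.matrix(~ wt + cyl_sq + factor(cyl), data = mtcars_dep)

# A rank below the number of columns signals linear dependence
rank_X <- qr(X)$rank
rank_X < ncol(X)
```

This check is only feasible when the number of fixed effects levels is small enough to build the dummy matrix. For larger problems, checking whether a regressor varies within each level of a category serves the same purpose.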

Value

A named list of class "feglm". The list contains the following fifteen elements:

coefficients a named vector of the estimated coefficients
eta a vector of the linear predictor
weights a vector of the weights used in the estimation
hessian a matrix with the numerical second derivatives
deviance the deviance of the model
null_deviance the null deviance of the model
conv a logical indicating whether the model converged
iter the number of iterations needed to converge
nobs a named vector with the number of observations used in the estimation, also indicating the dropped and perfectly predicted observations
fe_levels a named vector with the number of levels in each fixed effects category
nms_fe a list with the names of the fixed effects variables
formula the formula used in the model
data the data used in the model after dropping non-contributing observations
family the family used in the model
control the control list used in the model
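As a short sketch of how the returned list might be inspected, the following reuses the first model from the Examples section and reads a few of the elements listed above by name. It assumes the capybara package is installed. The printed values are not shown because they depend on the installed version.

```r
library(capybara)

# Same specification as the first model in the Examples section
mod <- feglm(mpg ~ wt | cyl, mtcars, family = poisson(link = "log"))

# Elements documented in the Value section
mod$coefficients  # named vector of estimated coefficients
mod$conv          # logical: did the model converge?
mod$iter          # number of iterations needed to converge
mod$fe_levels     # number of levels in each fixed effects category
```

Checking mod$conv before interpreting the coefficients is a sensible habit, given the convergence note in Details.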

References

Gaure, S. (2013). "OLS with Multiple High Dimensional Category Variables". Computational Statistics and Data Analysis, 66.

Marschner, I. (2011). "glm2: Fitting generalized linear models with convergence problems". The R Journal, 3(2).

Stammann, A., F. Heiss, and D. McFadden (2016). "Estimating Fixed Effects Logit Models with Large Panel Data". Working paper.

Stammann, A. (2018). "Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-Way Fixed Effects". ArXiv e-prints.

Examples

# Model without clustering - uses inverse Hessian for vcov
mod <- feglm(mpg ~ wt | cyl, mtcars, family = poisson(link = "log"))
summary(mod)

# Model with clustering - uses sandwich vcov automatically
mod <- feglm(mpg ~ wt | cyl | am, mtcars, family = poisson(link = "log"))
summary(mod)
