fit_control {capybara}    R Documentation

Set feglm Control Parameters

Description

Set and change parameters used for fitting feglm, felm, and fenegbin. Termination conditions are similar to glm.

Usage

fit_control(
  dev_tol = 1e-06,
  center_tol = 1e-06,
  collin_tol = 1e-08,
  step_halving_factor = 0.5,
  alpha_tol = 1e-05,
  iter_max = 50L,
  iter_center_max = 10000L,
  iter_inner_max = 50L,
  iter_alpha_max = 10000L,
  step_halving_memory = 0.9,
  max_step_halving = 2L,
  start_inner_tol = 1e-05,
  grand_acc_period = 4L,
  centering = "berge",
  sep_tol = 1e-08,
  sep_zero_tol = 1e-08,
  sep_mu_tol = 1e-06,
  sep_max_iter = 200L,
  sep_simplex_max_iter = 2000L,
  sep_use_relu = TRUE,
  sep_use_simplex = TRUE,
  sep_use_mu = TRUE,
  return_fe = TRUE,
  keep_tx = FALSE,
  keep_data = FALSE,
  return_hessian = FALSE,
  check_separation = TRUE,
  init_theta = 0,
  vcov_type = NULL,
  expectile = NULL,
  expectile_tol = 1e-12,
  expectile_iter_max = 50L,
  expectile_glm_iter_max = NULL,
  expectile_trace = FALSE
)

Arguments

dev_tol tolerance level for the first stopping condition of the maximization routine. The stopping condition is based on the relative change of the deviance in iteration r and can be expressed as follows: |dev_{r} - dev_{r - 1}| / (0.1 + |dev_{r}|) < tol. The default is 1.0e-06.
center_tol tolerance level for the stopping condition of the centering algorithm. The stopping condition is based on the relative change of the centered variable, similar to the 'lfe' package. The default is 1.0e-06.
collin_tol tolerance level for detecting collinearity. The default is 1.0e-08.
step_halving_factor numeric factor by which the step size is multiplied at each step-halving attempt, shrinking the step to help the optimization converge. The default is 0.5.
alpha_tol tolerance for fixed effects (alpha) convergence. The default is 1.0e-05.
iter_max integer indicating the maximum number of iterations in the maximization routine. The default is 50L.
iter_center_max integer indicating the maximum number of iterations in the centering algorithm. The default is 10000L.
iter_inner_max integer indicating the maximum number of iterations in the inner loop of the centering algorithm. The default is 50L.
iter_alpha_max maximum iterations for fixed effects computation. The default is 10000L.
step_halving_memory numeric memory factor for step-halving algorithm. Controls how much of the previous iteration is retained. The default is 0.9.
max_step_halving integer indicating the maximum number of post-convergence step-halving attempts. The default is 2L.
start_inner_tol starting tolerance for inner solver iterations. The default is 1.0e-05.
grand_acc_period integer indicating the period (in iterations) for grand acceleration in the centering algorithm. Grand acceleration applies a second-level Irons-Tuck extrapolation on the overall convergence trajectory. Lower values (e.g., 4-10) may speed up convergence for difficult problems. Set to a very large value (e.g., 10000) to effectively disable. The default is 4L.
centering character string indicating the centering algorithm to use for demeaning fixed effects. "stammann" uses alternating projections with Gauss-Seidel sweeps plus Irons-Tuck and grand acceleration on coefficient vectors. Each iteration updates each fixed-effect dimension in sequence. "berge" (default) uses a fixed-point reformulation as described in Berge (2018): all FE updates are composed into a single map F = f_T \circ f_I, reducing the problem to finding \beta^* = F(\beta^*). The Irons and Tuck (1969) acceleration is then applied to the composed iteration. Both methods use warm-starting and grand acceleration.
sep_tol tolerance for separation detection. The default is 1.0e-08.
sep_zero_tol tolerance for treating values as zero in separation detection. The default is 1.0e-08.
sep_mu_tol tolerance for mu-based separation detection during IRLS iterations. Observations with y == 0 and eta <= log(sep_mu_tol) are flagged as separated. Based on ppmlhdfe's mu separation method. The default is 1.0e-06.
sep_max_iter maximum iterations for ReLU separation detection algorithm. The default is 200L.
sep_simplex_max_iter maximum iterations for simplex separation detection algorithm. The default is 2000L.
sep_use_relu logical indicating whether to use the ReLU algorithm for separation detection. The default is TRUE.
sep_use_simplex logical indicating whether to use the simplex algorithm for separation detection. The default is TRUE.
sep_use_mu logical indicating whether to use mu-based separation detection during IRLS iterations. This catches observations where predicted values become extremely small (suggesting perfect prediction of zeros). Following ppmlhdfe methodology. The default is TRUE.
return_fe logical indicating if the fixed effects should be returned. This can be useful when fitting general equilibrium models where skipping the fixed effects for intermediate steps speeds up computation. Set it to FALSE to minimize memory usage. The default is TRUE.
keep_tx logical indicating if the centered regressor matrix should be stored. The default is FALSE to minimize memory usage.
keep_data logical indicating if the filtered data should be stored in the result object. Required for predict() methods. Set to TRUE when planning to use prediction functions. The default is FALSE to minimize memory usage for production/benchmark use.
return_hessian logical indicating if the Hessian matrix should be returned. The Hessian is a P*P matrix used to compute the variance-covariance matrix. The default is FALSE to minimize memory usage (vcov is still computed and returned).
check_separation logical indicating whether to perform separation detection for Poisson models. When TRUE, observations with perfect prediction are automatically detected and excluded from estimation. Set to FALSE to skip this check and speed up computation when separation is known not to be an issue. The default is TRUE.
init_theta Initial value for the negative binomial dispersion parameter (theta). The default is 0.0.
vcov_type Optional character string specifying the type of variance-covariance estimator to be used. When NULL (default), the covariance matrix is the inverse Hessian (IID) when no cluster variable is present, or a clustered sandwich when one is. Other values: "hetero" - heteroskedastic-robust HC0 sandwich (no cluster variable needed); "m-estimator" - one-way M-estimator sandwich (cluster variable required); "m-estimator-dyadic" - dyadic-robust Cameron-Miller sandwich (two entity columns required in the third part of the formula like z ~ x + y | fe | cl1 + cl2).
expectile numeric value between 0 and 1 (exclusive) specifying the expectile for asymmetric Poisson pseudo-maximum likelihood (APPML) estimation. When NULL (default), standard symmetric estimation is performed. Values below 0.5 give more weight to negative residuals (lower quantiles), while values above 0.5 give more weight to positive residuals (upper quantiles). For example, expectile = 0.1 estimates the 10th expectile, expectile = 0.5 is equivalent to standard Poisson PML, and expectile = 0.9 estimates the 90th expectile.
expectile_tol tolerance level for the stopping condition of the expectile iteration algorithm.

The convergence criterion uses a hybrid approach: the algorithm converges when the squared norm of coefficient changes \|b - b_{old}\|^2 is below a threshold computed as the maximum of an absolute floor (1e-14) and a relative tolerance scaled by the L2 norm of coefficients. Specifically: \text{threshold} = \max(10^{-14}, (\text{tol} \cdot \|b\|_2)^2). This hybrid criterion ensures numerical robustness across different compiler optimizations (e.g., FMA on macOS) and handles both cases with small and large coefficients appropriately. The default is 1.0e-12.

expectile_iter_max integer indicating the maximum number of iterations for the expectile reweighting algorithm. The default is 50L.
expectile_glm_iter_max integer indicating the maximum number of inner GLM (IRLS) iterations per APPML outer iteration. When NULL (default), the value of iter_max is used, meaning the inner GLM runs to full convergence before APPML weights are updated. Setting this to 1L enables a single-step mode where APPML asymmetric weights are updated after every Newton step, which typically reduces the total number of iterations needed.
expectile_trace logical indicating whether to print iteration information during expectile estimation. The default is FALSE.
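
The stopping rules documented for dev_tol and expectile_tol, and the flagging rule documented for sep_mu_tol, can be written out in a few lines of base R. This is only a sketch of the formulas as stated above: the helper names dev_converged, expectile_converged, and mu_separated are hypothetical and not part of capybara, and the package's internal implementation may differ.

```r
# Hypothetical helpers (not part of capybara) that mirror the documented formulas.

# dev_tol: relative change of the deviance between iterations r - 1 and r,
# |dev_r - dev_{r-1}| / (0.1 + |dev_r|) < tol.
dev_converged <- function(dev, dev_old, dev_tol = 1e-06) {
  abs(dev - dev_old) / (0.1 + abs(dev)) < dev_tol
}

# expectile_tol: squared norm of the coefficient change compared with the
# hybrid threshold max(1e-14, (tol * ||b||_2)^2).
expectile_converged <- function(b, b_old, expectile_tol = 1e-12) {
  threshold <- max(1e-14, (expectile_tol * sqrt(sum(b^2)))^2)
  sum((b - b_old)^2) < threshold
}

# sep_mu_tol: flag observations with y == 0 and eta <= log(sep_mu_tol).
mu_separated <- function(y, eta, sep_mu_tol = 1e-06) {
  y == 0 & eta <= log(sep_mu_tol)
}

# Small illustrations with made-up numbers.
dev_converged(dev = 100.00001, dev_old = 100)            # tiny relative change
dev_converged(dev = 100.1, dev_old = 100)                # change still too large
expectile_converged(b = c(1, 2), b_old = c(1, 2) + 1e-9)
mu_separated(y = c(0, 0, 3), eta = c(-20, -1, -20))
```

The 0.1 in the deviance rule keeps the ratio well defined when the deviance is close to zero, and the absolute floor of 1e-14 in the expectile rule plays the same role when the coefficients are very small.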

Value

A named list of control parameters.

Memory-Efficient vs Full Models

With the default fit_control() settings, fitted model objects are "thin" and keep most optional components out of memory:

  • keep_data = FALSE - data not stored

  • return_hessian = FALSE - Hessian not stored

  • keep_tx = FALSE - centered matrix not stored

Fixed effects are stored by default (return_fe = TRUE); set return_fe = FALSE to drop them as well.
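
For the opposite case, a "full" model object, the storage flags can be switched on explicitly. The following is a minimal sketch that uses only the arguments documented on this page; the object names ctrl_full and fit_full are placeholders, and the data subset is simply one year of ross2004, in the spirit of the Examples section.

```r
library(capybara)

# Turn on every optional component documented above (a "full" model object).
ctrl_full <- fit_control(
  keep_data = TRUE,       # store the filtered data (required for predict() methods)
  return_fe = TRUE,       # store the fixed effects
  return_hessian = TRUE,  # store the P*P Hessian
  keep_tx = TRUE          # store the centered regressor matrix
)

# One year of the ross2004 data, as in the Examples section.
ross2004_subset <- ross2004[ross2004$year == 1999, ]

fit_full <- felm(ltrade ~ ldist | ctry1, ross2004_subset, control = ctrl_full)
```

Keeping these components costs memory, so this configuration is most useful for interactive work where predictions or the stored matrices are needed afterwards.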

See Also

feglm, felm, and fenegbin

Examples

ross2004_subset <- ross2004[ross2004$year == 1999, ]
ross2004_subset <- ross2004_subset[ross2004_subset$ltrade >
  quantile(ross2004_subset$ltrade, 0.75), ]

felm(ltrade ~ ldist | ctry1, ross2004_subset,
  control = fit_control(dev_tol = 1e-10, center_tol = 1e-10)
)
