The Hitchhiker’s Guide to Linear Models

Some minor corrections.
Author

Mauricio “Pachá” Vargas S.

Published

June 8, 2024

About

The Hitchhiker’s Guide to Linear Models can be downloaded for free (or for a suggested price of 10 USD) from Leanpub.

For every exercise I did my best to connect the specific statistical concepts with R code, and every time I use linear algebra I connect it with a concrete R example. In this book you will not find anything like “this is left as an exercise to the reader”.

This book contains no proofs. I tried to replace them with multiple examples that analyze my own experiments, such as throwing a tennis ball and measuring the time it takes to hit the ground from different heights, and another where I took two thermometers and measured the temperature outside a building at the same time of day on different days.

Table of contents

  1. Preface
  2. R Setup
    1. R and RStudio
      1. Windows and Mac
      2. Linux
    2. Installing R
      1. Windows and Mac
      2. Linux
    3. Installing RStudio
      1. Windows and Mac
      2. Linux
    4. Installing R Packages
      1. Windows and Mac
      2. Linux
    5. Changing RStudio colors and font
      1. Windows and Mac
      2. Linux
    6. Installing Quarto
      1. Windows and Mac
      2. Linux
  3. Linear algebra review
    1. Using R as a calculator
    2. System of linear equations
    3. Matrix
    4. Transpose matrix
    5. Matrix multiplication
    6. Matrix representation of a system of linear equations
    7. Identity matrix
    8. Inverse matrix
    9. Solving systems of linear equations
  4. Statistics review
    1. Using R as a calculator
      1. Mean
      2. Variance
      3. Standard deviation
      4. Covariance
      5. Correlation
      6. Normal distribution
      7. Poisson distribution
      8. Student’s t-distribution
      9. Computing probabilities with the normal distribution
      10. Computing probabilities with the Poisson distribution
      11. Computing probabilities with the t-distribution
    2. Data and dataset
    3. Summation
    4. Probability
    5. Descriptive statistics
    6. Distributions
    7. Sample size
  5. Recommended workflow
    1. Creating projects
    2. Creating scripts
    3. Creating notebooks
    4. Organizing code sections
    5. Customizing notebooks’ output
  6. Read, Manipulate, and Plot Data
    1. The datasauRus dataset in R format
    2. The Quality of Government dataset in CSV format
    3. The Quality of Government dataset in SAV (SPSS) format
    4. The Quality of Government dataset in DTA (Stata) format
    5. The Freedom House dataset in XLSX (Excel) format
  7. Linear Model with One Explanatory Variable
    1. Model specification
      1. Linear model as correlation
      2. Linear model as matrix multiplication
      3. Relation between correlation and matrix multiplication
      4. Computational note
    2. The Galton dataset
    3. A word of caution about Galton’s work
    4. Loading the Galton dataset
    5. Estimating linear models’ coefficients
    6. Logarithmic transformations
    7. Plotting model results
    8. Linear model does not equal straight line
    9. Transforming variables
    10. Regression with weights
  8. Linear Model with Multiple Explanatory Variables
    1. Model specification
      1. Root Mean Squared Error and Mean Absolute Error
      2. RMSE and MAE interpretation
      3. Coefficient’s standard error
      4. Coefficient’s t-statistic
      5. Coefficient’s p-value
      6. Residual standard error
      7. Model’s multiple R-squared (or unadjusted R-squared)
      8. Model’s adjusted R-squared
      9. Model’s F-statistic
      10. Error’s normality
      11. Error’s homoscedasticity (homogeneous variance)
    2. Life expectancy, GDP and well-being in the Quality of Government dataset
    3. Estimating linear models’ coefficients
    4. Model accuracy
    5. Model summary
    6. Error’s assumptions
  9. Linear Model with Binary and Categorical Explanatory Variables
    1. Model specification with binary variables
      1. ANOVA is a particular case of a linear model with binary variables
      2. Corruption and popular vote in the Quality of Government dataset
      3. Estimating a linear model and ANOVA with one predictor and two categories
      4. Corruption and regime type in the Quality of Government dataset
      5. Estimating a linear model and ANOVA with one predictor and multiple categories
      6. Estimating a linear model with continuous and categorical predictors
      7. Corruption and interaction variables in the Quality of Government dataset
      8. Estimating a linear model with binary interactions
      9. Confidence intervals with binary interactions
      10. Estimating a linear model with categorical interactions
      11. Confidence intervals with categorical interactions
    2. Model specification with binary interactions
    3. Model specification with categorical interactions
  10. Linear Model with Fixed Effects
    1. Year fixed effects
      1. Model specification
      2. Corruption and popular vote in the Quality of Government dataset
      3. Estimating year fixed effects’ coefficients
      4. Estimating country-time fixed effects’ coefficients
    2. Country fixed effects
    3. Country-year fixed effects
  11. Generalized Linear Model with One Explanatory Variable
    1. Model specification
      1. Gaussian model
      2. Poisson model
      3. Quasi-Poisson model
      4. Binomial model (or logit model)
    2. Model families
  12. Generalized Linear Model with Multiple Explanatory Variables
    1. Obtaining the original codes and data
    2. Loading the original data
    3. Ordinary Least Squares
    4. Poisson Pseudo Maximum Likelihood
    5. Tobit
    6. Reporting multiple models