The Hitchhiker’s Guide to Linear Models is now complete

My new book.
Author

Mauricio “Pachá” Vargas S.

Published

September 8, 2023

The book can be downloaded for free, but you will need a Leanpub account either way, whether you download it for free or buy it.

The Hitchhiker’s Guide to Linear Models is finally complete. It took me a while to finish it, but I’m happy with the result. I hope you enjoy reading it as much as I enjoyed writing it.

The GitHub repository contains the code for the book, so readers can avoid copying and pasting from the PDF.

Table of contents:

Preface

1 R Setup
1.1 R and RStudio
1.2 Installing R
1.2.1 Windows and Mac
1.2.2 Linux
1.3 Installing RStudio
1.3.1 Windows and Mac
1.3.2 Linux
1.4 Installing R Packages
1.5 Changing RStudio colors and font
1.6 Installing Quarto
1.6.1 Windows and Mac
1.6.2 Linux

2 Linear algebra review
2.1 Using R as a calculator
2.2 System of linear equations
2.3 Matrix
2.4 Transpose matrix
2.5 Matrix multiplication
2.6 Matrix representation of a system of linear equations
2.7 Identity matrix
2.8 Inverse matrix
2.9 Solving systems of linear equations

3 Statistics review
3.1 Using R as a calculator
3.2 Data and dataset
3.3 Summation
3.4 Probability
3.5 Descriptive statistics
3.5.1 Mean
3.5.2 Variance
3.5.3 Standard deviation
3.5.4 Covariance
3.5.5 Correlation
3.6 Distributions
3.6.1 Normal distribution
3.6.2 Poisson distribution
3.6.3 Student’s t-distribution
3.6.4 Computing probabilities with the normal distribution
3.6.5 Computing probabilities with the Poisson distribution
3.6.6 Computing probabilities with the t-distribution
3.7 Sample size

4 Recommended workflow
4.1 Creating projects
4.2 Creating scripts
4.3 Creating notebooks
4.4 Organizing code sections
4.5 Customizing notebooks’ output

5 Read, Manipulate, and Plot Data
5.1 The datasauRus dataset in R format
5.2 The Quality of Government dataset in CSV format
5.3 The Quality of Government dataset in SAV (SPSS) format
5.4 The Quality of Government dataset in DTA (Stata) format
5.5 The Freedom House dataset in XLSX (Excel) format

6 Linear Model with One Explanatory Variable
6.1 Model specification
6.2 The Galton dataset
6.3 A word of caution about Galton’s work
6.4 Loading the Galton dataset
6.5 Estimating linear models’ coefficients
6.5.1 Linear model as correlation
6.5.2 Linear model as matrix multiplication
6.5.3 Relation between correlation and matrix multiplication
6.5.4 Computational note
6.6 Logarithmic transformations
6.7 Plotting model results
6.8 Linear model does not equal straight line
6.9 Transforming variables
6.10 Regression with weights

7 Linear Model with Multiple Explanatory Variables
7.1 Model specification
7.2 Life expectancy, GDP and well-being in the Quality of Government dataset
7.3 Estimating linear models’ coefficients
7.4 Model accuracy
7.4.1 Root Mean Squared Error and Mean Absolute Error
7.4.2 RMSE and MAE interpretation
7.5 Model summary
7.5.1 Coefficient’s standard error
7.5.2 Coefficient’s t-statistic
7.5.3 Coefficient’s p-value
7.5.4 Residual standard error
7.5.5 Model’s multiple R-squared (or unadjusted R-squared)
7.5.6 Model’s adjusted R-squared
7.5.7 Model’s F-statistic
7.6 Error’s assumptions
7.6.1 Error’s normality
7.6.2 Error’s homoscedasticity (homogeneous variance)

8 Linear Model with Binary and Categorical Explanatory Variables
8.1 Model specification with binary variables
8.1.1 ANOVA is a particular case of a linear model with binary variables
8.1.2 Corruption and popular vote in the Quality of Government dataset
8.1.3 Estimating a linear model and ANOVA with one predictor and two categories
8.1.4 Corruption and regime type in the Quality of Government dataset
8.1.5 Estimating a linear model and ANOVA with one predictor and multiple categories
8.1.6 Estimating a linear model with continuous and categorical predictors
8.2 Model specification with binary interactions
8.2.1 Corruption and interaction variables in the Quality of Government dataset
8.2.2 Estimating a linear model with binary interactions
8.2.3 Confidence intervals with binary interactions
8.3 Model specification with categorical interactions
8.3.1 Estimating a linear model with categorical interactions
8.3.2 Confidence intervals with categorical interactions

9 Linear Model with Fixed Effects
9.1 Year fixed effects
9.1.1 Model specification
9.1.2 Corruption and popular vote in the Quality of Government dataset
9.1.3 Estimating year fixed effects’ coefficients
9.2 Country fixed effects
9.2.1 Model specification
9.2.2 Corruption and popular vote in the Quality of Government dataset
9.2.3 Estimating country-time fixed effects’ coefficients
9.3 Country-year fixed effects
9.3.1 Model specification
9.3.2 Corruption and popular vote in the Quality of Government dataset
9.3.3 Estimating country-time fixed effects’ coefficients

10 Generalized Linear Model with One Explanatory Variable
10.1 Model specification
10.2 Model families
10.2.1 Gaussian model
10.2.2 Poisson model
10.2.3 Quasi-Poisson model
10.2.4 Binomial model (or logit model)

11 Generalized Linear Model with Multiple Explanatory Variables
11.1 Obtaining the original codes and data
11.2 Loading the original data
11.3 Ordinary Least Squares
11.4 Poisson Pseudo Maximum Likelihood
11.5 Tobit
11.6 Reporting multiple models

References

Don’t panic!