The Hitchhiker’s Guide to Linear Models: code and datasets on sale
About
Last week I wrote that the 2nd edition of The Hitchhiker’s Guide to Linear Models can be downloaded for free (or for a suggested price of 10 USD) from Leanpub.
The 2nd edition now also features an extra with all the datasets and code used in the book. These are presented as RStudio projects with R scripts to enhance the hands-on book experience. This extra is sold for 20 USD (the normal price is 29 USD), and the promotion will last until June 21st, 2024.
For every exercise I did my best to connect the specific statistical concepts with R code, and every time I use linear algebra I connect it with a concrete R example. In this book you will not find anything like “this is left as an exercise to the reader”.
This book contains no proofs. I tried to replace them with multiple examples that analyze my own experiments, such as throwing a tennis ball and measuring the time it takes to hit the ground from different heights, and another where I took two thermometers and measured the temperature outside a building at the same time of day on different days.
Table of contents
 Preface

R Setup

R and RStudio
 Windows and Mac
 Linux

Installing R
 Windows and Mac
 Linux

Installing RStudio
 Windows and Mac
 Linux

Installing R Packages
 Windows and Mac
 Linux

Changing RStudio colors and font
 Windows and Mac
 Linux

Installing Quarto
 Windows and Mac
 Linux

Linear algebra review
 Using R as a calculator
 System of linear equations
 Matrix
 Transpose matrix
 Matrix multiplication
 Matrix representation of a system of linear equations
 Identity matrix
 Inverse matrix
 Solving systems of linear equations

Statistics review

Using R as a calculator

Data and dataset

Summation

Probability

Descriptive statistics
 Mean
 Variance
 Standard deviation
 Covariance
 Correlation

Distributions
 Normal distribution
 Poisson distribution
 Student’s t-distribution
 Computing probabilities with the normal distribution
 Computing probabilities with the Poisson distribution
 Computing probabilities with the t-distribution

Sample size

Recommended workflow
 Creating projects
 Creating scripts
 Creating notebooks
 Organizing code sections
 Customizing notebooks’ output

Read, Manipulate, and Plot Data
 The datasauRus dataset in R format
 The Quality of Government dataset in CSV format
 The Quality of Government dataset in SAV (SPSS) format
 The Quality of Government dataset in DTA (Stata) format
 The Freedom House dataset in XLSX (Excel) format

Linear Model with One Explanatory Variable

Model specification
 Linear model as correlation
 Linear model as matrix multiplication
 Relation between correlation and matrix multiplication
 Computational note

The Galton dataset

A word of caution about Galton’s work

Loading the Galton dataset

Estimating linear models’ coefficients

Logarithmic transformations

Plotting model results

Linear model does not equal straight line

Transforming variables

Regression with weights

Linear Model with Multiple Explanatory Variables

Model specification

Life expectancy, GDP and wellbeing in the Quality of Government dataset

Estimating linear models’ coefficients

Model accuracy
 Root Mean Squared Error and Mean Absolute Error
 RMSE and MAE interpretation

Model summary
 Coefficient’s standard error
 Coefficient’s t-statistic
 Coefficient’s p-value
 Residual standard error
 Model’s multiple R-squared (or unadjusted R-squared)
 Model’s adjusted R-squared
 Model’s F-statistic

Error’s assumptions
 Error’s normality
 Error’s homoscedasticity (homogeneous variance)

Linear Model with Binary and Categorical Explanatory Variables

Model specification with binary variables
 ANOVA is a particular case of a linear model with binary variables
 Corruption and popular vote in the Quality of Government dataset
 Estimating a linear model and ANOVA with one predictor and two categories
 Corruption and regime type in the Quality of Government dataset
 Estimating a linear model and ANOVA with one predictor and multiple categories
 Estimating a linear model with continuous and categorical predictors

Model specification with binary interactions
 Corruption and interaction variables in the Quality of Government dataset
 Estimating a linear model with binary interactions
 Confidence intervals with binary interactions

Model specification with categorical interactions
 Estimating a linear model with categorical interactions
 Confidence intervals with categorical interactions

Linear Model with Fixed Effects

Year fixed effects
 Model specification
 Corruption and popular vote in the Quality of Government dataset
 Estimating year fixed effects’ coefficients

Country fixed effects

Country-year fixed effects
 Estimating country-time fixed effects’ coefficients

Generalized Linear Model with One Explanatory Variable

Model specification

Model families
 Gaussian model
 Poisson model
 Quasi-Poisson model
 Binomial model (or logit model)

Generalized Linear Model with Multiple Explanatory Variables
 Obtaining the original codes and data
 Loading the original data
 Ordinary Least Squares
 Poisson Pseudo Maximum Likelihood
 Tobit
 Reporting multiple models