The Hitchhiker’s Guide to Linear Models: code and datasets on sale
About
Last week I wrote that the 2nd edition of The Hitchhiker’s Guide to Linear Models can be downloaded for free (or for a suggested price of 10 USD) from Leanpub.
The 2nd edition now also includes an extra with all the datasets and code used in the book. These are presented as RStudio projects with R scripts to enhance the hands-on experience of the book. This extra is on sale for 20 USD (the normal price is 29 USD), and the promotion will last until June 21st, 2024.
For every exercise I did my best to connect the specific statistical concepts with R code, and every time I use linear algebra I connect it with a concrete R example. In this book you will not find anything like “this is left as an exercise to the reader”.
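As a minimal sketch of what connecting linear algebra with R can look like (this is not taken from the book, and it assumes R's built-in `cars` dataset as a stand-in for the book's data), the coefficients of a linear model can be obtained by solving the normal equations and then compared with `lm()`:

```r
# Illustrative only: built-in `cars` data, not one of the book's datasets
X <- cbind(intercept = 1, speed = cars$speed)  # design matrix with a column of ones
y <- cars$dist                                 # response: stopping distance

# Solve the normal equations (X'X) b = X'y
beta_matrix <- solve(t(X) %*% X, t(X) %*% y)

# Fit the same model with lm()
beta_lm <- coef(lm(dist ~ speed, data = cars))

# The two sets of estimates agree up to floating-point error
all.equal(as.numeric(beta_matrix), as.numeric(beta_lm))
```

`solve(A, b)` is used here instead of inverting `t(X) %*% X` explicitly, because it solves the same system in a single step.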
This book contains no proofs. I tried to replace them with multiple examples that analyze my own experiments, such as throwing a tennis ball and measuring the time it takes to hit the ground from different heights, or taking two thermometers and measuring the temperature outside a building at the same time of day on different days.
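To give a rough idea of how an experiment like the tennis ball one can be analyzed, here is a hedged sketch. The data are simulated, not my measurements: the heights, the noise level, and the free-fall relation without air resistance (time = sqrt(2 * height / g)) are all assumptions made for illustration.

```r
# Simulated data for illustration; these are NOT the measurements from the book
set.seed(123)
g <- 9.81                                      # gravitational acceleration in m/s^2
height <- rep(c(1, 1.5, 2, 2.5, 3), each = 4)  # hypothetical drop heights in meters
time <- sqrt(2 * height / g) +                 # ideal fall time in seconds
  rnorm(length(height), sd = 0.02)             # plus assumed timing noise

# Free fall implies time = sqrt(2 / g) * sqrt(height),
# which is a linear model in sqrt(height) with no intercept
fit <- lm(time ~ 0 + sqrt(height))

# The slope estimates sqrt(2 / g), so g can be recovered from it
g_hat <- 2 / coef(fit)[[1]]^2
g_hat
```

The model is linear in its coefficient even though the relation between time and height is not a straight line, which is why `lm()` still applies after transforming the explanatory variable.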
Table of contents
- Preface
- R Setup
  - R and RStudio
    - Windows and Mac
    - Linux
  - Installing R
    - Windows and Mac
    - Linux
  - Installing RStudio
    - Windows and Mac
    - Linux
  - Installing R Packages
    - Windows and Mac
    - Linux
  - Changing RStudio colors and font
    - Windows and Mac
    - Linux
  - Installing Quarto
    - Windows and Mac
    - Linux
- Linear algebra review
  - Using R as a calculator
  - System of linear equations
  - Matrix
  - Transpose matrix
  - Matrix multiplication
  - Matrix representation of a system of linear equations
  - Identity matrix
  - Inverse matrix
  - Solving systems of linear equations
- Statistics review
  - Using R as a calculator
  - Data and dataset
  - Summation
  - Probability
  - Descriptive statistics
    - Mean
    - Variance
    - Standard deviation
    - Covariance
    - Correlation
  - Distributions
    - Normal distribution
    - Poisson distribution
    - Student’s t-distribution
    - Computing probabilities with the normal distribution
    - Computing probabilities with the Poisson distribution
    - Computing probabilities with the t-distribution
  - Sample size
- Recommended workflow
  - Creating projects
  - Creating scripts
  - Creating notebooks
  - Organizing code sections
  - Customizing notebooks’ output
- Read, Manipulate, and Plot Data
  - The datasauRus dataset in R format
  - The Quality of Government dataset in CSV format
  - The Quality of Government dataset in SAV (SPSS) format
  - The Quality of Government dataset in DTA (Stata) format
  - The Freedom House dataset in XLSX (Excel) format
- Linear Model with One Explanatory Variable
  - Model specification
    - Linear model as correlation
    - Linear model as matrix multiplication
    - Relation between correlation and matrix multiplication
    - Computational note
  - The Galton dataset
  - A word of caution about Galton’s work
  - Loading the Galton dataset
  - Estimating linear models’ coefficients
  - Logarithmic transformations
  - Plotting model results
  - Linear model does not equal straight line
  - Transforming variables
  - Regression with weights
- Linear Model with Multiple Explanatory Variables
  - Model specification
  - Life expectancy, GDP and well-being in the Quality of Government dataset
  - Estimating linear models’ coefficients
  - Model accuracy
    - Root Mean Squared Error and Mean Absolute Error
    - RMSE and MAE interpretation
  - Model summary
    - Coefficient’s standard error
    - Coefficient’s t-statistic
    - Coefficient’s p-value
    - Residual standard error
    - Model’s multiple R-squared (or unadjusted R-squared)
    - Model’s adjusted R-squared
    - Model’s F-statistic
  - Error’s assumptions
    - Error’s normality
    - Error’s homoscedasticity (homogeneous variance)
- Linear Model with Binary and Categorical Explanatory Variables
  - Model specification with binary variables
    - ANOVA is a particular case of a linear model with binary variables
    - Corruption and popular vote in the Quality of Government dataset
    - Estimating a linear model and ANOVA with one predictor and two categories
    - Corruption and regime type in the Quality of Government dataset
    - Estimating a linear model and ANOVA with one predictor and multiple categories
    - Estimating a linear model with continuous and categorical predictors
  - Model specification with binary interactions
    - Corruption and interaction variables in the Quality of Government dataset
    - Estimating a linear model with binary interactions
    - Confidence intervals with binary interactions
  - Model specification with categorical interactions
    - Estimating a linear model with categorical interactions
    - Confidence intervals with categorical interactions
- Linear Model with Fixed Effects
  - Year fixed effects
    - Model specification
    - Corruption and popular vote in the Quality of Government dataset
    - Estimating year fixed effects’ coefficients
    - Estimating country-time fixed effects’ coefficients
  - Country fixed effects
  - Country-year fixed effects
- Generalized Linear Model with One Explanatory Variable
  - Model specification
  - Model families
    - Gaussian model
    - Poisson model
    - Quasi-Poisson model
    - Binomial model (or logit model)
- Generalized Linear Model with Multiple Explanatory Variables
  - Obtaining the original codes and data
  - Loading the original data
  - Ordinary Least Squares
  - Poisson Pseudo Maximum Likelihood
  - Tobit
  - Reporting multiple models