The Hitchhiker’s Guide to Linear Models
R
Linear models
Some minor corrections.
About
The Hitchhiker’s Guide to Linear Models can be downloaded for free (or for a suggested price of 10 USD) from Leanpub.
For every exercise, I did my best to connect the specific statistical concepts with R code, and every time I use linear algebra I connect it with a concrete R example. In this book you will not find anything like “this is left as an exercise to the reader”.
This book contains no proofs. I tried to replace them with multiple examples that analyze my own experiments, such as throwing a tennis ball and measuring the time it takes to hit the ground from different heights, or using two thermometers to measure the temperature outside a building at the same time of day on different days.
Table of contents
- Preface
- R Setup
  - R and RStudio
    - Windows and Mac
    - Linux
  - Installing R
    - Windows and Mac
    - Linux
  - Installing RStudio
    - Windows and Mac
    - Linux
  - Installing R Packages
    - Windows and Mac
    - Linux
  - Changing RStudio colors and font
    - Windows and Mac
    - Linux
  - Installing Quarto
    - Windows and Mac
    - Linux
- Linear algebra review
  - Using R as a calculator
  - System of linear equations
  - Matrix
  - Transpose matrix
  - Matrix multiplication
  - Matrix representation of a system of linear equations
  - Identity matrix
  - Inverse matrix
  - Solving systems of linear equations
- Statistics review
  - Using R as a calculator
  - Data and dataset
  - Summation
  - Probability
  - Descriptive statistics
    - Mean
    - Variance
    - Standard deviation
    - Covariance
    - Correlation
  - Distributions
    - Normal distribution
    - Poisson distribution
    - Student’s t-distribution
    - Computing probabilities with the normal distribution
    - Computing probabilities with the Poisson distribution
    - Computing probabilities with the t-distribution
  - Sample size
- Recommended workflow
  - Creating projects
  - Creating scripts
  - Creating notebooks
  - Organizing code sections
  - Customizing notebooks’ output
- Read, Manipulate, and Plot Data
  - The datasauRus dataset in R format
  - The Quality of Government dataset in CSV format
  - The Quality of Government dataset in SAV (SPSS) format
  - The Quality of Government dataset in DTA (Stata) format
  - The Freedom House dataset in XLSX (Excel) format
- Linear Model with One Explanatory Variable
  - Model specification
  - The Galton dataset
  - A word of caution about Galton’s work
  - Loading the Galton dataset
  - Estimating linear models’ coefficients
    - Linear model as correlation
    - Linear model as matrix multiplication
    - Relation between correlation and matrix multiplication
    - Computational note
  - Logarithmic transformations
  - Plotting model results
  - Linear model does not equal straight line
  - Transforming variables
  - Regression with weights
- Linear Model with Multiple Explanatory Variables
  - Model specification
  - Life expectancy, GDP and well-being in the Quality of Government dataset
  - Estimating linear models’ coefficients
  - Model accuracy
    - Root Mean Squared Error and Mean Absolute Error
    - RMSE and MAE interpretation
  - Model summary
    - Coefficient’s standard error
    - Coefficient’s t-statistic
    - Coefficient’s p-value
    - Residual standard error
    - Model’s multiple R-squared (or unadjusted R-squared)
    - Model’s adjusted R-squared
    - Model’s F-statistic
  - Error’s assumptions
    - Error’s normality
    - Error’s homoscedasticity (homogeneous variance)
- Linear Model with Binary and Categorical Explanatory Variables
  - Model specification with binary variables
    - ANOVA is a particular case of a linear model with binary variables
    - Corruption and popular vote in the Quality of Government dataset
    - Estimating a linear model and ANOVA with one predictor and two categories
    - Corruption and regime type in the Quality of Government dataset
    - Estimating a linear model and ANOVA with one predictor and multiple categories
    - Estimating a linear model with continuous and categorical predictors
  - Model specification with binary interactions
    - Corruption and interaction variables in the Quality of Government dataset
    - Estimating a linear model with binary interactions
    - Confidence intervals with binary interactions
  - Model specification with categorical interactions
    - Estimating a linear model with categorical interactions
    - Confidence intervals with categorical interactions
- Linear Model with Fixed Effects
  - Year fixed effects
    - Model specification
    - Corruption and popular vote in the Quality of Government dataset
    - Estimating year fixed effects’ coefficients
  - Country fixed effects
  - Country-year fixed effects
    - Estimating country-time fixed effects’ coefficients
- Generalized Linear Model with One Explanatory Variable
  - Model specification
  - Model families
    - Gaussian model
    - Poisson model
    - Quasi-Poisson model
    - Binomial model (or logit model)
- Generalized Linear Model with Multiple Explanatory Variables
  - Obtaining the original codes and data
  - Loading the original data
  - Ordinary Least Squares
  - Poisson Pseudo Maximum Likelihood
  - Tobit
  - Reporting multiple models