# Needed packages to perform the analysis
<- function(pkg){
load <- pkg[!(pkg %in% installed.packages()[, "Package"])]
new.pkg if (length(new.pkg))
install.packages(new.pkg, dependencies = TRUE)
sapply(pkg, require, character.only = TRUE)
}
<- c("lavaan","semPlot","semTools","nonnest2","htmlTable")
packages load(packages)
Structural Equation Modelling in R (Part 1)
Updated 2022-05-28: I moved the blog to Quarto, so I had to update the paths.
Brief explanation
Structural Equation Modelling (SEM) is a state of art methodology and fulfills much of broader discusion about statistical modeling, and allows to make inference and causal analysis. SEM models are regression models broadly used in Marketing, Human Resources, Biostatistics and Medicine, revealing their flexibility as analytical tool.
Despite being a state-of-the-art methodology, SEM does not create new developments of statistical theory, since it consists in a combination of multivariate analysis, factor analysis and regression analysis. SEM makes it possible to validate or reject one or more hypothesis about an existing relation between different variables, to estimate simultaneously dependency relations between variables and estimate measurement error in model’s variables. SEM may be understood as a net of unidirectional or bidirectional paths linking different variables. Such net can be validated using multivariate data.
Confirmatory Factor Analysis
A usual methodology for model evaluation is Confirmatory Factor Analysis (CFA) that is a particular case of SEM. It is a process which consists in specifying quantity and kinds of observed variables to one or more latent variables and analyze how well those variables measure the latent variable itself. Think of a latent variable as an artificial variable that is represented as a linear combination of observed variables.
Holzinger and Swineford (1939) proposed one the most famous CFA models. In their work they analysed the mental ability based test scores of children from two different schools (details here)
We can represent Holzinger and Swineford’s model in a diagram:
In this model,
A factor loading is a measurable effect which reflects the incidence of an observed variable over another either observed or latent variable. Interpreting factor loadings is equivalent to interpreting direct effects in a linear econometric model. In this example factor loadings are well defined as there are no missing relations between variables, is known which observed variables have an effect over defined latent variables, and both latent variables covariate so there is a relation between latent variables.
Model evaluation
When I wrote my thesis I had to study SEM’s goodness of fit and its indicators. The interested reader may visit my Github repo to read more about some important linear algebra aspects of SEM but here I present a table that synthesizes 50% of my thesis:
Indicator | Type | Classification | Direction | Ideal values | Normed (0-1) |
---|---|---|---|---|---|
RMSEA | significance | adjusted | less is better | no | |
SMSR | significance | unadjusted | less is better | no | |
GFI | significance | unadjusted | more is better | yes |
This table is useful for a cheatsheet and to keep in mind what to look when comparing models.
Using Lavaan to obtain parameters estimates
I highly recommend using lavaan. I have used AMOS and Stata but I guess R’s flexibility is unbeatable at the moment. Lavaan also provides HolzingerSwineford1939
dataset that contains the complete recordings made by Holzinger and Swineford.
First I load the needed packages to perform the analysis:
Then I fit the model using the cfa()
function:
# 1.2. Specify the model
<- ' visual =~ x1 + x2 + x3
HS.model textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
# 1.3. Fit the model
<- cfa(HS.model, data = HolzingerSwineford1939) fit
Then I obtain a model summary, confidence intervals and goodness of fit indicators:
# 2.1. Display summary output
summary(fit, fit.measures = FALSE)
lavaan 0.6.15 ended normally after 35 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 21
Number of observations 301
Model Test User Model:
Test statistic 85.306
Degrees of freedom 24
P-value (Chi-square) 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
visual =~
x1 1.000
x2 0.554 0.100 5.554 0.000
x3 0.729 0.109 6.685 0.000
textual =~
x4 1.000
x5 1.113 0.065 17.014 0.000
x6 0.926 0.055 16.703 0.000
speed =~
x7 1.000
x8 1.180 0.165 7.152 0.000
x9 1.082 0.151 7.155 0.000
Covariances:
Estimate Std.Err z-value P(>|z|)
visual ~~
textual 0.408 0.074 5.552 0.000
speed 0.262 0.056 4.660 0.000
textual ~~
speed 0.173 0.049 3.518 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.x1 0.549 0.114 4.833 0.000
.x2 1.134 0.102 11.146 0.000
.x3 0.844 0.091 9.317 0.000
.x4 0.371 0.048 7.779 0.000
.x5 0.446 0.058 7.642 0.000
.x6 0.356 0.043 8.277 0.000
.x7 0.799 0.081 9.823 0.000
.x8 0.488 0.074 6.573 0.000
.x9 0.566 0.071 8.003 0.000
visual 0.809 0.145 5.564 0.000
textual 0.979 0.112 8.737 0.000
speed 0.384 0.086 4.451 0.000
# 2.2. Obtain confidence intervals for the estimated coefficients
parameterEstimates(fit, ci = TRUE, level = 0.95)
lhs op rhs est se z pvalue ci.lower ci.upper
1 visual =~ x1 1.000 0.000 NA NA 1.000 1.000
2 visual =~ x2 0.554 0.100 5.554 0 0.358 0.749
3 visual =~ x3 0.729 0.109 6.685 0 0.516 0.943
4 textual =~ x4 1.000 0.000 NA NA 1.000 1.000
5 textual =~ x5 1.113 0.065 17.014 0 0.985 1.241
6 textual =~ x6 0.926 0.055 16.703 0 0.817 1.035
7 speed =~ x7 1.000 0.000 NA NA 1.000 1.000
8 speed =~ x8 1.180 0.165 7.152 0 0.857 1.503
9 speed =~ x9 1.082 0.151 7.155 0 0.785 1.378
10 x1 ~~ x1 0.549 0.114 4.833 0 0.326 0.772
11 x2 ~~ x2 1.134 0.102 11.146 0 0.934 1.333
12 x3 ~~ x3 0.844 0.091 9.317 0 0.667 1.022
13 x4 ~~ x4 0.371 0.048 7.779 0 0.278 0.465
14 x5 ~~ x5 0.446 0.058 7.642 0 0.332 0.561
15 x6 ~~ x6 0.356 0.043 8.277 0 0.272 0.441
16 x7 ~~ x7 0.799 0.081 9.823 0 0.640 0.959
17 x8 ~~ x8 0.488 0.074 6.573 0 0.342 0.633
18 x9 ~~ x9 0.566 0.071 8.003 0 0.427 0.705
19 visual ~~ visual 0.809 0.145 5.564 0 0.524 1.094
20 textual ~~ textual 0.979 0.112 8.737 0 0.760 1.199
21 speed ~~ speed 0.384 0.086 4.451 0 0.215 0.553
22 visual ~~ textual 0.408 0.074 5.552 0 0.264 0.552
23 visual ~~ speed 0.262 0.056 4.660 0 0.152 0.373
24 textual ~~ speed 0.173 0.049 3.518 0 0.077 0.270
# 2.3 Obtain goodness of fit indicators of the model
fitMeasures(fit, c("chisq", "rmsea", "srmr", "gfi", "ecvi"))
chisq rmsea srmr gfi ecvi
85.306 0.092 0.065 0.943 0.423
reliability(fit)
visual textual speed
alpha 0.6261171 0.8827069 0.6884550
omega 0.6253180 0.8851754 0.6877600
omega2 0.6253180 0.8851754 0.6877600
omega3 0.6120052 0.8850608 0.6858417
avevar 0.3705589 0.7210163 0.4244883
Exporting lavaan output to tables
Lavaan produces a lot of potential pieces of output, but I suggest that at minimum that you export the above model summary, confidence intervals and goodness of fit indicators as part of your final tables:
htmlTable(txtRound(parameterEstimates(fit, ci = TRUE, level = 0.95), digits = 3, excl.cols = c(1,3)), align = "l|r|r|r|r|r|r|r|r|r")
lhs | op | rhs | est | se | z | pvalue | ci.lower | ci.upper | excl.cols1 | excl.cols2 | |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | visual | =~ | x1 | 1 | 0 | 1 | 1 | visual | 1.000 | ||
2 | visual | =~ | x2 | 0.553500293763662 | 0.0996651187683112 | 5.55360090474952 | 2.79844001305207e-08 | 0.358160250462865 | 0.748840337064459 | visual | 2.000 |
3 | visual | =~ | x3 | 0.729370209767044 | 0.109109703120018 | 6.68474195154533 | 2.31332730749045e-11 | 0.515519121287951 | 0.943221298246138 | visual | 3.000 |
4 | textual | =~ | x4 | 1 | 0 | 1 | 1 | textual | 4.000 | ||
5 | textual | =~ | x5 | 1.11307657811851 | 0.0654201086095661 | 17.0142881413032 | 0 | 0.984855521379059 | 1.24129763485796 | textual | 5.000 |
6 | textual | =~ | x6 | 0.926146236632328 | 0.0554488562419873 | 16.7027112802919 | 0 | 0.817468475414094 | 1.03482399785056 | textual | 6.000 |
7 | speed | =~ | x7 | 1 | 0 | 1 | 1 | speed | 7.000 | ||
8 | speed | =~ | x8 | 1.17995083947085 | 0.164986572020982 | 7.15179923442977 | 8.56426041195846e-13 | 0.856583100377002 | 1.5033185785647 | speed | 8.000 |
9 | speed | =~ | x9 | 1.08153015685908 | 0.151167438085725 | 7.15451800040272 | 8.39772695826468e-13 | 0.785247422575875 | 1.37781289114229 | speed | 9.000 |
10 | x1 | ~~ | x1 | 0.549053973199019 | 0.11360092326415 | 4.83318231421707 | 1.34367627646625e-06 | 0.326400254990787 | 0.77170769140725 | 1.000 | 1.000 |
11 | x2 | ~~ | x2 | 1.13383901648387 | 0.101723370956 | 11.146298100702 | 0 | 0.934464873024101 | 1.33321315994364 | 2.000 | 2.000 |
12 | x3 | ~~ | x3 | 0.844324045708499 | 0.0906231835368078 | 9.31686586981978 | 0 | 0.666705869811992 | 1.02194222160501 | 3.000 | 3.000 |
13 | x4 | ~~ | x4 | 0.371172992836844 | 0.0477177909891553 | 7.77850326141877 | 7.32747196252603e-15 | 0.277647841076289 | 0.464698144597398 | 4.000 | 4.000 |
14 | x5 | ~~ | x5 | 0.446255070260067 | 0.0583927754760264 | 7.64229935333834 | 2.1316282072803e-14 | 0.331807333369722 | 0.560702807150413 | 5.000 | 5.000 |
15 | x6 | ~~ | x6 | 0.356202664278948 | 0.0430349678663535 | 8.27705194029997 | 2.22044604925031e-16 | 0.271855677185057 | 0.440549651372839 | 6.000 | 6.000 |
16 | x7 | ~~ | x7 | 0.799391638325101 | 0.0813815590555713 | 9.82276141673862 | 0 | 0.639886713570462 | 0.95889656307974 | 7.000 | 7.000 |
17 | x8 | ~~ | x8 | 0.487697083769163 | 0.0741940912520304 | 6.57326042464085 | 4.92252905104351e-11 | 0.342279337049506 | 0.633114830488821 | 8.000 | 8.000 |
18 | x9 | ~~ | x9 | 0.566131293573695 | 0.0707369371033875 | 8.00333343167304 | 1.11022302462516e-15 | 0.42748944447438 | 0.704773142673009 | 9.000 | 9.000 |
19 | visual | ~~ | visual | 0.809315981090822 | 0.145462407414288 | 5.56374664407847 | 2.64043220621346e-08 | 0.524214901454325 | 1.09441706072732 | visual | visual |
20 | textual | ~~ | textual | 0.97949137285969 | 0.112105849301359 | 8.73720130540782 | 0 | 0.759767945772751 | 1.19921479994663 | textual | textual |
21 | speed | ~~ | speed | 0.383747649264106 | 0.0862091944288757 | 4.45135407895159 | 8.53305080417144e-06 | 0.214780733047299 | 0.552714565480913 | speed | speed |
22 | visual | ~~ | textual | 0.408232442095546 | 0.0735238821959155 | 5.5523787632398 | 2.81808156810115e-08 | 0.264128280987986 | 0.552336603203106 | visual | textual |
23 | visual | ~~ | speed | 0.26222460204864 | 0.0562763994994703 | 4.65958384653069 | 3.16849313275469e-06 | 0.15192488585009 | 0.372524318247189 | visual | speed |
24 | textual | ~~ | speed | 0.173494684580679 | 0.0493146604470061 | 3.51811576938906 | 0.000434622709184929 | 0.0768397261947255 | 0.270149642966633 | textual | speed |
htmlTable(txtRound(reliability(fit),3), align = "l|r|r|r|r")
visual | textual | speed | |
---|---|---|---|
alpha | 0.626 | 0.883 | 0.688 |
omega | 0.625 | 0.885 | 0.688 |
omega2 | 0.625 | 0.885 | 0.688 |
omega3 | 0.612 | 0.885 | 0.686 |
avevar | 0.371 | 0.721 | 0.424 |
There are other packages to convert R output to tables such as knitr
and xtables
that can also export to
Creating a diagram for the model in R
Now let’s get on to producing the exciting part of an SEM analysis - the model diagram. This code might not be good-looking but the result is!
semPaths(fit, style = "lisrel",
whatLabels = "std", edge.label.cex = .6, node.label.cex = .6,
label.prop = 0.9, edge.label.color = "black", rotation = 4,
equalizeManifests = FALSE, optimizeLatRes = TRUE, node.width = 1.5,
edge.width = 0.5, shapeMan = "rectangle", shapeLat = "ellipse",
shapeInt = "triangle", sizeMan = 4, sizeInt = 2, sizeLat = 4,
curve = 2, unCol = "#070b8c")
If you are running a script instead of a R Markdown document, you can save the plot as svg with these lines:
svg("img/cfa_example_2.svg")
semPaths(fit, style = "lisrel",
whatLabels = "std", edge.label.cex = .6, node.label.cex = .6,
label.prop = 0.9, edge.label.color = "black", rotation = 4,
equalizeManifests = FALSE, optimizeLatRes = TRUE, node.width = 1.5,
edge.width = 0.5, shapeMan = "rectangle", shapeLat = "ellipse",
shapeInt = "triangle", sizeMan = 4, sizeInt = 2, sizeLat = 4,
curve = 2, unCol = "#070b8c")
dev.off()
png
2
Appendix
If you are curious about the first diagram, I made it with TikZ. This is the code
\documentclass[border=3mm]{standalone}
\usepackage[applemac]{inputenc}
\usepackage[protrusion=true,expansion=true]{microtype}
\usepackage[bb=lucida,bbscaled=1,cal=boondoxo]{mathalfa}
\usepackage[stdmathitalics=true,math-style=iso,lucidasmallscale=true,romanfamily=bright]{lucimatx}
\usepackage{tikz}
\usetikzlibrary{arrows,positioning}
\newcommand{\at}{\makeatletter @\makeatother}
\begin{document}
\begin{tikzpicture}[auto,node distance=.5cm,
latent/.style={fill=red!20,circle,draw, thick,inner sep=0pt,minimum size=8mm,align=center},
observed/.style={fill=blue!20,rectangle,draw, thick,inner sep=0pt,minimum width=8mm,minimum height=8mm,align=center},
error/.style={fill=yellow!20,circle,draw, thick,inner sep=0pt,minimum width=8mm,minimum height=8mm,align=center},
paths/.style={->, thick, >=stealth'},
paths2/.style={<-, thick, >=stealth'},
twopaths/.style={<->, thick, >=stealth'}
]
%Define observed variables
\node [latent] (y1) at (0,0) {$y_1$};
\node [latent] (y2) [below=3cm of y1] {$y_2$};
\node [latent] (y3) [below=3cm of y2] {$y_3$};
%%%
\node [observed] (x2) [left=1.8cm of y1] {$x_2$};
\node [observed] (x1) [above=of x2] {$x_1$};
\node [observed] (x3) [below=of x2] {$x_3$};
\node [error] (ex1) [left=0.5cm of x1] {$\delta_1$};
\node [error] (ex2) [left=0.5cm of x2] {$\delta_2$};
\node [error] (ex3) [left=0.5cm of x3] {$\delta_3$};
%%%
\node [observed] (x5) [left=1.8cm of y2] {$x_5$};
\node [observed] (x4) [above=of x5] {$x_4$};
\node [observed] (x6) [below=of x5] {$x_6$};
\node [error] (ex4) [left=0.5cm of x4] {$\delta_4$};
\node [error] (ex5) [left=0.5cm of x5] {$\delta_5$};
\node [error] (ex6) [left=0.5cm of x6] {$\delta_6$};
%%%
\node [observed] (x8) [left=1.8cm of y3] {$x_8$};
\node [observed] (x7) [above=of x8] {$x_7$};
\node [observed] (x9) [below=of x8] {$x_9$};
\node [error] (ex7) [left=0.5cm of x7] {$\delta_7$};
\node [error] (ex8) [left=0.5cm of x8] {$\delta_8$};
\node [error] (ex9) [left=0.5cm of x9] {$\delta_9$};
% Draw paths form latent to observed variables
\foreach \all in {x1,x2,x3}{
\draw [paths] (y1.west) to node {} (\all.east);
}
\draw [paths] (x1.west) to (ex1.east);
\draw [paths] (x2.west) to (ex2.east);
\draw [paths] (x3.west) to (ex3.east);
%%%
\foreach \all in {x4,x5,x6}{
\draw [paths] (y2.west) to node {} (\all.east);
}
\draw [paths] (x4.west) to (ex4.east);
\draw [paths] (x5.west) to (ex5.east);
\draw [paths] (x6.west) to (ex6.east);
%%%
\foreach \all in {x7,x8,x9}{
\draw [paths] (y3.west) to node {} (\all.east);
}
\draw [paths] (x7.west) to (ex7.east);
\draw [paths] (x8.west) to (ex8.east);
\draw [paths] (x9.west) to (ex9.east);
%%%
\draw [twopaths] (y1.east) to [bend left=90] (y2.east);
\draw [twopaths] (y2.east) to [bend left=90] (y3.east);
\draw [twopaths] (y1.east) to [bend left=90] (y3.east);
\end{tikzpicture}
\end{document}
and then I ran pdf2svg cfa_example.pdf cfa_example.svg
.