Structural Equation Modelling in R (Part 2)

Latex

Linear models

Here we explain how to estimate SEM models coefficients without a theoretical introduction

Authors

Mauricio “Pachá” Vargas S.

Jodie Burchell

Published

January 15, 2017

Updated 2022-05-28: I moved the blog to Quarto, so I had to update the paths.

Brief explanation

This time I am glad to announce Jodie Burchell as a co-writer!

In Structural Equation Modelling in R (Part 1) I explained the basics of CFA. SEM was explained as a general case of CFA that was going be explained later, so here we go.

Structural Equation Modeling

SEM models or causal models arise from confirmatory models. A confimatory model becomes a causal model if all latent variables are defined by observed variables.

Bollen (1989) proposed a SEM model to study an aspect of democracy. In his work he analyzes the relationship between freedom of the press, freedom of political opposition and fairness of elections among other variables such as industrialization (details here).

We can represent Bollen’s model in a diagram:

help("PoliticalDemocracy") provides a description of the variables within the dataset but it is recommended to read Political Democracy and the Timing of Development by K.A. Bollen.

In this model, $x_{2}$ , $x_{2}$ and $x_{3}$ are the gross national product (GNP) per capita in 1960, the inanimate energy consumption per capita in 1960, and the percentage of the labor force in industry in 1960 respectively. Variables $y_{1}$ to $y_{8}$ represent:

Expert ratings of the freedom of the press in 1960
The freedom of political opposition in 1960
The fairness of elections in 1960
The effectiveness of the elected legislature in 1960
Expert ratings of the freedom of the press in 1965
The freedom of political opposition in 1965
The fairness of elections in 1965
The effectiveness of the elected legislature in 1965

Finally, $z_{1}$ , $z_{2}$ and $z_{3}$ represent the degree of industrialization in 1960, political democracy in 1960, and political democracy in 1965 respectively.

As in the case of CFA double-headed arrows represent an association, but in this case between the $y_{i}$ ’s single-headed arrows represent direct effects, and $ε_{i}$ , $δ_{i}$ , $ζ_{i}$ represent error terms.

Here the error terms help us to understand an SEM model as a net of regressions. In R notation $i n d 60 \sim x_{1} + x_{2} + x_{3}$ would be a short form of writing $i n d 60 = λ_{1} x_{1} + λ_{2} x_{2} + λ_{3} x_{3} + ζ_{1}$ where each $λ_{i}$ is a regression coefficient that can be estimated using ML, OLS or a variety of methods.

Using Lavaan to obtain parameters estimates

First we load the needed packages to perform the analysis:

# Needed packages to perform the analysis
load <- function(pkg){
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}

packages <- c("lavaan","semPlot","semTools","nonnest2","htmlTable")
load(packages)

Then we fit the model using the sem() function:

# 1.2. Specify the model
Bollen.model <- '# measurement model
                    ind60 =~ x1 + x2 + x3
                    dem60 =~ y1 + y2 + y3 + y4
                    dem65 =~ y5 + y6 + y7 + y8
                 # regressions
                    dem60 ~ ind60
                    dem65 ~ ind60 + dem60
                 # residual correlations
                    y1 ~~ y5
                    y2 ~~ y4 + y6
                    y3 ~~ y7
                    y4 ~~ y8
                    y6 ~~ y8'

# 1.3. Fit the model
fit <- sem(Bollen.model, data = PoliticalDemocracy)

Then we obtain a model summary, confidence intervals and goodness of fit indicators:

# 2.1. Display summary output
summary(fit, fit.measures = FALSE)

lavaan 0.6.15 ended normally after 68 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        31

  Number of observations                            75

Model Test User Model:
                                                      
  Test statistic                                38.125
  Degrees of freedom                                35
  P-value (Chi-square)                           0.329

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  ind60 =~                                            
    x1                1.000                           
    x2                2.180    0.139   15.742    0.000
    x3                1.819    0.152   11.967    0.000
  dem60 =~                                            
    y1                1.000                           
    y2                1.257    0.182    6.889    0.000
    y3                1.058    0.151    6.987    0.000
    y4                1.265    0.145    8.722    0.000
  dem65 =~                                            
    y5                1.000                           
    y6                1.186    0.169    7.024    0.000
    y7                1.280    0.160    8.002    0.000
    y8                1.266    0.158    8.007    0.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  dem60 ~                                             
    ind60             1.483    0.399    3.715    0.000
  dem65 ~                                             
    ind60             0.572    0.221    2.586    0.010
    dem60             0.837    0.098    8.514    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .y1 ~~                                               
   .y5                0.624    0.358    1.741    0.082
 .y2 ~~                                               
   .y4                1.313    0.702    1.871    0.061
   .y6                2.153    0.734    2.934    0.003
 .y3 ~~                                               
   .y7                0.795    0.608    1.308    0.191
 .y4 ~~                                               
   .y8                0.348    0.442    0.787    0.431
 .y6 ~~                                               
   .y8                1.356    0.568    2.386    0.017

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.082    0.019    4.184    0.000
   .x2                0.120    0.070    1.718    0.086
   .x3                0.467    0.090    5.177    0.000
   .y1                1.891    0.444    4.256    0.000
   .y2                7.373    1.374    5.366    0.000
   .y3                5.067    0.952    5.324    0.000
   .y4                3.148    0.739    4.261    0.000
   .y5                2.351    0.480    4.895    0.000
   .y6                4.954    0.914    5.419    0.000
   .y7                3.431    0.713    4.814    0.000
   .y8                3.254    0.695    4.685    0.000
    ind60             0.448    0.087    5.173    0.000
   .dem60             3.956    0.921    4.295    0.000
   .dem65             0.172    0.215    0.803    0.422

# 2.2. Obtain confidence intervals for the estimated coefficients
parameterEstimates(fit, ci = TRUE, level = 0.95)

     lhs op   rhs   est    se      z pvalue ci.lower ci.upper
1  ind60 =~    x1 1.000 0.000     NA     NA    1.000    1.000
2  ind60 =~    x2 2.180 0.139 15.742  0.000    1.909    2.452
3  ind60 =~    x3 1.819 0.152 11.967  0.000    1.521    2.116
4  dem60 =~    y1 1.000 0.000     NA     NA    1.000    1.000
5  dem60 =~    y2 1.257 0.182  6.889  0.000    0.899    1.614
6  dem60 =~    y3 1.058 0.151  6.987  0.000    0.761    1.354
7  dem60 =~    y4 1.265 0.145  8.722  0.000    0.981    1.549
8  dem65 =~    y5 1.000 0.000     NA     NA    1.000    1.000
9  dem65 =~    y6 1.186 0.169  7.024  0.000    0.855    1.517
10 dem65 =~    y7 1.280 0.160  8.002  0.000    0.966    1.593
11 dem65 =~    y8 1.266 0.158  8.007  0.000    0.956    1.576
12 dem60  ~ ind60 1.483 0.399  3.715  0.000    0.701    2.265
13 dem65  ~ ind60 0.572 0.221  2.586  0.010    0.139    1.006
14 dem65  ~ dem60 0.837 0.098  8.514  0.000    0.645    1.030
15    y1 ~~    y5 0.624 0.358  1.741  0.082   -0.079    1.326
16    y2 ~~    y4 1.313 0.702  1.871  0.061   -0.063    2.689
17    y2 ~~    y6 2.153 0.734  2.934  0.003    0.715    3.591
18    y3 ~~    y7 0.795 0.608  1.308  0.191   -0.396    1.986
19    y4 ~~    y8 0.348 0.442  0.787  0.431   -0.519    1.215
20    y6 ~~    y8 1.356 0.568  2.386  0.017    0.242    2.470
21    x1 ~~    x1 0.082 0.019  4.184  0.000    0.043    0.120
22    x2 ~~    x2 0.120 0.070  1.718  0.086   -0.017    0.256
23    x3 ~~    x3 0.467 0.090  5.177  0.000    0.290    0.643
24    y1 ~~    y1 1.891 0.444  4.256  0.000    1.020    2.762
25    y2 ~~    y2 7.373 1.374  5.366  0.000    4.680   10.066
26    y3 ~~    y3 5.067 0.952  5.324  0.000    3.202    6.933
27    y4 ~~    y4 3.148 0.739  4.261  0.000    1.700    4.596
28    y5 ~~    y5 2.351 0.480  4.895  0.000    1.410    3.292
29    y6 ~~    y6 4.954 0.914  5.419  0.000    3.162    6.746
30    y7 ~~    y7 3.431 0.713  4.814  0.000    2.034    4.829
31    y8 ~~    y8 3.254 0.695  4.685  0.000    1.893    4.615
32 ind60 ~~ ind60 0.448 0.087  5.173  0.000    0.279    0.618
33 dem60 ~~ dem60 3.956 0.921  4.295  0.000    2.151    5.762
34 dem65 ~~ dem65 0.172 0.215  0.803  0.422   -0.249    0.593

# 2.3 Obtain goodness of fit indicators of the model
fitMeasures(fit, c("chisq", "rmsea", "srmr", "gfi", "ecvi"))

 chisq  rmsea   srmr    gfi   ecvi 
38.125  0.035  0.044  0.923  1.335

reliability(fit)

           ind60     dem60     dem65
alpha  0.9023348 0.8587945 0.8827394
omega  0.9437375 0.8375193 0.8556193
omega2 0.9437375 0.8375193 0.8556193
omega3 0.9436896 0.8411801 0.8575540
avevar 0.8588015 0.5996707 0.6408647

Exporting lavaan output to tables

Lavaan produces a lot of potential pieces of output, but I suggest that at minimum that you export the above model summary, confidence intervals and goodness of fit indicators as part of your final tables:

htmlTable(txtRound(parameterEstimates(fit, ci = TRUE, level = 0.95), digits = 3, excl.cols = c(1,3)), align = "l|r|r|r|r|r|r|r|r|r")

	lhs	op	rhs	est	se	z	pvalue	ci.lower	ci.upper	excl.cols1	excl.cols2
1	ind60	=~	x1	1	0			1	1	60.000	1.000
2	ind60	=~	x2	2.18036773093799	0.13850927581342	15.7416730261089	0	1.90889453881896	2.45184092305702	60.000	2.000
3	ind60	=~	x3	1.81851130390037	0.151958123393728	11.967187165036	0	1.52067885489037	2.11634375291037	60.000	3.000
4	dem60	=~	y1	1	0			1	1	60.000	1.000
5	dem60	=~	y2	1.25674649791291	0.182439625083041	6.88856106419249	5.63593616220714e-12	0.899171403397163	1.61432159242867	60.000	2.000
6	dem60	=~	y3	1.05771677041185	0.151383167907233	6.98701701803479	2.80797607388195e-12	0.761011213448096	1.35442232737561	60.000	3.000
7	dem60	=~	y4	1.26478653624678	0.145006216512974	8.72229182073456	0	0.980579574346933	1.54899349814663	60.000	4.000
8	dem65	=~	y5	1	0			1	1	65.000	5.000
9	dem65	=~	y6	1.18569629880997	0.168810424671378	7.02383339842995	2.15871764908115e-12	0.854833946239158	1.51655865138078	65.000	6.000
10	dem65	=~	y7	1.27951217694957	0.159901625501362	8.00187098122195	1.33226762955019e-15	0.966110749897488	1.59291360400165	65.000	7.000
11	dem65	=~	y8	1.26594698075209	0.158111128469257	8.00669119883134	1.11022302462516e-15	0.956054863397361	1.57583909810682	65.000	8.000
12	dem60	~	ind60	1.48300054022542	0.399148608668019	3.7154095192121	0.000202874853353796	0.700683642756834	2.26531743769401	60.000	60.000
13	dem65	~	ind60	0.57233619475932	0.221313792034602	2.58608462444959	0.00970730942497577	0.138569133089512	1.00610325642913	65.000	60.000
14	dem65	~	dem60	0.837344811202947	0.0983509576681464	8.51384502048569	0	0.644580476328357	1.03010914607754	65.000	60.000
15	y1	~~	y5	0.623671074030225	0.35832022743272	1.740541075503	0.0817640540660225	-0.0786236666701067	1.32596581473056	1.000	5.000
16	y2	~~	y4	1.31311257673291	0.701983268421085	1.87057531967448	0.0614039681237297	-0.0627493471221321	2.68897450058795	2.000	4.000
17	y2	~~	y6	2.15286126610351	0.733775932984817	2.9339491380512	0.00334679046794228	0.714686864730998	3.59103566747603	2.000	6.000
18	y3	~~	y7	0.794960276374888	0.607698047456939	1.30815012439417	0.190822395042678	-0.396106010116026	1.9860265628658	3.000	7.000
19	y4	~~	y8	0.348226039083801	0.442240063458797	0.787414049193768	0.431039525120505	-0.51854855781615	1.21500063598375	4.000	8.000
20	y6	~~	y8	1.35616712069039	0.568286088953423	2.38641618553071	0.0170134849955328	0.242346853426557	2.46998738795423	6.000	8.000
21	x1	~~	x1	0.0815493454004797	0.0194897389908495	4.18421947255257	2.86147560029093e-05	0.0433501589103286	0.119748531890631	1.000	1.000
22	x2	~~	x2	0.119806478198423	0.0697206404642653	1.71837891047241	0.0857275245081071	-0.0168434660906033	0.256456422487449	2.000	2.000
23	x3	~~	x3	0.466702578156235	0.0901564950492953	5.1765829838555	2.25986560797864e-07	0.289999094887253	0.643406061425218	3.000	3.000
24	y1	~~	y1	1.89139551968812	0.444423793047537	4.25583766953227	2.08267778309956e-05	1.02034089144227	2.76245014793398	1.000	1.000
25	y2	~~	y2	7.37286854022783	1.37389267218399	5.36640793673328	8.03201680721344e-08	4.68008838412373	10.0656486963319	2.000	2.000
26	y3	~~	y3	5.06746209955855	0.951727082521065	5.32449080479581	1.01236222738166e-07	3.20211129470589	6.93281290441122	3.000	3.000
27	y4	~~	y4	3.1479047972895	0.738784120582109	4.26092644602212	2.03581231339456e-05	1.69991452859847	4.59589506598053	4.000	4.000
28	y5	~~	y5	2.350970469996	0.480238605529475	4.89542165691572	9.8095166145562e-07	1.40972009917249	3.29222084081951	5.000	5.000
29	y6	~~	y6	4.95396775008949	0.914246760415895	5.41863309183169	6.00564233899092e-08	3.16207702669192	6.74585847348706	6.000	6.000
30	y7	~~	y7	3.43137392075022	0.71284340476721	4.81364335813808	1.48203197247732e-06	2.03422652078958	4.82852132071085	7.000	7.000
31	y8	~~	y8	3.25408501484355	0.694604197223161	4.68480471015364	2.80226990057031e-06	1.8926858047758	4.6154842249113	8.000	8.000
32	ind60	~~	ind60	0.448437150951413	0.0866917092640795	5.17278012809031	2.30636352682723e-07	0.2785245230356	0.618349778867227	60.000	60.000
33	dem60	~~	dem60	3.95603310783351	0.921184706059898	4.29450584861998	1.75082866160636e-05	2.150544260847	5.76152195482003	60.000	60.000
34	dem65	~~	dem65	0.172481325461862	0.214804591985849	0.802968520678668	0.421992929579229	-0.248527938544224	0.593490589467948	65.000	65.000

htmlTable(txtRound(reliability(fit),3), align = "l|r|r|r|r")

	ind60	dem60	dem65
alpha	0.902	0.859	0.883
omega	0.944	0.838	0.856
omega2	0.944	0.838	0.856
omega3	0.944	0.841	0.858
avevar	0.859	0.600	0.641

There are other packages to convert R output to tables such as knitr and xtables that can also export to $L A T E X$ .

Creating a diagram for the model in R

Now let’s get on to producing the exciting part of an SEM analysis - the model diagram. This code might not be good-looking but the result is!

semPaths(fit, style = "lisrel",
        whatLabels = "std", edge.label.cex = .6, node.label.cex = .6,
        label.prop = 0.9, edge.label.color = "black", rotation = 4,
        equalizeManifests = FALSE, optimizeLatRes = TRUE, node.width = 1.5,
        edge.width = 0.5, shapeMan = "rectangle", shapeLat = "ellipse",
        shapeInt = "triangle", sizeMan = 4, sizeInt = 2, sizeLat = 4,
        curve = 2, unCol = "#070b8c")

If you are running a script instead of a R Markdown document, you can save the plot as svg with these lines:

svg("img/sem_example_2.svg")
semPaths(fit, style = "lisrel",
        whatLabels = "std", edge.label.cex = .6, node.label.cex = .6,
        label.prop = 0.9, edge.label.color = "black", rotation = 4,
        equalizeManifests = FALSE, optimizeLatRes = TRUE, node.width = 1.5,
        edge.width = 0.5, shapeMan = "rectangle", shapeLat = "ellipse",
        shapeInt = "triangle", sizeMan = 4, sizeInt = 2, sizeLat = 4,
        curve = 2, unCol = "#070b8c")
dev.off()

png 
  2

Appendix

This is the TikZ (LaTeX) code to obtain the first diagram.

\documentclass[border=3mm]{standalone}
\usepackage[applemac]{inputenc}
\usepackage[protrusion=true,expansion=true]{microtype}
\usepackage[bb=lucida,bbscaled=1,cal=boondoxo]{mathalfa}
\usepackage[stdmathitalics=true,math-style=iso,lucidasmallscale=true,romanfamily=bright]{lucimatx}
\usepackage{tikz}
\usetikzlibrary{arrows,positioning,calc}
\newcommand{\at}{\makeatletter @\makeatother}
\begin{document}
\begin{tikzpicture}[auto,node distance=.5cm,
    latent/.style={fill=red!20,circle,draw, thick,inner sep=0pt,minimum size=8mm,align=center},
    observed/.style={fill=blue!20,rectangle,draw, thick,inner sep=0pt,minimum width=8mm,minimum height=8mm,align=center},
    error/.style={fill=yellow!20,circle,draw, thick,inner sep=0pt,minimum width=8mm,minimum height=8mm,align=center},
    paths/.style={->,  thick, >=stealth'},
    paths2/.style={<-,  thick, >=stealth'},
    twopaths/.style={<->,  thick, >=stealth'}
]

%Define observed variables
\node [latent] (z2) at (0,0) {$z_2$};
\node [latent] (z3) [below=7cm of z2]  {$z_3$};

\node [error] (ez2) [above=0.5cm of z2]  {$\zeta_2$};
\node [error] (ez3) [below=0.5cm of z3]  {$\zeta_3$};

%%%

\node [observed] (y2) [above left=0.2cm and 1.8cm of z2]  {$y_2$};
\node [observed] (y1) [above=1cm of y2]  {$y_1$};
\node [observed] (y3) [below=1cm of y2]  {$y_3$};
\node [observed] (y4) [below=1cm of y3]  {$y_4$};

\node [error] (ey1) [left=0.5cm of y1]  {$\varepsilon_1$};
\node [error] (ey2) [left=0.5cm of y2]  {$\varepsilon_2$};
\node [error] (ey3) [left=0.5cm of y3]  {$\varepsilon_3$};
\node [error] (ey4) [left=0.5cm of y4]  {$\varepsilon_4$};

%%%

\node [observed] (y6) [above left=0.2cm and 1.8cm of z3]  {$y_6$};
\node [observed] (y5) [above=1cm of y6]  {$y_5$};
\node [observed] (y7) [below=1cm of y6]  {$y_7$};
\node [observed] (y8) [below=1cm of y7]  {$y_8$};

\node [error] (ey5) [left=0.5cm of y5]  {$\varepsilon_5$};
\node [error] (ey6) [left=0.5cm of y6]  {$\varepsilon_6$};
\node [error] (ey7) [left=0.5cm of y7]  {$\varepsilon_7$};
\node [error] (ey8) [left=0.5cm of y8]  {$\varepsilon_8$};

%%%

\node [latent] (z1) [right=3cm of z2]  {$z_1$};

\node [error] (ez1) [above=0.5cm of z1]  {$\zeta_1$};

%%%

\node [observed] (x2) [right=1.8cm of z1]  {$x_2$};
\node [observed] (x1) [above=of x2]  {$x_1$};
\node [observed] (x3) [below=of x2]  {$x_3$};

\node [error] (ex1) [right=0.5cm of x1]  {$\delta_1$};
\node [error] (ex2) [right=0.5cm of x2]  {$\delta_2$};
\node [error] (ex3) [right=0.5cm of x3]  {$\delta_3$};

% Draw paths form latent to observed variables

\foreach \all in {y1,y2,y3,y4}{
    \draw [paths] (z2.west) to node {} (\all.east);
}

\draw [paths] (y1.west) to (ey1.east);
\draw [paths] (y2.west) to (ey2.east);
\draw [paths] (y3.west) to (ey3.east);
\draw [paths] (y4.west) to (ey4.east);

%%%

\foreach \all in {y5,y6,y7,y8}{
    \draw [paths] (z3.west) to node {} (\all.east);
}

\draw [paths] (y5.west) to (ey5.east);
\draw [paths] (y6.west) to (ey6.east);
\draw [paths] (y7.west) to (ey7.east);
\draw [paths] (y8.west) to (ey8.east);

%%%

\foreach \all in {x1,x2,x3}{
    \draw [paths] (z1.east) to node {} (\all.west);
}

\draw [paths] (x1.east) to (ex1.west);
\draw [paths] (x2.east) to (ex2.west);
\draw [paths] (x3.east) to (ex3.west);

%%%

\draw [paths] (z1.north) to (ez1.south);
\draw [paths] (z2.north) to (ez2.south);
\draw [paths] (z3.south) to (ez3.north);

%%%

\draw [paths] (z2.south) to (z3.north);
\draw [paths] (z1.west) to (z2.east);
\draw [paths] (z1.west) to (z3.north);

%%%

\draw [twopaths] (y1.south west) to [bend right=90] (y5.north west);
\draw [twopaths] (y2.south west) to [bend right=90] (y6.north west);
\draw [twopaths] (y3.south west) to [bend right=90] (y7.north west);
\draw [twopaths] (y4.south west) to [bend right=90] (y8.north west);

\draw [twopaths] (y2.south west) to [out=180,in=90] ($(ey3)+(-1,0)$) to [out=-90, in=-180] (y4.north west);
\draw [twopaths] (y6.south west) to [out=180,in=90] ($(ey7)+(-1,0)$) to [out=-90, in=-180] (y8.north west);

\end{tikzpicture}
\end{document}

and then I ran pdf2svg sem_example.pdf sem_example.svg.