Structural Equation Modelling in R (Part 1)

R
Latex
Linear models
Here I explain how to estimate CFA models coefficients without a theoretical introduction
Author

Mauricio “Pachá” Vargas S.

Published

December 7, 2016

Updated 2022-05-28: I moved the blog to Quarto, so I had to update the paths.

Brief explanation

Structural Equation Modelling (SEM) is a state of art methodology and fulfills much of broader discusion about statistical modeling, and allows to make inference and causal analysis. SEM models are regression models broadly used in Marketing, Human Resources, Biostatistics and Medicine, revealing their flexibility as analytical tool.

Despite being a state-of-the-art methodology, SEM does not create new developments of statistical theory, since it consists in a combination of multivariate analysis, factor analysis and regression analysis. SEM makes it possible to validate or reject one or more hypothesis about an existing relation between different variables, to estimate simultaneously dependency relations between variables and estimate measurement error in model’s variables. SEM may be understood as a net of unidirectional or bidirectional paths linking different variables. Such net can be validated using multivariate data.

Confirmatory Factor Analysis

A usual methodology for model evaluation is Confirmatory Factor Analysis (CFA) that is a particular case of SEM. It is a process which consists in specifying quantity and kinds of observed variables to one or more latent variables and analyze how well those variables measure the latent variable itself. Think of a latent variable as an artificial variable that is represented as a linear combination of observed variables.

Holzinger and Swineford (1939) proposed one the most famous CFA models. In their work they analysed the mental ability based test scores of children from two different schools (details here)

We can represent Holzinger and Swineford’s model in a diagram:

cfa_example.svg

In this model, \(\newcommand{\R}{\mathbb{R}} x_i\) are exogenous variables that record scores and \(y_i\) are latent (and endogenous) variables that represent visual ability, textual ability and speed ability respectively, double-headed arrows represent an association between \(y_1\), \(y_2\), and \(y_3\), single-headed arrows represent direct effects, and \(\delta_i\) represent error terms.

A factor loading is a measurable effect which reflects the incidence of an observed variable over another either observed or latent variable. Interpreting factor loadings is equivalent to interpreting direct effects in a linear econometric model. In this example factor loadings are well defined as there are no missing relations between variables, is known which observed variables have an effect over defined latent variables, and both latent variables covariate so there is a relation between latent variables.

Model evaluation

When I wrote my thesis I had to study SEM’s goodness of fit and its indicators. The interested reader may visit my Github repo to read more about some important linear algebra aspects of SEM but here I present a table that synthesizes 50% of my thesis:

Indicator Type Classification Direction Ideal values Normed (0-1)
RMSEA significance adjusted less is better \(\leq 0,05\) no
SMSR significance unadjusted less is better \(<0,1\) no
GFI significance unadjusted more is better \(>0,9\) yes

This table is useful for a cheatsheet and to keep in mind what to look when comparing models.

Using Lavaan to obtain parameters estimates

I highly recommend using lavaan. I have used AMOS and Stata but I guess R’s flexibility is unbeatable at the moment. Lavaan also provides HolzingerSwineford1939 dataset that contains the complete recordings made by Holzinger and Swineford.

First I load the needed packages to perform the analysis:

# Needed packages to perform the analysis
load <- function(pkg){
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
} 

packages <- c("lavaan","semPlot","semTools","nonnest2","htmlTable")
load(packages)

Then I fit the model using the cfa() function:

# 1.2. Specify the model
HS.model <- ' visual  =~ x1 + x2 + x3      
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# 1.3. Fit the model
fit <- cfa(HS.model, data = HolzingerSwineford1939)

Then I obtain a model summary, confidence intervals and goodness of fit indicators:

# 2.1. Display summary output
summary(fit, fit.measures = FALSE)
lavaan 0.6.15 ended normally after 35 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        21

  Number of observations                           301

Model Test User Model:
                                                      
  Test statistic                                85.306
  Degrees of freedom                                24
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.554    0.100    5.554    0.000
    x3                0.729    0.109    6.685    0.000
  textual =~                                          
    x4                1.000                           
    x5                1.113    0.065   17.014    0.000
    x6                0.926    0.055   16.703    0.000
  speed =~                                            
    x7                1.000                           
    x8                1.180    0.165    7.152    0.000
    x9                1.082    0.151    7.155    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.408    0.074    5.552    0.000
    speed             0.262    0.056    4.660    0.000
  textual ~~                                          
    speed             0.173    0.049    3.518    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.549    0.114    4.833    0.000
   .x2                1.134    0.102   11.146    0.000
   .x3                0.844    0.091    9.317    0.000
   .x4                0.371    0.048    7.779    0.000
   .x5                0.446    0.058    7.642    0.000
   .x6                0.356    0.043    8.277    0.000
   .x7                0.799    0.081    9.823    0.000
   .x8                0.488    0.074    6.573    0.000
   .x9                0.566    0.071    8.003    0.000
    visual            0.809    0.145    5.564    0.000
    textual           0.979    0.112    8.737    0.000
    speed             0.384    0.086    4.451    0.000
# 2.2. Obtain confidence intervals for the estimated coefficients
parameterEstimates(fit, ci = TRUE, level = 0.95)
       lhs op     rhs   est    se      z pvalue ci.lower ci.upper
1   visual =~      x1 1.000 0.000     NA     NA    1.000    1.000
2   visual =~      x2 0.554 0.100  5.554      0    0.358    0.749
3   visual =~      x3 0.729 0.109  6.685      0    0.516    0.943
4  textual =~      x4 1.000 0.000     NA     NA    1.000    1.000
5  textual =~      x5 1.113 0.065 17.014      0    0.985    1.241
6  textual =~      x6 0.926 0.055 16.703      0    0.817    1.035
7    speed =~      x7 1.000 0.000     NA     NA    1.000    1.000
8    speed =~      x8 1.180 0.165  7.152      0    0.857    1.503
9    speed =~      x9 1.082 0.151  7.155      0    0.785    1.378
10      x1 ~~      x1 0.549 0.114  4.833      0    0.326    0.772
11      x2 ~~      x2 1.134 0.102 11.146      0    0.934    1.333
12      x3 ~~      x3 0.844 0.091  9.317      0    0.667    1.022
13      x4 ~~      x4 0.371 0.048  7.779      0    0.278    0.465
14      x5 ~~      x5 0.446 0.058  7.642      0    0.332    0.561
15      x6 ~~      x6 0.356 0.043  8.277      0    0.272    0.441
16      x7 ~~      x7 0.799 0.081  9.823      0    0.640    0.959
17      x8 ~~      x8 0.488 0.074  6.573      0    0.342    0.633
18      x9 ~~      x9 0.566 0.071  8.003      0    0.427    0.705
19  visual ~~  visual 0.809 0.145  5.564      0    0.524    1.094
20 textual ~~ textual 0.979 0.112  8.737      0    0.760    1.199
21   speed ~~   speed 0.384 0.086  4.451      0    0.215    0.553
22  visual ~~ textual 0.408 0.074  5.552      0    0.264    0.552
23  visual ~~   speed 0.262 0.056  4.660      0    0.152    0.373
24 textual ~~   speed 0.173 0.049  3.518      0    0.077    0.270
# 2.3 Obtain goodness of fit indicators of the model
fitMeasures(fit, c("chisq", "rmsea", "srmr", "gfi", "ecvi"))
 chisq  rmsea   srmr    gfi   ecvi 
85.306  0.092  0.065  0.943  0.423 
reliability(fit)
          visual   textual     speed
alpha  0.6261171 0.8827069 0.6884550
omega  0.6253180 0.8851754 0.6877600
omega2 0.6253180 0.8851754 0.6877600
omega3 0.6120052 0.8850608 0.6858417
avevar 0.3705589 0.7210163 0.4244883

Exporting lavaan output to tables

Lavaan produces a lot of potential pieces of output, but I suggest that at minimum that you export the above model summary, confidence intervals and goodness of fit indicators as part of your final tables:

htmlTable(txtRound(parameterEstimates(fit, ci = TRUE, level = 0.95), digits = 3, excl.cols = c(1,3)), align = "l|r|r|r|r|r|r|r|r|r")
lhs op rhs est se z pvalue ci.lower ci.upper excl.cols1 excl.cols2
1 visual =~ x1 1 0 1 1 visual 1.000
2 visual =~ x2 0.553500293763662 0.0996651187683112 5.55360090474952 2.79844001305207e-08 0.358160250462865 0.748840337064459 visual 2.000
3 visual =~ x3 0.729370209767044 0.109109703120018 6.68474195154533 2.31332730749045e-11 0.515519121287951 0.943221298246138 visual 3.000
4 textual =~ x4 1 0 1 1 textual 4.000
5 textual =~ x5 1.11307657811851 0.0654201086095661 17.0142881413032 0 0.984855521379059 1.24129763485796 textual 5.000
6 textual =~ x6 0.926146236632328 0.0554488562419873 16.7027112802919 0 0.817468475414094 1.03482399785056 textual 6.000
7 speed =~ x7 1 0 1 1 speed 7.000
8 speed =~ x8 1.17995083947085 0.164986572020982 7.15179923442977 8.56426041195846e-13 0.856583100377002 1.5033185785647 speed 8.000
9 speed =~ x9 1.08153015685908 0.151167438085725 7.15451800040272 8.39772695826468e-13 0.785247422575875 1.37781289114229 speed 9.000
10 x1 ~~ x1 0.549053973199019 0.11360092326415 4.83318231421707 1.34367627646625e-06 0.326400254990787 0.77170769140725 1.000 1.000
11 x2 ~~ x2 1.13383901648387 0.101723370956 11.146298100702 0 0.934464873024101 1.33321315994364 2.000 2.000
12 x3 ~~ x3 0.844324045708499 0.0906231835368078 9.31686586981978 0 0.666705869811992 1.02194222160501 3.000 3.000
13 x4 ~~ x4 0.371172992836844 0.0477177909891553 7.77850326141877 7.32747196252603e-15 0.277647841076289 0.464698144597398 4.000 4.000
14 x5 ~~ x5 0.446255070260067 0.0583927754760264 7.64229935333834 2.1316282072803e-14 0.331807333369722 0.560702807150413 5.000 5.000
15 x6 ~~ x6 0.356202664278948 0.0430349678663535 8.27705194029997 2.22044604925031e-16 0.271855677185057 0.440549651372839 6.000 6.000
16 x7 ~~ x7 0.799391638325101 0.0813815590555713 9.82276141673862 0 0.639886713570462 0.95889656307974 7.000 7.000
17 x8 ~~ x8 0.487697083769163 0.0741940912520304 6.57326042464085 4.92252905104351e-11 0.342279337049506 0.633114830488821 8.000 8.000
18 x9 ~~ x9 0.566131293573695 0.0707369371033875 8.00333343167304 1.11022302462516e-15 0.42748944447438 0.704773142673009 9.000 9.000
19 visual ~~ visual 0.809315981090822 0.145462407414288 5.56374664407847 2.64043220621346e-08 0.524214901454325 1.09441706072732 visual visual
20 textual ~~ textual 0.97949137285969 0.112105849301359 8.73720130540782 0 0.759767945772751 1.19921479994663 textual textual
21 speed ~~ speed 0.383747649264106 0.0862091944288757 4.45135407895159 8.53305080417144e-06 0.214780733047299 0.552714565480913 speed speed
22 visual ~~ textual 0.408232442095546 0.0735238821959155 5.5523787632398 2.81808156810115e-08 0.264128280987986 0.552336603203106 visual textual
23 visual ~~ speed 0.26222460204864 0.0562763994994703 4.65958384653069 3.16849313275469e-06 0.15192488585009 0.372524318247189 visual speed
24 textual ~~ speed 0.173494684580679 0.0493146604470061 3.51811576938906 0.000434622709184929 0.0768397261947255 0.270149642966633 textual speed
htmlTable(txtRound(reliability(fit),3), align = "l|r|r|r|r")
visual textual speed
alpha 0.626 0.883 0.688
omega 0.625 0.885 0.688
omega2 0.625 0.885 0.688
omega3 0.612 0.885 0.686
avevar 0.371 0.721 0.424

There are other packages to convert R output to tables such as knitr and xtables that can also export to \(\rm\LaTeX\).

Creating a diagram for the model in R

Now let’s get on to producing the exciting part of an SEM analysis - the model diagram. This code might not be good-looking but the result is!

semPaths(fit, style = "lisrel", 
        whatLabels = "std", edge.label.cex = .6, node.label.cex = .6, 
        label.prop = 0.9, edge.label.color = "black", rotation = 4, 
        equalizeManifests = FALSE, optimizeLatRes = TRUE, node.width = 1.5, 
        edge.width = 0.5, shapeMan = "rectangle", shapeLat = "ellipse", 
        shapeInt = "triangle", sizeMan = 4, sizeInt = 2, sizeLat = 4, 
        curve = 2, unCol = "#070b8c")

If you are running a script instead of a R Markdown document, you can save the plot as svg with these lines:

svg("img/cfa_example_2.svg")
semPaths(fit, style = "lisrel", 
        whatLabels = "std", edge.label.cex = .6, node.label.cex = .6, 
        label.prop = 0.9, edge.label.color = "black", rotation = 4, 
        equalizeManifests = FALSE, optimizeLatRes = TRUE, node.width = 1.5, 
        edge.width = 0.5, shapeMan = "rectangle", shapeLat = "ellipse", 
        shapeInt = "triangle", sizeMan = 4, sizeInt = 2, sizeLat = 4, 
        curve = 2, unCol = "#070b8c")
dev.off()
png 
  2 

Appendix

If you are curious about the first diagram, I made it with TikZ. This is the code

\documentclass[border=3mm]{standalone}
\usepackage[applemac]{inputenc}
\usepackage[protrusion=true,expansion=true]{microtype}
\usepackage[bb=lucida,bbscaled=1,cal=boondoxo]{mathalfa}
\usepackage[stdmathitalics=true,math-style=iso,lucidasmallscale=true,romanfamily=bright]{lucimatx}
\usepackage{tikz}
\usetikzlibrary{arrows,positioning}
\newcommand{\at}{\makeatletter @\makeatother}
\begin{document}
\begin{tikzpicture}[auto,node distance=.5cm,
    latent/.style={fill=red!20,circle,draw, thick,inner sep=0pt,minimum size=8mm,align=center},
    observed/.style={fill=blue!20,rectangle,draw, thick,inner sep=0pt,minimum width=8mm,minimum height=8mm,align=center},
    error/.style={fill=yellow!20,circle,draw, thick,inner sep=0pt,minimum width=8mm,minimum height=8mm,align=center},
    paths/.style={->,  thick, >=stealth'},
    paths2/.style={<-,  thick, >=stealth'},
    twopaths/.style={<->,  thick, >=stealth'}
]

%Define observed variables
\node [latent] (y1) at (0,0) {$y_1$};
\node [latent] (y2) [below=3cm of y1]  {$y_2$};
\node [latent] (y3) [below=3cm of y2]  {$y_3$};

%%%

\node [observed] (x2) [left=1.8cm of y1]  {$x_2$};
\node [observed] (x1) [above=of x2]  {$x_1$};
\node [observed] (x3) [below=of x2]  {$x_3$};

\node [error] (ex1) [left=0.5cm of x1]  {$\delta_1$};
\node [error] (ex2) [left=0.5cm of x2]  {$\delta_2$};
\node [error] (ex3) [left=0.5cm of x3]  {$\delta_3$};

%%%

\node [observed] (x5) [left=1.8cm of y2]  {$x_5$};
\node [observed] (x4) [above=of x5]  {$x_4$};
\node [observed] (x6) [below=of x5]  {$x_6$};

\node [error] (ex4) [left=0.5cm of x4]  {$\delta_4$};
\node [error] (ex5) [left=0.5cm of x5]  {$\delta_5$};
\node [error] (ex6) [left=0.5cm of x6]  {$\delta_6$};

%%%

\node [observed] (x8) [left=1.8cm of y3]  {$x_8$};
\node [observed] (x7) [above=of x8]  {$x_7$};
\node [observed] (x9) [below=of x8]  {$x_9$};

\node [error] (ex7) [left=0.5cm of x7]  {$\delta_7$};
\node [error] (ex8) [left=0.5cm of x8]  {$\delta_8$};
\node [error] (ex9) [left=0.5cm of x9]  {$\delta_9$};

% Draw paths form latent to observed variables

\foreach \all in {x1,x2,x3}{
    \draw [paths] (y1.west) to node {} (\all.east);
}

\draw [paths] (x1.west) to (ex1.east);
\draw [paths] (x2.west) to (ex2.east);
\draw [paths] (x3.west) to (ex3.east);

%%%

\foreach \all in {x4,x5,x6}{
    \draw [paths] (y2.west) to node {} (\all.east);
}

\draw [paths] (x4.west) to (ex4.east);
\draw [paths] (x5.west) to (ex5.east);
\draw [paths] (x6.west) to (ex6.east);

%%%

\foreach \all in {x7,x8,x9}{
    \draw [paths] (y3.west) to node {} (\all.east);
}

\draw [paths] (x7.west) to (ex7.east);
\draw [paths] (x8.west) to (ex8.east);
\draw [paths] (x9.west) to (ex9.east);

%%%

\draw [twopaths] (y1.east) to [bend left=90] (y2.east);
\draw [twopaths] (y2.east) to [bend left=90] (y3.east);

\draw [twopaths] (y1.east) to [bend left=90] (y3.east);


\end{tikzpicture}
\end{document} 

and then I ran pdf2svg cfa_example.pdf cfa_example.svg.