Fundamental Concepts of Confirmatory Factor Analysis (CFA)

Core Concepts

  • Structural Equation Modeling (SEM) tests the relationships between non-ostensible (latent) factors
    • Much like a linear regression model tests relationships between ostensible (observed) variables / items
  • But SEM:
    • Is more flexible
    • Can test more complex models
      • Including causal relationships
    • Can investigate endogenous variables as well as exogenous
    • Typically uses maximum likelihood estimation instead of ordinary least squares

Core Concepts (cont.)

Basic Analyses

  • The results of SEMs are quite similar to those of CFAs
  • In SEMs, we assess outputs of:
    1. Model fit (χ2, RMSEA, SRMR, CFI, TLI, etc.)
    2. Model parameters
      • Factor loadings
      • Variances & covariances of factor indicators & factors
      • Often with especial attention given to factor inter-relationships
      • Error variances

Conceiving of SEMs

  • SEMs are often conceived of as path diagrams based on the same ideas we used for CFAs:
  • Note that we model error terms as contributing to the factor indicators
    • Rather than modeling a unique variance for each factor indicator
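To make the path-diagram idea concrete, here is a minimal sketch of the covariance matrix implied by a single factor with three indicators: each covariance is the product of two loadings and the factor variance, and each variance adds that indicator's error variance. The function name and all numeric values are hypothetical, chosen only for illustration.

```python
def implied_covariance(loadings, factor_variance, error_variances):
    """Model-implied covariance matrix for one factor measured by several indicators.

    Off-diagonal: loading_i * loading_j * factor_variance
    Diagonal:     loading_i**2 * factor_variance + error_variance_i
    """
    n = len(loadings)
    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sigma[i][j] = loadings[i] * loadings[j] * factor_variance
            if i == j:
                # Error terms contribute only to an indicator's own variance
                sigma[i][j] += error_variances[i]
    return sigma


# Hypothetical loadings and error variances (not taken from any real data)
loadings = [1.0, 0.8, 0.6]
error_variances = [0.5, 0.4, 0.7]
sigma = implied_covariance(loadings, 1.0, error_variances)

for row in sigma:
    print([round(value, 2) for value in row])
```

Fitting the model amounts to choosing loadings and variances so that this implied matrix is as close as possible to the observed covariance matrix.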

Same Model without Factor Indicators (Items) or Error Terms

  • We sometimes simplify the diagrammed model to present only the factors
    • This does not imply that the unpresented parameters are not computed
  • A double-headed arrow denotes a predicted correlational (or bidirectional) relationship between the factors:
  • A single-headed arrow denotes a predicted causal relationship between the factors such that Factor A causes changes in Factor B:

Three Factors

  • The power of SEMs begins to show when we consider several factors
  • And more complex relationships, e.g.,
    • Factor A affects Factor C
    • Factor B affects Factor C
    • Factor A is unrelated to Factor B

Three Factors with Mediation

  • Here:
    • Factor B affects Factor C
    • Factor A affects Factor C by first affecting Factor B
  • I.e., Factor B mediates the relationship between Factors A & C
    • Here, Factor A has no direct effect on Factor C

Mediation vs. Moderation

  • Mediation:
    • When a predictor has no direct effect on an outcome
    • But that predictor affects something else that does have a direct effect on that outcome
    • E.g., hand washing per se doesn’t affect infections
      • Hand washing affects the number of microbes available to infect
  • Moderation:
    • When a predictor has a direct effect on an outcome
    • But the magnitude of the effect is affected by something else
    • E.g., thoroughness of hand washing affects how much hand washing reduces the number of microbes
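The distinction can be sketched numerically. In the usual linear treatment, a mediated (indirect) effect is the product of the two paths it travels through, whereas a moderated effect is a slope that changes with the moderator, as in a regression with an interaction term. The function names and coefficient values below are hypothetical and only illustrate the arithmetic.

```python
def indirect_effect(a_to_b, b_to_c):
    """Mediated effect of A on C: the A -> B path times the B -> C path."""
    return a_to_b * b_to_c


def total_effect(a_to_b, b_to_c, direct=0.0):
    """Total effect of A on C: any direct path plus the mediated path."""
    return direct + indirect_effect(a_to_b, b_to_c)


def conditional_slope(b_predictor, b_interaction, moderator_value):
    """Slope of the predictor at one moderator value, for
    y = b0 + b_predictor*x + b_moderator*m + b_interaction*x*m."""
    return b_predictor + b_interaction * moderator_value


# Hypothetical path coefficients
print(indirect_effect(0.5, 0.4))           # mediation: effect passes through B
print(total_effect(0.5, 0.4, direct=0.0))  # no direct A -> C path
print(conditional_slope(0.3, 0.2, 0.0))    # moderation: slope at low moderator
print(conditional_slope(0.3, 0.2, 1.0))    # moderation: slope at high moderator
```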

Mediation vs. Moderation (cont.)

  • Again, mediators are drawn as going through another factor:
  • Moderators are drawn as affecting the path between the two:

Yet More Complex Models

  • And, sure, a model could contain:
    • A direct effect,
    • A mediating effect, and
    • A moderating effect
  • And all of these could be tested

Mechanics of SEMs

Assumptions

  • Normality
    • Typically assume multivariate normality
      • N.b., maximum likelihood estimation is rather robust against departures from normality
        • But strongly multivariate non-normal data can create larger χ2s
        • Leading to more Type 1 errors (i.e., falsely rejecting a well-fitting model)
          • I.e., parameter estimates per se will likely be reasonable
            • (If not asymptotically unbiased)
          • But standard errors (and thus χ2s) will likely be biased
            • Typically, standard errors are too small & χ2s too large

Assumptions (cont.)

  • Sample size
  • Ordinal data
    • If monotonic & arguably sample a non-ostensible continuous construct
    • And have a large N (> ~500)
    • Can include via polychoric correlations
      • Or treat as interval
        • When there are several (>4) response options
        • And data are nearly normal

SEM Procedure

  • Largely the same as for a CFA

    • Usually with more parameters
    • And usually most interested in relationships between endogenous factors
  • Therefore, typically:

    1. Establish model & initial parameter values based on theory
    2. Use maximum likelihood estimation to try to fit a proposed model to the data’s covariance matrix (may also use variable / factor indicator means)
    3. Review fit indices (e.g., χ2, SRMR, RMSEA, CFI, TLI, plus a few more) and modification indices
    4. Review final parameter values for insights
    5. Modify model / parameters & test against other theory-driven models
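Step 2 can be made a little more concrete. A common form of the maximum likelihood discrepancy function is sketched here, assuming S is the sample covariance matrix, Σ(θ) is the model-implied covariance matrix, and p is the number of observed variables (these symbols are introduced for illustration and do not appear elsewhere in these slides):

\[F_{ML}=\ln\left|\Sigma(\theta)\right|+\mathrm{tr}\left(S\,\Sigma(\theta)^{-1}\right)-\ln\left|S\right|-p\]

The model χ2 is then usually computed as (N − 1) × F_ML at its minimum, although some software uses N instead of N − 1.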

Fit Indices

  • Those emphasized for CFAs still apply:
| Fit Index | Which are robust against N & df? | What are the criteria? |
|---|---|---|
| χ2 | | |
| SRMR | | |
| RMSEA | | |
| CFI | | |
| TLI | | |

Fit Indices (cont.)

  • Those emphasized for CFAs still apply:
| Fit Index | Description | Notes | Criterion |
|---|---|---|---|
| χ2 | Fundamental measure of model fit to data | Does not account for sample size (N); does not account for model complexity (df) | p > .05 |
| SRMR | Mean difference between implied & actual covariance matrices | Does not account for N or df | ≤ .08 |
| RMSEA | Has known distribution | Accounts for N and df | < .06 – .08 |
| CFI | Model fit vs. null; ranges from 0 – 1 | Accounts for N; strongly accounts for df | ≥ .95 |
| TLI | Model fit vs. null; can fall outside of 0 – 1 range | Accounts for N; moderately accounts for df | ≥ .95 |
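As a rough illustration of how three of these indices relate to the χ2, the sketch below computes RMSEA, CFI, and TLI from a model χ2 and a null (baseline) model χ2 using their commonly cited formulas. The function names and input values are hypothetical, and software packages differ in details (e.g., N vs. N − 1 in the RMSEA), so treat this as a sketch rather than a reproduction of any particular program's output.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (using N - 1 in the denominator)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative fit index: model misfit relative to null-model misfit."""
    misfit_model = max(chi2_model - df_model, 0.0)
    misfit_null = max(chi2_null - df_null, misfit_model)
    if misfit_null == 0.0:
        return 1.0  # neither model shows misfit beyond its df
    return 1.0 - misfit_model / misfit_null


def tli(chi2_model, df_model, chi2_null, df_null):
    """Tucker-Lewis index: compares chi2/df ratios, so it can leave the 0-1 range."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


# Hypothetical results: model chi2 = 85 on 40 df, null chi2 = 900 on 55 df, N = 300
print(round(rmsea(85.0, 40, 300), 3))
print(round(cfi(85.0, 40, 900.0, 55), 3))
print(round(tli(85.0, 40, 900.0, 55), 3))
```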

Additional Fit Indices

Additional Signs of a Poorly-Fitting Model

  • In addition to fit indices, the following are signs of a poorly-fitting model:
    • Negative residual variances
    • Very large standard errors
    • Many non-significant (free) parameters
    • Correlations > 1

Tests of Factor Loadings

  • Factor loadings are reported along with their standard errors
    • Together, these statistics can test the significance of that loading
    • By creating a critical ratio (CR, aka Wald test): \[CR=\frac{Factor\;Loading}{Standard\;Error}\]
    • Can then use:
      • |CR| ≥ 1.96 (i.e., |z| ≈ 2) for 2-tailed tests
      • CR ≥ 1.65 for 1-tailed tests (e.g., testing whether the loading > 0)
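The critical ratio is simple enough to compute by hand. The sketch below divides a loading by its standard error and converts the result to a two-tailed p-value under the standard normal distribution. The loading and standard error are hypothetical values, and the function names are my own.

```python
import math


def critical_ratio(estimate, standard_error):
    """Wald-type critical ratio: the estimate divided by its standard error."""
    return estimate / standard_error


def two_tailed_p(z):
    """Two-tailed p-value for a z statistic under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical loading and standard error
cr = critical_ratio(0.72, 0.15)
print(round(cr, 2), two_tailed_p(cr) < 0.05)

# The 1.96 cutoff corresponds to a two-tailed p of about .05
print(round(two_tailed_p(1.96), 3))
```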

Review of Model Parameters

  • Mainly testing change to model fit if a given parameter were either fixed, freed, or constrained
    • Fixed: Constrained to be a certain value (usually 0 or 1)
    • Freed: Allowed to have its value estimated by the analyses
    • Constrained: Allowed to assume only a certain range of values (e.g., non-negative or larger than another parameter)
  • Usually, we test freeing parameters
    • Which is a less restrictive model
    • More guided by the data than by theory
    • So, yeah, quasi-exploratory

Review of Model Parameters (cont.)

  • Can also:
    • Allow factor indicators to load onto more than one factor (cross-loading)
    • Allow factor indicator error terms to correlate
    • Perhaps even remove problematic factor indicators

Comparing Models

Comparing Models: Modifying Parameters

  • Simplest comparisons are between models that differ only in which parameters are estimated
    • E.g., whether to include / remove a relationship between two factors, e.g.:

Modification Indices

  • Lagrange multiplier test
    • Estimates the minimum amount by which the χ2 (of the residual covariance matrix) would decrease if the given parameter were freed
    • Parameters with largest Lagrange multiplier values are most impactful on model
  • Still needs to be guided by theory
    • Especially since studies tend to find that models guided by Lagrange multiplier tests generalize poorly to other data

Comparing Models: Modifying Models

  • SEMs can test totally different arrangements of the factors
    • Or even different factor loadings
  • Through tests of overall model fits
    • Usually via χ2, AIC, & BIC
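For models that are not nested, information criteria are the usual comparison. A minimal sketch follows, assuming the standard definitions AIC = −2 ln L + 2k and BIC = −2 ln L + k ln N, where k is the number of free parameters. The log-likelihoods and parameter counts are hypothetical, and some SEM software reports χ2-based variants instead.

```python
import math


def aic(log_likelihood, n_free_params):
    """Akaike information criterion; smaller values indicate the preferred model."""
    return -2.0 * log_likelihood + 2.0 * n_free_params


def bic(log_likelihood, n_free_params, n_obs):
    """Bayesian information criterion; penalizes parameters more heavily as N grows."""
    return -2.0 * log_likelihood + n_free_params * math.log(n_obs)


# Hypothetical competing models fit to the same N = 300 cases
model_a = {"log_likelihood": -2450.0, "n_free_params": 20}
model_b = {"log_likelihood": -2440.0, "n_free_params": 23}

for name, model in (("A", model_a), ("B", model_b)):
    print(name,
          round(aic(model["log_likelihood"], model["n_free_params"]), 1),
          round(bic(model["log_likelihood"], model["n_free_params"], 300), 1))
```

Only differences between models fit to the same data are meaningful; the absolute values carry no interpretation on their own.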

Comparing Models: Comparing Groups

  • SEMs can test group differences
    • E.g., whether the same relationships between factors hold for different groups
    • And thus can be sophisticated alternatives to, e.g., (M)AN(C)OVAs
  • We can test how well a model fits different groups by placing / removing equality constraints in the model
    • Adding / removing an equality constraint tests how the model performs when assuming the groups are similar / dissimilar on a given parameter
    • Therefore:
      1. Add an equality constraint, test model fit
      2. Remove an equality constraint, retest model
      3. Compare fits, e.g., via χ2 difference
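The three steps above end in a χ2 difference test, which can be sketched as follows: subtract the χ2 and df of the less constrained model from those of the more constrained one, then refer the difference to a χ2 distribution. The helper `chi2_sf` is a hand-rolled upper-tail probability for integer df (so the example needs only the standard library), and all fit values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with positive integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from the closed form for df = 2 (even df) or df = 1 (odd df)
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_difference_test(chi2_constrained, df_constrained, chi2_free, df_free):
    """Compare nested models; the constrained model has the larger df."""
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Hypothetical fits: three equality constraints across groups added vs. removed
delta_chi2, delta_df, p_value = chi2_difference_test(112.4, 50, 101.9, 47)
print(round(delta_chi2, 1), delta_df, p_value < 0.05)
```

A significant difference suggests the equality constraints worsen fit, i.e., that the groups differ on at least one of the constrained parameters.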

Comparing Models: Comparing Groups (cont.)

  • Remember SEMs often include a series of linear regressions
    • These include intercepts for endogenous variables
    • We can compare whether these intercepts differ between groups
    • This will test the effect of group membership on the (other) model parameters
      • I.e., akin to adding a dummy variable for that group

Preparation for Conducting SEMs on APT Data

Review of Instruments

  • CHEAKS: Environmentally-related activities
    • Frequency of self-reported environmentally-relevant behaviors
    • Divided here into animal items:
      • “I have put up a bird house near my home”
      • “I have asked my parents not to buy products made from animal fur”
    • And all other behaviors, e.g.:
      • “I have talked with my parents about how to help with environmental problems”
      • “I turn off the water in the sink while I brush my teeth to conserve water”

Review of Instruments (cont.)

  • BES: Human-directed empathy
    • Self-reported empathy for other people
    • E.g.:
      • “Other people’s feelings don’t bother me at all”
      • “Seeing a person who has been angered has no effect on my feelings”
  • APT: Animal attitudes
    • Here divided by comparison stimuli
      • Human comparison (boy or girl)
      • Non-human comparison (food, plant, toy, activity)

Proposed Model

  • For demonstration, I propose this model:

Conducting SEMs in R
Out of the presentation mode and into the fire