Confirmatory Factor Analysis

Overview

  • A Quick Review of Factor Analysis
    • EFA vs. CFA
  • Conducting CFAs
    • Accounting for Nominal & Ordinal Data
    • Accounting for Non-Normal Data
  • Overview of Evaluating CFA Results
  • Testing CFA Model Fit
  • Estimating Sample Size in CFAs

A Quick Review of Factor Analysis

  • Factor analysis:
    1. Uses the inter-relationships between ostensible items/indicators (aka “manifest variables”)
    2. To infer the nature of non-ostensible factors (aka “latent variables”)
  • General procedure:
    1. Use the correlation / covariance matrix
    2. To infer linear equations of the items
    3. And then investigate those linear equations to understand the factor structure

EFA vs. CFA

  • EFA uses the data to infer a possible factor structure
    • Is often done iteratively—and guided by theory—to compare candidate structures
    • But it is inherently data-driven
  • EFA is thus:
    • Open to serendipity
    • Vulnerable to chance
  • CFA tests a possible factor structure
    • By seeing how well a proposed factor structure fits the data
    • But can also be done iteratively to find better fits
    • It is thus inherently theory-driven

EFA vs. CFA: Overall Procedure

  • EFA & CFA follow slightly different procedures
EFA:
  1. Compute factor loadings
  2. “Eyeball” structure
  3. Tweak structure by:
    • Changing number of factors included
    • Rotating the factors

CFA:
  1. Propose assignments of items to factors
  2. Evaluate how well that model fits the data
  3. Perhaps tweak model by changing item assignments or aspects of the model, e.g.,
    • Whether factors interrelate
      ○ Like rotating
    • How much items interrelate
      ○ Like changing factor loadings

EFA vs. CFA: Overall Analyses

  • EFA & CFA use different statistics to evaluate their outcomes
EFA:
  • Eigenvalues are the primary method to evaluate factor structure
  • Factor loadings evaluate item clustering
  • Factor rotations evaluate relationships between factors

CFA:
  • Fit indices evaluate the overall factor structure
  • Factor assignments evaluate item clustering & fit
  • Assigned relationships evaluate relationships between factors

Uses of CFA

  • Factor structure
    • Can evaluate a theory-driven assignment of items to factors
  • Relationships between factors
    • Can evaluate whether factors are related
      • Similar to investigating orthogonality in EFA
    • This can evaluate divergent / convergent evidence of validity

Uses of CFA (cont.)

  • Relative fit of different, proposed models
    • In addition to “up or down” test of whether our model fits
    • We can also compare how well different models fit the same data
      • E.g., to test different theories about the data

Uses of CFA (end)

  • Estimating reliability
    • CFAs partition variance into shared & unique
    • Shared covariance is akin to intraclass correlations (e.g., Cronbach’s α)
      • So can evaluate its magnitude to measure reliability
  • Detect some types of response bias
    • Using common-method-bias tests that check whether items that shouldn’t covary in fact do covary due to a common source of response bias

Conducting CFAs

Overview of CFA Process

  1. Create an initial, proposed model that includes assignment of items/indicators to factors
  2. Let the computer cast eldritch spells on that proposed model
  3. Those spells will create a series of fit indices
  4. Use those fit indices to evaluate how well the proposed model fits the actual data
  5. Possibly tweak the model or compare it against other models on the same data

More Specific Steps to CFA

  1. We define a measurement model that specifies how observed variables (items, indicators) are linked to latent constructs (factors)
    • In addition to factor assignments, the model also includes parameters about relationships between factors & error terms
  2. The computer translates this measurement model into an implied variance/covariance matrix based on the parameter estimates
  3. The fit of the implied covariance matrix to the actual observed data is evaluated
    • This is typically done using maximum likelihood estimation (MLE), which finds parameter values that jointly maximize the likelihood of observing the data given the model
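
To make steps 2 and 3 concrete: the measurement model implies \(\Sigma = \Lambda \Phi \Lambda^{T} + \Theta\), and ML estimation minimizes a discrepancy between \(\Sigma\) and the observed matrix S. Below is a minimal numpy sketch (an illustration of the math, not any package's internals), using a hypothetical 2-factor, 6-item model:

```python
import numpy as np

# Hypothetical 2-factor, 6-item measurement model.
# Lambda: loadings (items x factors); zeros are the constrained
# cross-loadings, and the first item per factor is fixed to 1.
Lambda = np.array([[1.0, 0.0],
                   [0.8, 0.0],
                   [0.7, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.9],
                   [0.0, 0.6]])
Phi   = np.array([[1.0, 0.3],   # factor variances & covariance
                  [0.3, 1.0]])
Theta = np.diag([0.5, 0.4, 0.6, 0.5, 0.3, 0.7])  # error variances

# Model-implied covariance matrix: Sigma = Lambda Phi Lambda' + Theta
Sigma = Lambda @ Phi @ Lambda.T + Theta

# An observed covariance matrix S (simulated here just for the demo)
rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(6), Sigma, size=500)
S = np.cov(X, rowvar=False)

# The ML fit function that the iterative estimation minimizes:
# F_ML = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p
p = S.shape[0]
F_ML = (np.log(np.linalg.det(Sigma))
        + np.trace(S @ np.linalg.inv(Sigma))
        - np.log(np.linalg.det(S)) - p)
print(F_ML)  # near 0 when the implied matrix reproduces the data well
```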

More Specific Steps to CFA (cont.)

  4. The parameter estimates in the implied covariance matrix are updated
    • And the model is re-evaluated to assess its fit to the data
  5. This process of updating parameters and re-evaluating the model is iterated until:
    • The difference between the observed & implied covariance matrices falls below a pre-specified threshold, or
    • A maximum number of iterations is reached
  6. The result is the final implied covariance matrix
    • This matrix represents the best attempt at reproducing the data from the proposed model

More Specific Steps to CFA (end)

  7. The implied covariance matrix is compared to the actual observed covariance matrix to calculate a residual covariance matrix
    • This residual matrix comprises the differences between the final implied covariance matrix & the observed (i.e., actual) covariance matrix
  8. The residual covariance matrix is used to calculate fit indices etc. that evaluate the adequacy of the model
    • Common fit indices include SRMR, RMSEA, CFI, and TLI
  9. The fit indices and residuals are used to:
    • Assess the overall fit of the model,
    • Diagnose types of misfit, and
    • Compare the fit of alternative models
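
For a sketch of this whole workflow in Python, one option is the third-party semopy package (which uses lavaan-style model syntax; its exact function names and output columns may vary by version). The model description, file name, and items x1–x6 below are all hypothetical:

```python
import pandas as pd
import semopy  # third-party SEM package; pip install semopy

# Hypothetical measurement model: x1-x3 load on F1, x4-x6 on F2
desc = """
F1 =~ x1 + x2 + x3
F2 =~ x4 + x5 + x6
"""

data = pd.read_csv("items.csv")   # hypothetical file with columns x1..x6

model = semopy.Model(desc)
model.fit(data)                   # iterative (ML) estimation

print(model.inspect())            # parameter estimates (loadings, variances)
print(semopy.calc_stats(model))   # fit indices: chi2, CFI, TLI, RMSEA, etc.
```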

Model Parameters

  • In the process of making the implied covariance matrix fit the observed data:
    • The analysis must also estimate several parameters in the model
  • In addition to factor assignments, these include:
    • Error variances: The variances of the measurement errors associated with each observed item
    • Factor variances and covariances
      • The variances of the latent factors themselves
      • And the covariances between them
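
These parameter counts determine the model’s degrees of freedom, which the fit indices below rely on. A quick sketch of the bookkeeping for a hypothetical 2-factor, 6-item model with marker-variable scaling:

```python
# Degrees of freedom for a CFA: unique covariance elements minus free
# parameters. Hypothetical 2-factor, 6-item model, marker-variable scaling.
p = 6                      # observed items
unique = p * (p + 1) // 2  # 21 unique variances/covariances in S

free_loadings    = 4       # 6 loadings, minus 2 fixed to 1 (markers)
error_variances  = 6       # one per item
factor_variances = 2
factor_covs      = 1       # F1 ~~ F2

q = free_loadings + error_variances + factor_variances + factor_covs  # 13
df = unique - q            # 21 - 13 = 8
print(q, df)
```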

Model Parameters (cont.)

  • We can constrain some of these parameters to be certain values
    • Usually constrained to be “1” or “0”
      • Constraining to “1” does not imply a correlation of 1
        • It simply sets a start value that tells the model, well, where to start its estimates
      • Constraining to “0”, however, does force those parameters to be unrelated
  • Constraining parameters:
    • Creates fewer parameters to estimate
      • And thus can help the model be estimated
    • Allows us to specify (& compare) more precise models

Model Parameters (cont.)

  • We also typically constrain covariances of items on different factors to “0”
    • Item 1 & Item 3 to “0”
    • Item 1 & Item 4 to “0”, etc.
  • We also typically constrain error variances to be unrelated, etc.

Model Parameters (cont.)

  • Setting the initial covariance between Factor A & Factor B to “1” allows them to be non-orthogonal
    • The final magnitude of their inter-relationship is determined through the iterative MLE procedure

Model Parameters (end)

  • The values in the final implied covariance matrix are strongly dependent on the initial parameters
    • So must consider them carefully
    • And thus generally use the “default” settings, i.e.:
      • Setting one item/indicator per factor to “1”
      • Setting cross-loadings (items loading on factors other than their own) to “0”
      • Setting covariances of items on different factors to “0”
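
In lavaan-style syntax (as used by, e.g., the Python semopy package), these defaults and constraints can be written out explicitly; a hypothetical sketch:

```python
# Hypothetical lavaan-style specification (syntax accepted by, e.g., semopy).
# "1*" fixes a parameter to 1; "0*" constrains it to 0.
desc = """
F1 =~ 1*x1 + x2 + x3
F2 =~ 1*x4 + x5 + x6
F1 ~~ F2
x1 ~~ 0*x4
"""
# F1 =~ 1*x1 ... : one marker item per factor set to "1" (scaling)
# omitted cross-loadings (e.g., x4 on F1) are constrained to "0"
# F1 ~~ F2       : factor covariance left free, so factors may correlate
# x1 ~~ 0*x4     : covariance of items on different factors fixed to "0"
```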

CFAs & Nominal/Ordinal Data

Handling Nominal & Ordinal Data

  • When data are categorical or ordinal (e.g., when there are only 2 – 4 levels),
    • Muthén (1994) and Finney & DiStefano (2013) suggest using categorical variable methodology (CVM)
    • Especially using weighted least squares mean- and variance-adjusted (WLSMV) estimation
      • The WLSMV estimator doesn’t assume normally-distributed, continuous variables
      • And provides robust parameter estimates for ordinal data

Handling Data with Mixed Levels

  • When data are of mixed levels (e.g., some are ordinal, some continuous),
    • Can use either WLSMV or robust maximum likelihood (MLR)
    • WLSMV handles mixed data well
      • Using e.g., polychoric correlations for categorical indicators and Pearson’s r for continuous
    • MLR is also effective
      • But assumes that the continuous variables are approximately normal
  • Nonetheless, care should be taken, since \(\chi^{2}\) values can be inflated

CFAs & Non-Normal Data

Satorra-Bentler Rescaled \(\chi^{2}\)

  • \(\chi^{2}\)-Based measures of model fit can break down quickly with non-normal data (Hu, Bentler, & Kano, 1992)
  • Curran et al. (1996) and others found good support for using the Satorra-Bentler (SB) rescaled \(\chi^{2}\)
    • In these cases:
      • The SB rescaling adjusts \(\chi^{2}\) to account for multivariate kurtosis, residual variance, and degrees of freedom
      • Improves robustness by making misfit detection more accurate as data deviate from normality
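
Schematically (following Satorra & Bentler, 1994), the rescaling divides the ordinary ML \(\chi^{2}\) by a scaling correction factor \(\hat{c}\) estimated from the data’s multivariate kurtosis and the model’s df:

\[\chi^{2}_{SB}=\frac{\chi^{2}_{ML}}{\hat{c}}\]

Under multivariate normality \(\hat{c}\approx 1\), so the SB statistic reduces to the ordinary \(\chi^{2}\); with increasing kurtosis, \(\hat{c}\) grows and deflates the otherwise-inflated statistic.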

Satorra-Bentler Rescaled \(\chi^{2}\) (cont.)

  • The rescaled SB \(\chi^{2}\) has performed better than other, comparable statistics at compensating for non-normality
  • So, if the data are slightly non-normal, use rescaled SB \(\chi^{2}\)

Summary of CFAs & Non-Normal Data

  • However, there can be trouble if data are heavily non-normal
    • Don’t assume normality or use CFAs without a deeper look at the data
    • Instead first rescale the data
      • Or bootstrap distribution parameters (see the sketch after this list)
    • N.b., this thus becomes more data-driven
  • More in Satorra & Bentler (1994)
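
To illustrate the bootstrap option above: resample cases, refit the model, and inspect the empirical distribution of the fit statistic. N.b., SEM software typically uses the Bollen-Stine variant, which first transforms the data so the model fits exactly; the naive case-resampling sketch below, built around a hypothetical fit_chi2() helper, just shows the mechanics:

```python
import numpy as np
import pandas as pd

def bootstrap_chi2(data: pd.DataFrame, fit_chi2, n_boot: int = 500,
                   seed: int = 0) -> np.ndarray:
    """Naive nonparametric bootstrap of a CFA's chi-square statistic.

    `fit_chi2` is a hypothetical user-supplied function that fits the
    CFA to a data frame and returns its chi-square statistic.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    stats = []
    for _ in range(n_boot):
        # resample rows (cases) with replacement, then refit
        resampled = data.iloc[rng.integers(0, n, size=n)]
        stats.append(fit_chi2(resampled))
    return np.asarray(stats)  # empirical distribution of chi2
```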

Overview of Evaluating CFA Results

Key Results

  1. Fit indices
    • Evaluate how well the proposed model fits the actual data
      • Usually use \(\chi^{2}\), SRMR, RMSEA, CFI, & TLI
    • If good, then examine parameter estimates
      • If poor, then (often) examine modification indices
  2. Parameter estimates
    • Aspects of the measurement model
    • Factor loadings, inter-factor correlations, etc.
  3. Modification indices
    • Offer clues about how to change the model to bring it more in line with the actual data

CFA and Internal Structure:
Modify Model and Re-Analyze

  • If fit indices were poor and if modification indices supplied reasonable clues
    • Then can change measurement model, re-run analysis, examine new fit indices, etc.
  • Blurs distinction between confirmatory and exploratory analysis
    • And may never identify a good model that fits data well

Testing CFA Model Fit

Model Fit Indices

  • There is no one measure of model fit that covers all relevant aspects
    • Instead use a set of fit indices
    • There are many, but a few are most commonly used
  • Discrepancy indices measure lack of fit of the model to the data
  • Relative fit indices measure fit of model against the fit of a null—or “independence”—model in which the factor indicators are assumed to be uncorrelated

Model Fit Indices (cont.)

  • Discrepancy Indices
    • \(\chi^{2}\)
    • Standardized root mean residual (SRMR)
    • Root mean square error of approximation (RMSEA)
  • Relative Fit Indices
    • Comparative fit index (CFI)
    • Tucker-Lewis index (TLI)

Chi-Squared (\(\chi^{2}\))

  • Also called:
    • Discrepancy function
    • Likelihood ratio \(\chi^{2}\)
    • \(\chi^{2}\) goodness of fit
  • Sometimes presented as \(\frac{\chi^{2}}{df}\)
    • Normalizes values to account for model complexity
    • \(\frac{\chi^{2}}{df}\) values > 3 suggest poor fit (i.e., want < 3)
      • But this is not an absolute
      • Anyway, making comparisons between models is more powerful & discriminating

\(\chi^{2}\) (cont.)

  • Tests the size of the residual covariance matrix
    • Which, remember, is the difference between the implied covariance matrix and the observed covariance matrix
  • A larger residual covariance matrix denotes a poorer fit of the model to the data
    • So, a larger \(\chi^{2}\) means a worse fit
    • I.e., we want the \(\chi^{2}\) to be small
      • In fact, we want it to be non-significant

\(\chi^{2}\) (cont.)

  • \(\chi^{2}\) is not only used to test overall model fit
    • It is also used to test relative fits of different models
      • In fact, this is one of its main uses in latent variable modeling
  • Note that it can only validly be used to compare models based on the same set of data
    • Even subsetting the data invalidates the test
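
For nested models fit to the same data, this comparison is typically a \(\chi^{2}\) difference test; a small sketch with scipy (the fit values are hypothetical):

```python
from scipy.stats import chi2

# Hypothetical fit results for two nested models on the same data
chi2_constrained, df_constrained = 112.4, 42   # e.g., orthogonal factors
chi2_free,        df_free        = 98.1,  41   # factor covariance freed

delta_chi2 = chi2_constrained - chi2_free      # 14.3
delta_df   = df_constrained - df_free          # 1

p = chi2.sf(delta_chi2, delta_df)              # survival function = p-value
print(p)  # p < .05 => the less constrained model fits significantly better
```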

\(\chi^{2}\) (end)

  • N.b., like other significance tests, larger sample sizes make it easier to find significance
    • To some (e.g., Hayduk et al., 2007), this is a boon
      • Allowing for greater sensitivity in model testing
      • And increasing the need for more exact models to account for the larger amount of information available
    • But to most, this is more of a liability
      • And other fit indices are used to complement \(\chi^{2}\)

Standardized Root Mean Residual (SRMR)

  • It is, roughly, the average standardized residual from the residual covariance matrix
    • So, like \(\chi^{2}\), SRMR is a measure of “badness of fit”
    • However, SRMR is not computed from a \(\chi^{2}\)
      • It is the only widely-used fit index not based on the likelihood ratio \(\chi^{2}\)
        • Giving it a somewhat unique perspective on model fit
      • It has also been found to be the most robust of these indices against “false positives” as sample sizes increase (Hu & Bentler, 1999; Shi et al., 2019)
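
As a concrete illustration, here is a minimal numpy sketch of one common SRMR variant (the root of the mean squared standardized residual over the unique elements; published variants differ in details such as how the diagonal is handled):

```python
import numpy as np

def srmr(S: np.ndarray, Sigma: np.ndarray) -> float:
    """One common SRMR variant: sqrt of the mean squared standardized
    residual across the p(p+1)/2 unique covariance elements."""
    p = S.shape[0]
    d_obs = np.sqrt(np.diag(S))
    d_imp = np.sqrt(np.diag(Sigma))
    # standardize each matrix to a correlation metric, then difference
    R_obs = S / np.outer(d_obs, d_obs)
    R_imp = Sigma / np.outer(d_imp, d_imp)
    resid = R_obs - R_imp
    idx = np.tril_indices(p)   # unique elements, incl. the diagonal
    return float(np.sqrt(np.mean(resid[idx] ** 2)))
```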

SRMR (cont.)

  • Again, smaller values are better
    • SRMR = 0 indicates a perfect fit of the implied matrix to the data
    • SRMR \(\le\) .08 is a widely accepted criterion for good fit
  • Caveats
    • SRMR tends to be artificially high for small Ns
    • SRMR does not account well for model complexity
      • It does not consider the number of model parameters
      • Therefore, it is less effective when comparing models (Hu & Bentler, 1999 again)

SRMR (end)

  • Often reported along with \(\chi^{2}\), but more as a “raw” measure of fit
    • But also report indices that try to account for:
      • Sample size (N)
      • Number of parameters in the proposed model (model complexity)

Root Mean Square Error of Approximation (RMSEA)

  • Estimates misfit per degree of freedom, adjusted for sample size: \[\text{RMSEA}=\sqrt{\frac{\max\left(\chi^{2}-df,\;0\right)}{df\,(N-1)}}\]
  • So it accounts for both model complexity (df) and N
    • Though it remains sensitive to small N
  • Smaller values are better
    • \(\le\) .06 is commonly cited as indicating good fit (Hu & Bentler, 1999), with \(\le\) .08 treated as acceptable
  • RMSEA has a known sampling distribution
    • So, e.g., confidence intervals can be computed around it

Comparative Fit Index (CFI)

  • Also known as Bentler’s Comparative Fit Index
  • Compares the fit of the proposed model to the data…
    • Against the fit of an identity matrix
      • Where this null (independence) model assumes all of the factor indicators are uncorrelated
    • Therefore represents the ratio between the discrepancy of the proposed model to the discrepancy of the independence model, roughly: \[\mathrm{CFI=\frac{(Misfit\;of\;Null\;Model)-(Misfit\;of\;Proposed\;Model)}{(Misfit\;of\;Null\;Model)}}\]
    • I.e., as the misfit of the proposed model goes up,
      • CFI goes down

CFI (cont.)

  • Actual formula is:

\[\text{CFI} = 1 - \frac{(\chi^{2}-df)_{\text{proposed}}}{(\chi^{2}-df)_{\text{null}}}\]

    • (With the \(\chi^{2}-df\) terms floored at 0, which norms CFI to the 0 – 1 range)
  • So, it also accounts for the complexity of the models by including dfs in the formula

Tucker-Lewis Index (TLI)

  • Like CFI, TLI compares the fit of the proposed model to the null (identity matrix) model
  • However, TLI accounts for model complexity differently \[\text{TLI} = \frac{\left(\frac{\chi^2}{df}\right)_{\text{null}}-\left(\frac{\chi^2}{df}\right)_{\text{proposed}}}{\left(\frac{\chi^2}{df}\right)_{\text{null}}-1}\]
    • Because it works from \(\frac{\chi^{2}}{df}\) ratios, TLI’s penalty often works out somewhat smaller for more complex models than CFI’s, especially when dfs are reduced (i.e., when additional parameters are estimated)
    • I.e., TLI can be somewhat more forgiving of complex models than CFI
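
To see how the two indices behave, here is a small Python sketch computing both from hypothetical \(\chi^{2}\) and df values for the null and proposed models (using the normed CFI with the max(…, 0) floors):

```python
# Hypothetical fit results
chi2_null, df_null = 880.0, 45   # independence (null) model
chi2_prop, df_prop = 61.2,  34   # proposed model

# CFI: 1 - proportion of the null model's misfit left unexplained
cfi = 1 - max(chi2_prop - df_prop, 0) / max(chi2_null - df_null,
                                            chi2_prop - df_prop, 0)

# TLI: compares chi2/df ratios, with a per-df parsimony correction
tli = ((chi2_null / df_null) - (chi2_prop / df_prop)) / \
      ((chi2_null / df_null) - 1)

print(round(cfi, 3), round(tli, 3))  # both near 1 => good relative fit
```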

TLI (cont.)

  • Unlike CFI, TLI is not normed
    • Its values can exceed 1 when the model fits exceptionally well relative to the null model
  • TLI is more sensitive to small sample sizes (N) than CFI
    • This often results in lower TLI values with smaller N
    • This sensitivity arises because TLI relies on \(\frac{\chi^2}{df}\),
      • Which can become unstable with small sample sizes, amplifying discrepancies

Further Notes on Relative Fit Indices

  • CFI and TLI are robust to larger sample sizes
    • They are less affected by N than absolute fit indices like \(\chi^2\)
    • (The N for the null and proposed models is the same, ensuring consistency)
  • CFI and TLI can overestimate model fit with small N
    • When sample size is small, these indices may be artificially high,
      • Leading to an overestimation of how well the model fits the data

Further Notes on Relative Fit Indices (cont.)

  • Both CFI and TLI are affected by the average correlation among observed variables
    • If correlations between indicators or parameters are low,
      • CFI and TLI will also tend to be low
    • This is because both indices measure the degree to which the model explains the observed relationships
      • Which depends on the strength of those relationships

Summary of Model Fit Indices

\(\chi^{2}\)
  • Notes:
    • Sensitive to sample size (N)
    • Doesn’t directly account for model complexity (df)
  • Acceptance criteria: p > .05; \(\frac{\chi^{2}}{df}\) < 3

SRMR
  • Notes:
    • Doesn’t account for df
    • Robust against large N
    • Can be inflated with small N
  • Acceptance criterion: \(\leq\) .08

RMSEA
  • Notes:
    • Accounts for N and df, but sensitive to small N
    • Has a known sampling distribution
      • Can thus, e.g., compute confidence intervals
  • Acceptance criterion: \(\leq\) .06 (with \(\leq\) .08 often treated as acceptable)

CFI
  • Notes:
    • Relatively robust to large N
    • Strongly penalizes complexity via df
    • Ranges from 0 to 1
  • Acceptance criterion: \(\geq\) .95

TLI
  • Notes:
    • Accounts for N, but values can be unstable with small N
    • Moderately penalizes complexity
    • Can exceed 1 if the model fits exceptionally well
  • Acceptance criterion: \(\geq\) .95

Further Resources for CFA & Fit Indices

  • Bentler, P. (2007). On tests and indices for evaluating structural models. Personality and Individual Differences, 42(5), 825 – 829.
  • Hooper, D., Coughlan, J., & Mullen, M. R. (2008). Structural equation modeling: Guidelines for determining model fit. Electronic Journal of Business Research Methods, 6(1), 53 – 60.
  • Hoyle, R. H. (2022). Handbook of structural equation modeling (2nd ed.). London: Guilford Press.
  • Hu, L. & Bentler, P. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1 – 55.
    • Hu and Bentler provide suggested cut-off criteria for several indices—including those I cover here—that are often (incorrectly) cited as definitive
  • Kline, R. B. (2025). Principles and practice of structural equation modeling (5th ed.). New York, NY: Guilford Press.
  • Schreiber, J., Nora, A., Stage, F., Barlow, E., & King, J. (2006). Reporting structural equation modeling and confirmatory factor analysis results: A review. The Journal of Educational Research, 99(6), 323 – 338.

Estimating Sample Size in CFAs

Common Rules of Thumb

  • \(N \ge 200\)
    • Where N = likely minimum required sample size
    • Some support for this given by Jackson (2001)
  • \(N\ge 10k\)
    • Where \(k\) = number of ostensible & non-ostensible variables

Common Rules of Thumb (cont.)

  • \(N \ge 25q\)
    • Where \(q\) = number of model parameters
      • Usually computed as \(q=f \times(f-1)\) where \(f\) = number of factors
        • E.g., with 3 factors, \(q=3 \times (3-1) = 3 \times 2 = 6\)
    • Some support for this method is given by Jackson (2003)
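
A tiny helper that combines the three rules of thumb above (using the slides’ definitions of k and q; these are rough floors, not guarantees):

```python
def min_n_rules_of_thumb(k: int, q: int) -> int:
    """Largest of the three rough sample-size floors from the slides:
    N >= 200, N >= 10k (k = ostensible + non-ostensible variables),
    and N >= 25q (q = number of model parameters)."""
    return max(200, 10 * k, 25 * q)

# E.g., 12 items + 3 factors (k = 15) and q = 6 as computed above:
print(min_n_rules_of_thumb(k=15, q=6))  # max(200, 150, 150) -> 200
```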

Common Rules of Thumb (cont.)

  • However, rules of thumb are of limited use in CFAs
    • CFA can include so many different model structures
      • And one of the strongest influences on the needed sample size is how well the proposed model fits the data (Gagné & Hancock, 2006)
      • And how well—i.e., how reliably—your instruments can measure the constructs (Jackson, 2001)
    • Furthermore, issues unique to a given set of data can make rules only weakly applicable

Common Rules of Thumb (end)

  • So, estimates of requisite sample sizes should be made very tentatively

Sample Size Statistics

  • Kaiser-Meyer-Olkin (KMO; Cerny & Kaiser, 1977)
    • Measure of “sampling adequacy”
      • Proportion of item variances that may be “common”
      • ≥ .80 suggests sampling adequacy
  • Bartlett’s (1950) test of sphericity
    • Significance here indicates that the variables may be suitable for factor analysis
      • Viz., whether a correlation matrix is significantly different from an identity matrix
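
Both statistics can be computed in Python with, e.g., the third-party factor_analyzer package (a sketch; the items.csv data frame is hypothetical):

```python
import pandas as pd
from factor_analyzer.factor_analyzer import (calculate_bartlett_sphericity,
                                             calculate_kmo)

data = pd.read_csv("items.csv")  # hypothetical item-level data frame

# Bartlett's test: is the correlation matrix different from an identity matrix?
chi_square, p_value = calculate_bartlett_sphericity(data)

# KMO: proportion of item variance that may be common ("sampling adequacy")
kmo_per_item, kmo_total = calculate_kmo(data)

print(p_value < .05)     # True suggests the data are factorable
print(kmo_total >= .80)  # True suggests sampling adequacy (per the slides)
```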

Further Resources about Sample Size

  • Gagné, P., & Hancock, G. R. (2006). Measurement model quality, sample size, and solution propriety in confirmatory factor models. Multivariate Behavioral Research, 41, 65–83.
  • Jackson, D. L. (2001). Sample size and number of parameter estimates in maximum likelihood confirmatory factor analysis: A Monte Carlo investigation. Structural Equation Modeling: A Multidisciplinary Journal, 8(2), 205–223. doi: 10.1207/S15328007SEM0802_3
  • Jackson, D. L. (2003). Revisiting sample size and number of parameter estimates: Some support for the N:q hypothesis. Structural Equation Modeling: A Multidisciplinary Journal, 10, 128–141.
  • Muthén, L. K., & Muthén, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling: A Multidisciplinary Journal, 9(4), 599 – 620. doi: 10.1207/S15328007SEM0904_8