Confirmatory Factor Analysis

Overview

  • A Quick Review of Factor Analysis
    • EFA vs. CFA
  • Conducting CFAs
    • Accounting for Nominal & Ordinal Data
    • Accounting for Non-Normal Data
  • Overview of Evaluating CFA Results
  • Testing CFA Model Fit
  • Estimating Sample Size in CFAs

A Quick Review of Factor Analysis

  • Factor analysis:
    1. Uses the inter-relationships between ostensible items/indicators (aka “manifest variables”)
    2. To infer the nature of non-ostensible factors (aka “latent variables”)
  • General procedure:
    1. Use the correlation / covariance matrix
    2. To infer linear equations of the items
    3. And then investigate those linear equations to understand the factor structure

EFA vs. CFA

  • EFA uses the data to infer a possible factor structure
    • Is often done iteratively—and guided by theory—to compare candidate structures
    • But it is inherently data-driven
  • EFA is thus:
    • Open to serendipity
    • Vulnerable to chance
  • CFA tests a possible factor structure
    • By seeing how well a proposed factor structure fits the data
    • But can also be done iteratively to find better fits
    • It is thus inherently theory-driven

EFA vs. CFA: Overall Procedure

  • EFA & CFA follow slightly different procedures
EFA:
  1. Compute factor loadings
  2. “Eyeball” structure
  3. Tweak structure by:
    • Changing number of factors included
    • Rotating the factors

CFA:
  1. Propose assignments of items to factors
  2. Evaluate how well that model fits the data
  3. Perhaps tweak model by changing item assignments or aspects of the model, e.g.,
    • Whether factors interrelate
      ○ Like rotating
    • How much items interrelate
      ○ Like changing factor loadings

EFA vs. CFA: Overall Analyses

  • EFA & CFA use different statistics to evaluate their outcomes
EFA:
  • Eigenvalues are the primary method to evaluate factor structure
  • Factor loadings evaluate item clustering
  • Factor rotations evaluate relationships between factors

CFA:
  • Fit indices evaluate the overall factor structure
  • Factor assignments evaluate item clustering & fit
  • Assigned relationships evaluate relationships between factors

Uses of CFA

  • Factor structure
    • Can evaluate a theory-driven assignment of items to factors
  • Relationships between factors
    • Can evaluate whether factors are related
      • Similar to investigating orthogonality in EFA
    • This can evaluate divergent / convergent evidence of validity

Uses of CFA (cont.)

  • Relative fit of different, proposed models
    • In addition to “up or down” test of whether our model fits
    • We can also compare how well different models fit the same data
      • E.g., to test different theories about the data

Uses of CFA (end)

  • Estimating reliability
    • CFAs partition variance into shared & unique
    • Shared covariance is akin to intraclass correlations (e.g., Cronbach’s α)
      • So can evaluate its magnitude to measure reliability
  • Detect some types of response bias
    • Using common-method-bias tests that check whether items that shouldn’t covary in fact do covary due to a common source of response bias

Conducting CFAs

Overview of CFA Process

  1. Create an initial, proposed model that includes assignment of items/indicators to factors
  2. Let the computer cast eldritch spells on that proposed model
  3. Those spells will create a series of fit indices
  4. Use those fit indices to evaluate how well the proposed model fits the actual data
  5. Possibly tweak the model or compare it against other models on the same data

More Specific Steps to CFA

  1. We define a measurement model that specifies how observed variables (items, indicators) are linked to latent constructs (factors)
    • In addition to factor assignments, the model also includes parameters about relationships between factors & error terms
  2. The computer translates this measurement model into an implied variance/covariance matrix based on the parameter estimates
  3. The fit of the implied covariance matrix to the actual observed data is evaluated
    • This is typically done using maximum likelihood estimation (MLE), which finds parameter values that jointly maximize the likelihood of observing the data given the model
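
To make steps 2 and 3 concrete: the measurement model implies \(\Sigma = \Lambda \Phi \Lambda^{T} + \Theta\), and ML estimation minimizes a discrepancy between \(\Sigma\) and the observed matrix S. Below is a minimal numpy sketch (an illustration of the math, not any package's internals), using a hypothetical 2-factor, 6-item model:

```python
import numpy as np

# Hypothetical 2-factor, 6-item measurement model.
# Lambda: loadings (items x factors); zeros are the constrained
# cross-loadings, and the first item per factor is fixed to 1.
Lambda = np.array([[1.0, 0.0],
                   [0.8, 0.0],
                   [0.7, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.9],
                   [0.0, 0.6]])
Phi   = np.array([[1.0, 0.3],   # factor variances & covariance
                  [0.3, 1.0]])
Theta = np.diag([0.5, 0.4, 0.6, 0.5, 0.3, 0.7])  # error variances

# Model-implied covariance matrix: Sigma = Lambda Phi Lambda' + Theta
Sigma = Lambda @ Phi @ Lambda.T + Theta

# An observed covariance matrix S (simulated here just for the demo)
rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(6), Sigma, size=500)
S = np.cov(X, rowvar=False)

# The ML fit function that the iterative estimation minimizes:
# F_ML = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p
p = S.shape[0]
F_ML = (np.log(np.linalg.det(Sigma))
        + np.trace(S @ np.linalg.inv(Sigma))
        - np.log(np.linalg.det(S)) - p)
print(F_ML)  # near 0 when the implied matrix reproduces the data well
```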

More Specific Steps to CFA (cont.)

  4. The parameter estimates in the implied covariance matrix are updated
    • And the model is re-evaluated to assess its fit to the data
  5. This process of updating parameters and re-evaluating the model is iterated until:
    • The difference between the observed & implied covariance matrices falls below a pre-specified threshold, or
    • A maximum number of iterations is reached
  6. The result is the final implied covariance matrix
    • This matrix represents the best attempt at reproducing the data from the proposed model

More Specific Steps to CFA (end)

  7. The implied covariance matrix is compared to the actual observed covariance matrix to calculate a residual covariance matrix
    • This residual matrix comprises the differences between the final implied covariance matrix & the observed (i.e., actual) covariance matrix
  8. The residual covariance matrix is used to calculate fit indices etc. that evaluate the adequacy of the model
    • Common fit indices include SRMR, RMSEA, CFI, and TLI
  9. The fit indices and residuals are used to:
    • Assess the overall fit of the model,
    • Diagnose types of misfit, and
    • Compare the fit of alternative models
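
For a sketch of this whole workflow in Python, one option is the third-party semopy package (which uses lavaan-style model syntax; its exact function names and output columns may vary by version). The model description, file name, and items x1–x6 below are all hypothetical:

```python
import pandas as pd
import semopy  # third-party SEM package; pip install semopy

# Hypothetical measurement model: x1-x3 load on F1, x4-x6 on F2
desc = """
F1 =~ x1 + x2 + x3
F2 =~ x4 + x5 + x6
"""

data = pd.read_csv("items.csv")   # hypothetical file with columns x1..x6

model = semopy.Model(desc)
model.fit(data)                   # iterative (ML) estimation

print(model.inspect())            # parameter estimates (loadings, variances)
print(semopy.calc_stats(model))   # fit indices: chi2, CFI, TLI, RMSEA, etc.
```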

Model Parameters

  • In the process of making the implied covariance matrix fit the observed data:
    • The analysis must also estimate several parameters in the model
  • In addition to factor assignments, these include:
    • Error variances: The variances of the measurement errors associated with each observed item
    • Factor variances and covariances
      • The variances of the latent factors themselves
      • And the covariances between them
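
These parameter counts determine the model’s degrees of freedom, which the fit indices below rely on. A quick sketch of the bookkeeping for a hypothetical 2-factor, 6-item model with marker-variable scaling:

```python
# Degrees of freedom for a CFA: unique covariance elements minus free
# parameters. Hypothetical 2-factor, 6-item model, marker-variable scaling.
p = 6                      # observed items
unique = p * (p + 1) // 2  # 21 unique variances/covariances in S

free_loadings    = 4       # 6 loadings, minus 2 fixed to 1 (markers)
error_variances  = 6       # one per item
factor_variances = 2
factor_covs      = 1       # F1 ~~ F2

q = free_loadings + error_variances + factor_variances + factor_covs  # 13
df = unique - q            # 21 - 13 = 8
print(q, df)
```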

Model Parameters (cont.)

  • We can constrain some of these parameters to be certain values
    • Usually constrained to be “1” or “0”
      • Constraining to “1” does not imply a correlation of 1
        • It simply sets a start value that tells the model, well, where to start its estimates
      • Constraining to “0”, however, does force those parameters to be unrelated
  • Constraining parameters:
    • Creates fewer parameters to estimate
      • And thus can help the model be estimated
    • Allows us to specify (& compare) more precise models

Model Parameters (cont.)

  • We also typically constrain covariances of items on different factors to “0”
    • Item 1 & Item 3 to “0”
    • Item 1 & Item 4 to “0”, etc.
  • We also typically constrain error variances to be unrelated, etc.

Model Parameters (cont.)

  • Setting the initial covariance between Factor A & Factor B to “1” allows them to be non-orthogonal
    • The final magnitude of their inter-relationship is determined through the iterative MLE procedure

Model Parameters (end)

  • The values in the final implied covariance matrix are strongly dependent on the initial parameters
    • So must consider them carefully
    • And thus generally use the “default” settings, i.e.:
      • Setting one item/indicator per factor to “1”
      • Setting cross-loadings (items loading on factors other than their own) to “0”
      • Setting covariances of items on different factors to “0”
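
In lavaan-style syntax (as used by, e.g., the Python semopy package), these defaults and constraints can be written out explicitly; a hypothetical sketch:

```python
# Hypothetical lavaan-style specification (syntax accepted by, e.g., semopy).
# "1*" fixes a parameter to 1; "0*" constrains it to 0.
desc = """
F1 =~ 1*x1 + x2 + x3
F2 =~ 1*x4 + x5 + x6
F1 ~~ F2
x1 ~~ 0*x4
"""
# F1 =~ 1*x1 ... : one marker item per factor set to "1" (scaling)
# omitted cross-loadings (e.g., x4 on F1) are constrained to "0"
# F1 ~~ F2       : factor covariance left free, so factors may correlate
# x1 ~~ 0*x4     : covariance of items on different factors fixed to "0"
```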

CFAs & Nominal/Ordinal Data

Handling Nominal & Ordinal Data

  • When data are categorical or ordinal (e.g., when there are only 2 – 4 levels),
    • Muthén (1994) and Finney & DiStefano (2013) suggest using categorical variable methodology (CVM)
    • Especially using weighted least squares mean- and variance-adjusted (WLSMV) estimation
      • The WLSMV estimator doesn’t assume normally-distributed, continuous variables
      • And provides robust parameter estimates for ordinal data

Handling Data with Mixed Levels

  • When data are of mixed levels (e.g., some are ordinal, some continuous),
    • Can use either WLSMV or robust maximum likelihood (MLR)
    • WLSMV handles mixed data well
      • Using e.g., polychoric correlations for categorical indicators and Pearson’s r for continuous
    • MLR is also effective
      • But assumes that the continuous variables are approximately normal
  • Nonetheless, care should be taken, since \(\chi^{2}\) values can be inflated

CFAs & Non-Normal Data

Satorra-Bentler Rescaled \(\chi^{2}\)

  • \(\chi^{2}\)-Based measures of model fit can break down quickly with non-normal data (Hu, Bentler, & Kano, 1992)
  • Curran et al. (1996) and others found good support for using the Satorra-Bentler (SB) rescaled \(\chi^{2}\)
    • In these cases:
      • The SB rescaling adjusts \(\chi^{2}\) to account for multivariate kurtosis, residual variance, and degrees of freedom
      • Improves robustness by making misfit detection more accurate as data deviate from normality
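
Schematically (following Satorra & Bentler, 1994), the rescaling divides the ordinary ML \(\chi^{2}\) by a scaling correction factor \(\hat{c}\) estimated from the data’s multivariate kurtosis and the model’s df:

\[\chi^{2}_{SB}=\frac{\chi^{2}_{ML}}{\hat{c}}\]

Under multivariate normality \(\hat{c}\approx 1\), so the SB statistic reduces to the ordinary \(\chi^{2}\); with increasing kurtosis, \(\hat{c}\) grows and deflates the otherwise-inflated statistic.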

Satorra-Bentler Rescaled \(\chi^{2}\) (cont.)

  • The rescaled SB \(\chi^{2}\) has performed better than other, comparable statistics at compensating for non-normality
  • So, if the data are slightly non-normal, use rescaled SB \(\chi^{2}\)

Summary of CFAs & Non-Normal Data

  • However, there can be trouble if data are heavily non-normal
    • Don’t assume normality or use CFAs without a deeper look at the data
    • Instead first rescale the data
      • Or bootstrap distribution parameters (see the sketch after this list)
    • N.b., this thus becomes more data-driven
  • More in Satorra & Bentler (1994)
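
To illustrate the bootstrap option above: resample cases, refit the model, and inspect the empirical distribution of the fit statistic. N.b., SEM software typically uses the Bollen-Stine variant, which first transforms the data so the model fits exactly; the naive case-resampling sketch below, built around a hypothetical fit_chi2() helper, just shows the mechanics:

```python
import numpy as np
import pandas as pd

def bootstrap_chi2(data: pd.DataFrame, fit_chi2, n_boot: int = 500,
                   seed: int = 0) -> np.ndarray:
    """Naive nonparametric bootstrap of a CFA's chi-square statistic.

    `fit_chi2` is a hypothetical user-supplied function that fits the
    CFA to a data frame and returns its chi-square statistic.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    stats = []
    for _ in range(n_boot):
        # resample rows (cases) with replacement, then refit
        resampled = data.iloc[rng.integers(0, n, size=n)]
        stats.append(fit_chi2(resampled))
    return np.asarray(stats)  # empirical distribution of chi2
```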

Overview of Evaluating CFA Results

Key Results

  1. Fit indices
    • Evaluate how well the proposed model fits the actual data
      • Usually use \(\chi^{2}\), SRMR, RMSEA, CFI, & TLI
    • If good, then examine parameter estimates
      • If poor, then (often) examine modification indices
  2. Parameter estimates
    • Aspects of the measurement model
    • Factor loadings, inter-factor correlations, etc.
  3. Modification indices
    • Offer clues about how to change the model to bring it more in line with the actual data

CFA and Internal Structure:
Modify Model and Re-Analyze

  • If fit indices were poor and if modification indices supplied reasonable clues
    • Then can change measurement model, re-run analysis, examine new fit indices, etc.
  • Blurs distinction between confirmatory and exploratory analysis
    • And may never identify a good model that fits data well

Testing CFA Model Fit

Model Fit Indices

  • There is no one measure of model fit that covers all relevant aspects
    • Instead use a set of fit indices
    • There are many, but a few are most commonly used
  • Discrepancy indices measure lack of fit of the model to the data
  • Relative fit indices measure fit of model against the fit of a null—or “independence”—model in which the factor indicators are assumed to be uncorrelated

Model Fit Indices (cont.)

  • Discrepancy Indices
    • \(\chi^{2}\)
    • Standardized root mean residual (SRMR)
    • Root mean square error of approximation (RMSEA)
  • Relative Fit Indices
    • Comparative fit index (CFI)
    • Tucker-Lewis index (TLI)

Chi-Squared (\(\chi^{2}\))

  • Also called:
    • Discrepancy function
    • Likelihood ratio \(\chi^{2}\)
    • \(\chi^{2}\) goodness of fit
  • Sometimes presented as \(\frac{\chi^{2}}{df}\)
    • Normalizes values to account for model complexity
    • \(\frac{\chi^{2}}{df}\) values > 3 suggest poor fit (i.e., want < 3)
      • But this is not an absolute
      • Anyway, making comparisons between models is more powerful & discriminating

\(\chi^{2}\) (cont.)

  • Tests the size of the residual covariance matrix
    • Which, remember, is the difference between the implied covariance matrix and the observed covariance matrix
  • A larger residual covariance matrix denotes a poorer fit of the model to the data
    • So, a larger \(\chi^{2}\) means a worse fit
    • I.e., we want the \(\chi^{2}\) to be small
      • In fact, we want it to be non-significant

\(\chi^{2}\) (cont.)

  • \(\chi^{2}\) is not only used to test overall model fit
    • It is also used to test relative fits of different models
      • In fact, this is one of its main uses in latent variable modeling
  • Note that it can only validly be used to compare models based on the same set of data
    • Even subsetting the data invalidates the test
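
For nested models fit to the same data, this comparison is typically a \(\chi^{2}\) difference test; a small sketch with scipy (the fit values are hypothetical):

```python
from scipy.stats import chi2

# Hypothetical fit results for two nested models on the same data
chi2_constrained, df_constrained = 112.4, 42   # e.g., orthogonal factors
chi2_free,        df_free        = 98.1,  41   # factor covariance freed

delta_chi2 = chi2_constrained - chi2_free      # 14.3
delta_df   = df_constrained - df_free          # 1

p = chi2.sf(delta_chi2, delta_df)              # survival function = p-value
print(p)  # p < .05 => the less constrained model fits significantly better
```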

\(\chi^{2}\) (end)

  • N.b., like other significance tests, larger sample sizes make it easier to find significance
    • To some (e.g., Hayduk et al., 2007), this is a boon
      • Allowing for greater sensitivity in model testing
      • And increasing the need for more exact models to account for the larger amount of information available
    • But to most, this is more of a liability
      • And other fit indices are used to complement \(\chi^{2}\)

Standardized Root Mean Residual (SRMR)

  • It is, roughly, the average standardized residual from the residual covariance matrix
    • So, like \(\chi^{2}\), SRMR is a measure of “badness of fit”
    • However, SRMR is not computed from a \(\chi^{2}\)
      • It is the only widely-used fit index not based on the likelihood ratio \(\chi^{2}\)
        • Giving it a somewhat unique perspective on model fit
      • It has also been found to be the most robust of these indices against “false positives” as sample sizes increase (Hu & Bentler, 1999; Shi et al., 2019)
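
As a concrete illustration, here is a minimal numpy sketch of one common SRMR variant (the root of the mean squared standardized residual over the unique elements; published variants differ in details such as how the diagonal is handled):

```python
import numpy as np

def srmr(S: np.ndarray, Sigma: np.ndarray) -> float:
    """One common SRMR variant: sqrt of the mean squared standardized
    residual across the p(p+1)/2 unique covariance elements."""
    p = S.shape[0]
    d_obs = np.sqrt(np.diag(S))
    d_imp = np.sqrt(np.diag(Sigma))
    # standardize each matrix to a correlation metric, then difference
    R_obs = S / np.outer(d_obs, d_obs)
    R_imp = Sigma / np.outer(d_imp, d_imp)
    resid = R_obs - R_imp
    idx = np.tril_indices(p)   # unique elements, incl. the diagonal
    return float(np.sqrt(np.mean(resid[idx] ** 2)))
```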

SRMR (cont.)

  • Again, smaller values are better
    • SRMR = 0 indicates a perfect fit of the implied matrix to the data
    • SRMR \(\le\) .08 is a widely accepted criterion for good fit
  • Caveats
    • SRMR tends to be artificially high for small Ns
    • SRMR does not account well for model complexity
      • It does not consider the number of model parameters
      • Therefore, it is less effective when comparing models (Hu & Bentler, 1999 again)

SRMR (end)

  • Often reported along with \(\chi^{2}\), but more as a “raw” measure of fit
    • But also report indices that try to account for:
      • Sample size (N)
      • Number of parameters in the proposed model (model complexity)

Root Mean Square Error of Approximation (RMSEA)

  • Estimates misfit per degree of freedom, adjusted for sample size: \[\text{RMSEA}=\sqrt{\frac{\max\left(\chi^{2}-df,\;0\right)}{df\,(N-1)}}\]
  • So it accounts for both model complexity (df) and N
    • Though it remains sensitive to small N
  • Smaller values are better
    • \(\le\) .06 is commonly cited as indicating good fit (Hu & Bentler, 1999), with \(\le\) .08 treated as acceptable
  • RMSEA has a known sampling distribution
    • So, e.g., confidence intervals can be computed around it

Comparative Fit Index (CFI)

  • Also known as Bentler’s Comparative Fit Index
  • Compares the fit of the proposed model to the data…
    • Against the fit of an identity matrix
      • Where this null (independence) model assumes all of the factor indicators are uncorrelated
    • Therefore represents the ratio between the discrepancy of the proposed model to the discrepancy of the independence model, roughly: \[\mathrm{CFI=\frac{(Misfit\;of\;Null\;Model)-(Misfit\;of\;Proposed\;Model)}{(Misfit\;of\;Null\;Model)}}\]
    • I.e., as the misfit of the proposed model goes up,
      • CFI goes down

CFI (cont.)

  • Actual formula is:

\[\text{CFI} = 1 - \frac{(\chi^{2}-df)_{\text{proposed}}}{(\chi^{2}-df)_{\text{null}}}\]

    • (With the \(\chi^{2}-df\) terms floored at 0, which norms CFI to the 0 – 1 range)
  • So, it also accounts for the complexity of the models by including dfs in the formula

Tucker-Lewis Index (TLI)

  • Like CFI, TLI compares the fit of the proposed model to the null (identity matrix) model
  • However, TLI accounts for model complexity differently \[\text{TLI} = \frac{\left(\frac{\chi^2}{df}\right)_{\text{null}}-\left(\frac{\chi^2}{df}\right)_{\text{proposed}}}{\left(\frac{\chi^2}{df}\right)_{\text{null}}-1}\]
    • Because it works from \(\frac{\chi^{2}}{df}\) ratios, TLI’s penalty often works out somewhat smaller for more complex models than CFI’s, especially when dfs are reduced (i.e., when additional parameters are estimated)
    • I.e., TLI can be somewhat more forgiving of complex models than CFI
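
To see how the two indices behave, here is a small Python sketch computing both from hypothetical \(\chi^{2}\) and df values for the null and proposed models (using the normed CFI with the max(…, 0) floors):

```python
# Hypothetical fit results
chi2_null, df_null = 880.0, 45   # independence (null) model
chi2_prop, df_prop = 61.2,  34   # proposed model

# CFI: 1 - proportion of the null model's misfit left unexplained
cfi = 1 - max(chi2_prop - df_prop, 0) / max(chi2_null - df_null,
                                            chi2_prop - df_prop, 0)

# TLI: compares chi2/df ratios, with a per-df parsimony correction
tli = ((chi2_null / df_null) - (chi2_prop / df_prop)) / \
      ((chi2_null / df_null) - 1)

print(round(cfi, 3), round(tli, 3))  # both near 1 => good relative fit
```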

TLI (cont.)

  • Unlike CFI, TLI is not normed
    • Its values can exceed 1 when the model fits exceptionally well relative to the null model
  • TLI is more sensitive to small sample sizes (N) than CFI
    • This often results in lower TLI values with smaller N
    • This sensitivity arises because TLI relies on \(\frac{\chi^2}{df}\),
      • Which can become unstable with small sample sizes, amplifying discrepancies

Further Notes on Relative Fit Indices

  • CFI and TLI are robust to larger sample sizes
    • They are less affected by N than absolute fit indices like \(\chi^2\)
    • (The N for the null and proposed models is the same, ensuring consistency)
  • CFI and TLI can overestimate model fit with small N
    • When sample size is small, these indices may be artificially high,
      • Leading to an overestimation of how well the model fits the data

Further Notes on Relative Fit Indices (cont.)

  • Both CFI and TLI are affected by the average correlation among observed variables
    • If correlations between indicators or parameters are low,
      • CFI and TLI will also tend to be low
    • This is because both indices measure the degree to which the model explains the observed relationships
      • Which depends on the strength of those relationships

Summary of Model Fit Indices

\(\chi^{2}\)
  • Notes:
    • Sensitive to sample size (N)
    • Doesn’t directly account for model complexity (df)
  • Acceptance criteria: p > .05; \(\frac{\chi^{2}}{df}\) < 3

SRMR
  • Notes:
    • Doesn’t account for df
    • Robust against large N
    • Can be inflated with small N
  • Acceptance criterion: \(\leq\) .08

RMSEA
  • Notes:
    • Accounts for N and df, but sensitive to small N
    • Has a known sampling distribution
      • Can thus, e.g., compute confidence intervals
  • Acceptance criterion: \(\leq\) .06 (with \(\leq\) .08 often treated as acceptable)

CFI
  • Notes:
    • Relatively robust to large N
    • Strongly penalizes complexity via df
    • Ranges from 0 to 1
  • Acceptance criterion: \(\geq\) .95

TLI
  • Notes:
    • Accounts for N, but values can be unstable with small N
    • Moderately penalizes complexity
    • Can exceed 1 if the model fits exceptionally well
  • Acceptance criterion: \(\geq\) .95

Further Resources for CFA & Fit Indices

  • Bentler, P. (2007). On tests and indices for evaluating structural models. Personality and Individual Differences, 42(5), 825 – 829.
  • Hooper, D., Coughlan, J., & Mullen, M. R. (2008). Structural equation modeling: Guidelines for determining model fit. Electronic Journal of Business Research Methods, 6(1), 53 – 60.
  • Hoyle, R. H. (2022). Handbook of structural equation modeling (2nd ed.). London: Guilford Press.
  • Hu, L. & Bentler, P. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1 – 55.
    • Hu and Bentler provide suggested cut-off criteria for several indices—including those I cover here—that are often (incorrectly) cited as definitive
  • Kline, R. B. (2025). Principles and practice of structural equation modeling (5th ed.). New York, NY: Guilford Press.
  • Schreiber, J., Nora, A., Stage, F., Barlow, E., & King, J. (2006). Reporting structural equation modeling and confirmatory factor analysis results: A review. The Journal of Educational Research, 99(6), 323 – 338.

Estimating Sample Size in CFAs

Common Rules of Thumb

  • \(N \ge 200\)
    • Where N = likely minimum required sample size
    • Some support for this given by Jackson (2001)
  • \(N\ge 10k\)
    • Where \(k\) = number of ostensible & non-ostensible variables

Common Rules of Thumb (cont.)

  • \(N \ge 25q\)
    • Where \(q\) = number of model parameters
      • Usually computed as \(q=f \times(f-1)\) where \(f\) = number of factors
        • E.g., with 3 factors, \(q=3 \times (3-1) = 3 \times 2 = 6\)
    • Some support for this method is given by Jackson (2003)
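
A tiny helper that combines the three rules of thumb above (using the slides’ definitions of k and q; these are rough floors, not guarantees):

```python
def min_n_rules_of_thumb(k: int, q: int) -> int:
    """Largest of the three rough sample-size floors from the slides:
    N >= 200, N >= 10k (k = ostensible + non-ostensible variables),
    and N >= 25q (q = number of model parameters)."""
    return max(200, 10 * k, 25 * q)

# E.g., 12 items + 3 factors (k = 15) and q = 6 as computed above:
print(min_n_rules_of_thumb(k=15, q=6))  # max(200, 150, 150) -> 200
```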

Common Rules of Thumb (cont.)

  • However, rules of thumb are of limited use in CFAs
    • CFA can include so many different model structures
      • And one of the strongest influences on the needed sample size is how well the proposed model fits the data (Gagné & Hancock, 2006)
      • And how well—i.e., how reliably—your instruments can measure the constructs (Jackson, 2001)
    • Furthermore, issues unique to a given set of data can make rules only weakly applicable

Common Rules of Thumb (end)

  • So, estimates of requisite sample sizes should be made very tentatively

Sample Size Statistics

  • Kaiser-Meyer-Olkin (KMO; Cerny & Kaiser, 1977)
    • Measure of “sampling adequacy”
      • Proportion of item variances that may be “common”
      • ≥ .80 suggests sampling adequacy
  • Bartlett’s (1950) test of sphericity
    • Significance here indicates that the variables may be suitable for factor analysis
      • Viz., whether a correlation matrix is significantly different from an identity matrix
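
Both statistics can be computed in Python with, e.g., the third-party factor_analyzer package (a sketch; the items.csv data frame is hypothetical):

```python
import pandas as pd
from factor_analyzer.factor_analyzer import (calculate_bartlett_sphericity,
                                             calculate_kmo)

data = pd.read_csv("items.csv")  # hypothetical item-level data frame

# Bartlett's test: is the correlation matrix different from an identity matrix?
chi_square, p_value = calculate_bartlett_sphericity(data)

# KMO: proportion of item variance that may be common ("sampling adequacy")
kmo_per_item, kmo_total = calculate_kmo(data)

print(p_value < .05)     # True suggests the data are factorable
print(kmo_total >= .80)  # True suggests sampling adequacy (per the slides)
```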

Further Resources about Sample Size

  • Gagné, P., & Hancock, G. R. (2006). Measurement model quality, sample size, and solution propriety in confirmatory factor models. Multivariate Behavioral Research, 41, 65–83.
  • Jackson, D. L. (2001). Sample size and number of parameter estimates in maximum likelihood confirmatory factor analysis: A Monte Carlo investigation. Structural Equation Modeling: A Multidisciplinary Journal, 8(2), 205–223. doi: 10.1207/S15328007SEM0802_3
  • Jackson, D. L. (2003). Revisiting sample size and number of parameter estimates: Some support for the N:q hypothesis. Structural Equation Modeling: A Multidisciplinary Journal, 10, 128–141.
  • Muthén, L. K., & Muthén, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling: A Multidisciplinary Journal, 9(4), 599 – 620. doi: 10.1207/S15328007SEM0904_8