Linear Regression Review
and
Testing Models Theoretically

Overview

  • Summary to Date
  • Review of Linear Regression
  • Testing Model Fit

Summary to Date

Descriptives vs. Inferentials

  • Descriptives good
    • Simple & intuitive
      • Can efficiently describe the sample
    • Robust
      • Because they make no assumptions about the population
  • Mean & SD
    • SD as average distance from the mean
    • SD as a standard unit of measurement
      • Standardized (z) scores
        • Why correlation is so popular
        • And covariance isn’t
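A minimal sketch in Python of that last point, with invented numbers: standardizing turns the unit-bound covariance into the unit-free correlation.

```python
import numpy as np

# Hypothetical paired measurements (any units)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 6.0])

# Standardized (z) scores: distance from the mean in SD units
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# Covariance depends on the variables' units, so its size is hard to judge...
print(np.cov(x, y, ddof=1)[0, 1])

# ...but the covariance of z-scores is the correlation, always in [-1, 1]
print(np.cov(zx, zy, ddof=1)[0, 1])  # matches np.corrcoef(x, y)[0, 1]
```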

Descriptives vs. Inferentials (cont.)

  • Inferentials
    • Make assumptions about the population
    • Most importantly the distribution
      • Often assume it approximates a normal distribution
        • But we know we’re wrong
      • Assumptions are often most robust against:
        • Kurtosis (& skew a bit)
        • Non-independence of measures (“multicollinearity”)
        • Changes in variance over time (“heteroscedasticity”)

Descriptives vs. Inferentials (cont.)

  • Inferentials (cont.)
    • Assumptions not robust against:
      • Non-independence of participants
      • Multi-modality (more than one “hump” in the distribution)
  • Sample stats approximating population stats
    • Accuracy of sample stats improves when:
      • Larger sample sizes
      • More representative sampling
      • Multiple “draws” of samples

Central Limit Theorem

  • “Multiple ‘draws’ of samples”
    • Sample stats never equal population stats
      • A sample stat is assumed to always have some error in its measurement
        • Observed = True + Error
    • But! (assuming consistent sampling techniques)
      • The error of measurement of sample stats:
        • Tends to be normally distributed
        • Has a mean of zero (modulo bias)
      • This leads to the Central Limit Theorem
        • Which undergirds—even allows for—nearly every statistic you’ll use
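A minimal simulation sketch of the CLT (the exponential population and all numbers here are invented for illustration): even from a skewed population, repeated sample means pile up in a roughly normal heap around the population mean.

```python
import numpy as np

rng = np.random.default_rng(42)

# A skewed, decidedly non-normal "population": exponential with mean 2.0
draws = rng.exponential(scale=2.0, size=(10_000, 50))  # 10,000 samples of n = 50

# One mean per sample: the sampling distribution of the mean
sample_means = draws.mean(axis=1)

print(sample_means.mean())       # ~2.0: the errors of measurement average out
print(sample_means.std(ddof=1))  # ~2.0 / sqrt(50) ≈ 0.28: the standard error
```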

Variance & Covariance

  • Variance = Information
  • We seek to understand that information
    • The more we understand, the better
    • Often quantify “how much we understand” as a signal-to-noise ratio
      • \(\text{Variance understood} = \frac{\text{Variance accounted for}}{\text{Variance }not \text{ accounted for}}\)
  • So, if “accounting” for the effect of one variable on another:
    • \(\text{Variance understood} = \frac{\text{Covariance}}{\text{Unshared variance}}\)
    • Which is a correlation
      • (When it’s standardized)
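In standard notation, that standardized ratio is the Pearson correlation: the covariance divided by the two standard deviations, which is the same as the covariance of the z-scores.

\[ r_{XY} = \frac{\mathrm{cov}(X, Y)}{s_X \, s_Y} = \mathrm{cov}(z_X, z_Y) \]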

Variance & Covariance (cont.)

  • Variance = Information
  • Seek to understand that information
    • The more we understand, the better (cont.)
  • And if we understand enough, we say we’ve made a “significant” insight
    • When is “enough” enough?
      • Usually when we’re 95% sure we’ve found enough
  • I.e., when we’re 95% sure that our sample stat…

     measures a population stat…

     that is different than the “null” value.

     (“Null” usually being “not different than zero,” “no effect,”
     “no difference,” “no information,” etc.)

Partialing Out Variance

  • Ways to increase the size of the signal relative to the size of the noise:
  • Increase size of signal
    • Bigger effects
    • Greater range of measurements of effects
  • Decrease size of noise
    • Greater precision of measurement
    • Remove the noise
  • Partial out the variance that is unshared between those variables
    • But is accounted for by some third variable
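A minimal Python sketch of that last idea, with simulated data (the effect sizes are invented): regress the third variable out of both measures, then correlate what remains.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a third variable z adds shared noise to both x and y
z = rng.normal(size=200)
x = 0.5 * z + rng.normal(size=200)
y = 0.4 * x + 0.5 * z + rng.normal(size=200)

def residualize(v, covariate):
    """Remove the variance in v accounted for by the covariate (simple OLS)."""
    X = np.column_stack([np.ones_like(covariate), covariate])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta

# Partial z's variance out of both x and y, then correlate the residuals
rx, ry = residualize(x, z), residualize(y, z)
print(np.corrcoef(x, y)[0, 1])    # zero-order r, inflated by z
print(np.corrcoef(rx, ry)[0, 1])  # partial r, with z's noise removed
```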

Ways of Communicating

  • Merging visuals with text
    • What is best described where
    • Visuals as “conversation pieces”
      • Text as highlighting what to focus on in visuals
      • Especially vis-à-vis theory
  • Efficiency & simplicity
    • “Information-to-ink ratio”
  • Strong organization
    • Clear guideposts & structure
  • Common—but not colloquial—language
    • Following writing conventions
    • Avoiding jargon and acronyms

Overview of What’s Next

  • Review of linear regression model
  • Partialing out variance
  • Combining similar sources of variance
  • Model fit
  • Ostensible & non-ostensible variables

Review of Linear Regression

Basic Strategy

  1. Assume the relationships between variables are linear
  2. Find a line that best accounts for all variables in the model
  3. If the line—the linear regression—accounts for enough of the total variance, declare the model significant
    • Measured, e.g., with R²
  4. And/or look at the individual predictors to see which of them are significant contributors to that linear regression
    • Amount of variance accounted for by those variables vs. total variance
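A minimal sketch of steps 2–4 in Python (statsmodels assumed; the data are invented):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=80)
y = 2.0 + 0.6 * x + rng.normal(size=80)    # step 1: a truly linear relationship

fit = sm.OLS(y, sm.add_constant(x)).fit()  # step 2: the best-fitting line
print(fit.rsquared, fit.f_pvalue)          # step 3: R² & overall significance
print(fit.pvalues)                         # step 4: each predictor's significance
```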

Partialing Out Variance

  • At minimum, we separate out the variance associated with our predictor(s) from “error”
    • And, perhaps, the “intercept,” the starting, pre-intervention values for each participant

\[Y = b_{0} + b_{1}X_{1} + e\]

  • Adding other predictors separates out—partials out—the variance associated with each:

\[Y = b_{0} + b_{1}X_{1} + b_{2}X_{2} + e\]
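A minimal continuation of that model in Python (simulated data; the coefficients are invented): with both predictors in the model, each slope is estimated with the other predictor's variance partialed out.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x1, x2 = rng.normal(size=(2, 100))
y = 1.0 + 0.8 * x1 + 0.3 * x2 + rng.normal(size=100)

# Y = b0 + b1*X1 + b2*X2 + e
X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(fit.params)     # b0, b1, b2: each b partialed for the other predictor
print(fit.resid[:5])  # e: the variance left unaccounted for
```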

Partialing Out Variance (cont.)

  • N.b., error can also be separated
    • E.g., if we know the sources of those errors (same hospital, neighborhood, etc.)

\[Y = b_{0} + b_{1}X_{1} + b_{2}X_{2} + e_{1} + e_{2}\]

  • We can also separate out error and effects of predictors over time
    • Assume that events will be more similar to each other at one time point
    • And that values within a person will tend to be more similar than between people
  • But more on that later.

For now, let’s focus on …

Combining Similar Sources of Variance

Common Sources of Variance

  • Similar predictors may share too much variance
    • If left unaddressed, can lead to “multicollinearity”
    • Which leads to unstable model terms
      • E.g., terms will flip from being significant to not & back depending on what other terms are added to the model
  • Usually addressed by removing one of the multicollinear terms
  • But we can also combine or group those variables…

Combining Sources of Variance (cont.)

  • We do this all the time, in fact
    • Adding up responses to items on the same survey
    • Taking the average of results from a blood draw
  • But what if we know two variables are related, but not really easily combined?
    • E.g., ZIP code and salary
      • 10010 + $75,000 \(\ne\) 85,010

Combining Sources of Variance (end)

  • We can group them into “families” of variables within the model…

\[Y = b_{0} + b_{1}X_{1} + ( b_{ZIP}X_{ZIP} + b_{Salary}X_{Salary} ) + e \]

  • We can test this by looking at the difference in model fit:

\(\text{1st Model:}\ \ Y = b_{0} + b_{1}X_{1} + e \quad \Rightarrow \quad R^2_{\text{1st Model}}\)

\(\text{2nd Model:}\ \ Y = b_{0} + b_{1}X_{1} + ( b_{ZIP}X_{ZIP} + b_{Salary}X_{Salary} ) + e \quad \Rightarrow \quad R^2_{\text{2nd Model}}\)

\(\text{Difference} = R^2_{\text{2nd Model}} - R^2_{\text{1st Model}}\)
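A minimal sketch of that comparison (statsmodels again; `zip_d` is a hypothetical dummy-coded ZIP variable and the data are invented):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x1, zip_d, salary = rng.normal(size=(3, 100))  # hypothetical predictors
y = 1.0 + 0.8 * x1 + 0.4 * zip_d + 0.3 * salary + rng.normal(size=100)

# 1st model: X1 alone; 2nd model adds the ZIP/salary "family"
fit1 = sm.OLS(y, sm.add_constant(x1)).fit()
fit2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, zip_d, salary]))).fit()

print(fit2.rsquared - fit1.rsquared)  # the difference in variance accounted for
print(fit2.compare_f_test(fit1))      # F test of that change: (F, p, df_diff)
```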

A Brief Aside about Ostensible & Non-Ostensible Variables

Ostensible & Non-Ostensible

  • Some Things We Can See…
    • Neighborhoods & paychecks
    • Blood pressure & adipose tissue
    • Smiles and cortisol levels
  • Some Things We Can’t See…
    • “Socio-economic status”
    • “Health”
    • “Stress”

Ostensible & Non-Ostensible (cont.)

  • Things we can observe empirically are sometimes called ostensible
  • While the underlying “constructs” they are manifestations of are non-ostensible

Ostensible & Non-Ostensible (end)

  • It is quite common to use ostensible variables to represent non-ostensible constructs
    • Even if it is less common to realize (or at least acknowledge) that this is what one is doing
  • However, it is indeed important that the ostensible variable(s) well represent the construct of actual interest
  • This can be tested statistically (viz., psychometrically)
    • But is ultimately an issue that is decided theoretically

Model Testing

Comparing Model Fits

\(\text{Difference} = R^2_{\text{2nd Model}} - R^2_{\text{1st Model}}\)

  • If that difference is significant, then that “family” of variables in the second model significantly improves our understanding of the outcome
  • This is sometimes not done using R²,
    • But instead with an information criterion (see the sketch below)
    • Determined with maximum likelihood estimation (MLE)
      • MLE requires a computer since it is computed iteratively (& the math is less approachable)
      • But it is more robust than ordinary least squares
      • AIC & BIC are measures of misfit
        • So larger numbers are worse
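For instance (a minimal sketch with invented data; statsmodels reports these criteria for an OLS fit directly, alongside the log-likelihood):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x1 = rng.normal(size=100)
y = 1.0 + 0.8 * x1 + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(x1)).fit()
print(fit.llf)           # log-likelihood, so -2LL = -2 * fit.llf
print(fit.aic, fit.bic)  # measures of misfit: smaller is better
```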

Comparing Model Fits (cont.)

  • Common information criteria:
    • -2 \(\times\) log-likelihood (-2LL)
      • This is a “raw” information criterion from which others are derived
      • Usually not itself used
        • But is sometimes reported, so it’s good to know
    • Akaike Information Criterion (AIC)
    • Bayesian Information Criterion (BIC)
  • AIC & BIC both adjust for number of terms in the model
    • More than does R², another reason to use them
    • But BIC adjusts more aggressively
      • So use BIC when there are a lot of terms in the model
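In formula form (the standard definitions), with \(k\) estimated parameters and \(n\) observations; BIC's penalty outgrows AIC's once \(\ln(n) > 2\), i.e., for \(n\) of about 8 or more:

\[ \mathrm{AIC} = -2LL + 2k \qquad \qquad \mathrm{BIC} = -2LL + k\,\ln(n) \]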

Comparing Model Fits (cont.)

  • Thus, we want to see that the model with the “family” of interest has a smaller information criterion
    • I.e.:

\[\text{Difference in Model Fit} = AIC_{\text{Model without Family}} - AIC_{\text{Model with Family}}\]

  • Information criteria can be large, so this may be, e.g.:

\[\text{Diff. in Model Fit} = 2020 - 2000 = 20\]

Model Testing (cont.)

  • We test this difference against a χ² with degrees of freedom equal to the difference in dfs between the models
  • E.g.:

\(\ \ \ \ b_{1}X_{1}\)

\(\ \ \ \ b_{1}X_{1} + ( b_{ZIP}X_{ZIP} + b_{Salary}X_{Salary} )\)

  • The second model has 2 more dfs than the first¹
    • So we test a χ² = 20 to see if it is significant
      • Against a χ² distribution based on 2 df
    • Which would be significant, since the critical χ² \(\approx\) 6 (see the sketch below)
      • (For a χ² dist., the mean = df and the variance = 2 \(\times\) df; with 2 df, the .05 critical value is 5.99)

¹ ZIP could have more dfs, depending on how many ZIP codes are in that variable, but let’s assume it’s 1 df (thus two levels for ZIP, either a dummy code or just two ZIP codes).
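A minimal sketch of that test in Python (scipy assumed; the difference of 20 and the 2 dfs are the slide's example):

```python
from scipy import stats

diff = 20.0   # difference in model fit from the slide, e.g., 2020 - 2000
df_diff = 2   # the "family" adds two terms, so 2 more dfs

print(stats.chi2.ppf(0.95, df=df_diff))  # critical value ≈ 5.99
print(stats.chi2.sf(diff, df=df_diff))   # p ≈ .00005, so the family helps
```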

\(The\ End\)