Linear Regression Review
and
Testing Models Theoretically

Overview

  • Summary to Date
  • Review of linear regression model
  • Partialing out variance
  • Combining similar sources of variance
  • Ostensible & non-ostensible variables
  • Model fit

Summary to Date

Descriptives vs. Inferentials

  • Descriptives good
    • Simple & intuitive
      • Can efficiently describe the sample
    • Robust
      • Because they make no assumptions about the population
  • Mean & SD
    • SD as average distance from the mean
    • SD as a standard unit of measurement
      • Standardized (z) scores
        • Why correlation is so popular
        • And covariance isn’t
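As a concrete illustration of that last point, here is a minimal NumPy sketch (simulated height/weight data, invented for this example): covariance changes when the units of measurement change, but correlation, because it is computed on standardized (z) scores, does not.

```python
import numpy as np

rng = np.random.default_rng(0)
height_m = rng.normal(1.7, 0.1, 500)              # height in meters
weight = 50 + 30 * height_m + rng.normal(0, 5, 500)

height_cm = height_m * 100                        # same data, new units

print(np.cov(height_m, weight)[0, 1])    # covariance in meter units
print(np.cov(height_cm, weight)[0, 1])   # 100x larger in centimeters
print(np.corrcoef(height_m, weight)[0, 1])    # identical...
print(np.corrcoef(height_cm, weight)[0, 1])   # ...regardless of units
```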

Descriptives vs. Inferentials (cont.)

  • Inferentials
    • Making assumptions about the population
    • Most importantly the distribution
      • Often assume it approximates a normal distribution
        • But we know we’re wrong
      • Assumptions most robust against:
        • Kurtosis (& skew a bit)
        • Non-independence of measures (“multicollinearity”)
        • Non-constant variance (“heteroscedasticity”)

Descriptives vs. Inferentials (end)

  • Inferentials (cont.)
    • Assumptions not robust against:
      • Non-independence of participants
      • Multi-modality (more than one “hump”)
  • Sample stats approximating population stats
    • Accuracy of sample stats improves with:
      • Larger sample sizes
      • More representative sampling
      • Multiple “draws” of samples

Central Limit Theorem

  • “Multiple ‘draws’ of samples”
    • Sample stats never equal population stats
      • A sample stat always has some error to its measurement
    • But! (assuming consistent sampling techniques)
      • The measurement error of sample stats tends strongly to be normally distributed
      • This leads to the Central Limit Theorem
        • Which undergirds—allows for—nearly every statistic you’ll use
        • So remember that, even if you rarely think about it
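A minimal simulation sketch of the theorem (simulated data; NumPy and SciPy, not from the lecture): the population below is strongly skewed, yet the means of repeated sample “draws” come out close to normally distributed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # heavily skewed

# Draw many samples and keep each sample's mean
sample_means = [rng.choice(population, size=50).mean()
                for _ in range(2_000)]

print(stats.skew(population))     # large: raw scores are skewed
print(stats.skew(sample_means))   # near zero: means are ~normal
```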

Variance & Covariance

  • Variance = Information
  • Seek to understand that information
    • The more we understand, the better
    • Often quantify “how much we understand” as a signal-to-noise ratio
      • \(\text{Variance understood} = \frac{\text{Variance accounted for}}{\text{Variance \textit{not} accounted for}}\)
  • So, if “accounting” for the effect of one variable on another:
    • \(\text{Variance understood} = \frac{\text{Covariance}}{\text{Unshared variance}}\)
    • Which is a correlation
      • (When it’s standardized)
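A quick worked example of the ratio above, with invented numbers: if a model accounts for 25 of 100 total units of variance, then

\[\text{Variance understood} = \frac{25}{100 - 25} = \frac{25}{75} \approx 0.33\]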

Variance & Covariance (cont.)

  • Variance = Information
  • Seek to understand that information
    • The more we understand, the better (cont.)
  • And if we understand enough, we say we’ve made a “significant” insight
    • When is “enough” enough?
      • Usually when we’re 95% sure we’ve found enough
  • I.e., when we’re 95% sure that our sample stat measures a population stat that is different than the “null” value
    • (“Null” usually being “not different than zero,” “no effect,” “no difference,” “no information,” etc.)

Partialing Out Variance

  • Ways to increase the size of the signal relative to the size of the noise:
  • Increase size of signal
    • Bigger effects
    • Greater range of measurements of effects
  • Decrease size of noise
    • Greater precision of measurement
    • Remove the noise
  • Partial out the variance that is unshared between those variables
    • But is accounted for by some third variable
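A minimal sketch of that last idea (simulated data; the variable names are invented): to partial a third variable z out of x and y, regress each on z and correlate the residuals, i.e., the parts of x and y that z does not account for.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=300)
x = 0.7 * z + rng.normal(size=300)   # x and y are related only
y = 0.7 * z + rng.normal(size=300)   # through the third variable z

def residuals(v, z):
    b = np.polyfit(z, v, 1)          # simple OLS of v on z
    return v - np.polyval(b, z)      # what z does NOT account for

print(np.corrcoef(x, y)[0, 1])       # sizable: inflated by z
print(np.corrcoef(residuals(x, z),
                  residuals(y, z))[0, 1])  # ~0 once z is partialed out
```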

Ways of Communicating

  • Merging visuals with text
    • What is best described where
    • Visuals as “conversation pieces”
      • Text as highlighting what to focus on in visuals
      • Especially vis-à-vis theory
  • Efficiency & simplicity
    • “Information-to-ink ratio”
  • Strong organization
    • Clear guideposts & structure
  • Common—but not colloquial—language
    • Following writing conventions
    • Avoiding jargon and acronyms

Review of Linear Regression

Basic Strategy

  1. Assume the relationships between variables are linear
  2. Find a line that best accounts for all variables in the model
    • Either via “ordinary least squares”
    • Or “maximum likelihood”
  3. If the line—the linear regression—accounts for enough of the total variance, declare the model significant
    • Measured, e.g., with \(R^2\)
  4. And/or look at some/all of the included predictors to see which of them are significant contributors to that linear regression
    • Amount of variance accounted for by that variable vs. total variance
    • (See the sketch after this list)
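A minimal sketch of steps 2–4 using statsmodels (simulated data; variable names are invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
y = 2.0 + 0.5 * x1 + rng.normal(size=200)

X = sm.add_constant(x1)        # adds the intercept term b0
model = sm.OLS(y, X).fit()     # step 2: ordinary least squares

print(model.rsquared)          # step 3: variance accounted for (R^2)
print(model.pvalues)           # step 4: significance of each term
```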

Partialing Out Variance

  • At minimum, we separate out the variance associated with our predictor(s) from “error”
    • And, perhaps, the “intercept,” the starting, pre-intervention values for each participant

\[Y = b_{0} + b_{1}X_{1} + e\]

  • Adding other predictors separates out—partials out—the variance associated with each:

\[Y = b_{0} + b_{1}X_{1} + b_{2}X_{2} + e\]
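A minimal sketch of what this partialing does in practice (simulated data; names invented): when \(X_{1}\) and \(X_{2}\) share variance, adding \(X_{2}\) to the model changes \(b_{1}\), because the variance \(X_{1}\) shared with \(X_{2}\) is now partialed out.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x2 = rng.normal(size=300)
x1 = 0.6 * x2 + rng.normal(size=300)   # x1 and x2 share variance
y = 1.0 + 0.4 * x1 + 0.8 * x2 + rng.normal(size=300)

m1 = sm.OLS(y, sm.add_constant(x1)).fit()
m2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print(m1.params[1])   # b1 alone: absorbs part of x2's effect (~0.75)
print(m2.params[1])   # b1 with x2 partialed out: near the true 0.4
```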

Partialing Out Variance (cont.)

  • Error can also be separated
    • E.g., if we know the sources of those errors (same hospital, neighborhood, etc.)

\[Y = b_{0} + b_{1}X_{1} + b_{2}X_{2} + e_{1} + e_{2}\]

  • We can also separate out error and effects of predictors over time
    • Assume that measurements taken at the same time point will be more similar to each other
    • And that values within a person will tend to be more similar than values between people
  • There are different ways to do this; one common approach is a mixed-effects (multilevel) model (see the sketch below)
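A minimal sketch of one such approach, a mixed-effects model in statsmodels (simulated data; “hospital” as the grouping source of error, per the slide above):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n, k = 200, 10                         # 200 patients in 10 hospitals
df = pd.DataFrame({
    "hospital": np.repeat(np.arange(k), n // k),
    "x1": rng.normal(size=n),
})
hosp_effect = rng.normal(0, 1.0, k)[df["hospital"]]
df["y"] = 2 + 0.5 * df["x1"] + hosp_effect + rng.normal(0, 1, n)

# Fixed effect of x1, plus a separate error component per hospital
m = smf.mixedlm("y ~ x1", df, groups=df["hospital"]).fit()
print(m.summary())
```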

Combining Similar Sources of Variance

Common Sources of Variance

  • Similar predictors may share too much variance
    • If left unaddressed, this can lead to “multicollinearity”
    • Which leads to unstable model terms
      • E.g., terms will flip from being significant to not & back depending on what other terms are added to the model
  • Usually addressed by removing one of the multicollinear terms
  • But we can also combine or group those variables…
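A minimal sketch of detecting the problem with variance inflation factors (VIFs) in statsmodels (simulated data; a common rule of thumb flags VIFs above roughly 5–10):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(0, 0.1, 300)    # nearly a copy of x1
X = sm.add_constant(np.column_stack([x1, x2]))

for i in (1, 2):                     # skip column 0, the constant
    print(variance_inflation_factor(X, i))   # very large: collinear
```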

Combining Sources of Variance (cont.)

  • We do this all the time, in fact
    • Adding up responses to items on the same survey
    • Taking the average of results from a blood draw
  • But what if we know two variables are related, but not easily combined?
    • E.g., ZIP code and salary
      • 10010 + $75,000 \(\ne\) 85,010

Model Fit

  • We can group them into “families” of variables within the model…

\[Y = b_{0} + b_{1}X_{1} + ( b_{ZIP}X_{ZIP} + b_{Salary}X_{Salary} ) + e \]

  • We can test this by looking at the difference in model fit:

\[\text{First model: } Y = b_{0} + b_{1}X_{1} + e \]

\[\text{Second model: } Y = b_{0} + b_{1}X_{1} + ( b_{ZIP}X_{ZIP} + b_{Salary}X_{Salary} ) + e \]

\[\text{Difference} = R^2_{\text{Second Model}} - R^2_{\text{First Model}}\]

  • If that difference is significant, then that “family” of variables significantly improves our understanding of the outcome
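A minimal sketch of this comparison in statsmodels (simulated data; “zip_ses” and “salary” are invented stand-ins for the family on the slide). anova_lm gives the F-test for the change in variance accounted for between the nested models:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "zip_ses": rng.normal(size=n),
    "salary": rng.normal(size=n),
})
df["y"] = (0.5 * df["x1"] + 0.4 * df["zip_ses"]
           + 0.4 * df["salary"] + rng.normal(size=n))

m1 = smf.ols("y ~ x1", df).fit()
m2 = smf.ols("y ~ x1 + zip_ses + salary", df).fit()

print(m2.rsquared - m1.rsquared)   # improvement from the "family"
print(anova_lm(m1, m2))            # F-test of that improvement
```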

Ostensible & Non-Ostensible Variables

Ostensible & Non-Ostensible

  • Some Things We Can See…
    • Neighborhoods & paychecks
    • Blood pressure & adipose tissue
    • Smiles and cortisol levels
  • Some Things We Can’t See…
    • “Socio-economic status”
    • “Health”
    • “Stress”

Ostensible & Non-Ostensible (cont.)

  • Things we can observe empirically are sometimes called ostensible
  • While the underlying “constructs” they are manifestations of are non-ostensible

Model Testing

ZIP & Salary Example (cont.)

  • This is usually done not with \(R^2\),
    • But instead with an information criterion
    • Determined from maximum likelihood estimations (MLEs)
      • So we need to use MLEs
      • But they are more robust than ordinary least squares
      • And ordinary least squares estimates are MLEs when all assumptions are met
  • Common information criteria:
    • Akaike Information Criterion (AIC)
    • Bayesian Information Criterion (BIC)
    • Both adjust for number of terms in the model
      • But BIC adjusts more aggressively
      • So use BIC when there are a lot of terms in the model
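A minimal sketch of reading information criteria off fitted models in statsmodels (simulated data; OLS results expose .aic and .bic directly):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)
y = 0.5 * x1 + 0.5 * x2 + rng.normal(size=300)

m_small = sm.OLS(y, sm.add_constant(x1)).fit()
m_big = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print(m_small.aic, m_big.aic)   # smaller = less misfit
print(m_small.bic, m_big.bic)   # BIC penalizes extra terms more heavily
```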

Model Testing (cont.)

  • AIC & BIC are measures of misfit
    • So larger numbers are worse
  • Thus we want to see that the model with the “family” of interest has a smaller information criterion
    • I.e.:

\[\text{Difference in Model Fit} =\]

\[ AIC_{\text{Model without Family}} - AIC_{\text{Model with Family}}\]

  • Information Criteria can be large, so this may be, e.g.:

\[\text{Diff. in Model Fit} = 2010 - 2000 = 10\]

Model Testing (cont.)

  • We test this difference against a χ² distribution with degrees of freedom equal to the difference in dfs between the models
  • E.g.:

\(b_{1}X_{1}\)

\(b_{1}X_{1} + ( b_{ZIP}X_{ZIP} + b_{Salary}X_{Salary} )\)

  • The second model has 2 more dfs than the first
    • So test χ² = 10 with df = 2
    • (Which would be significant; critical χ² \(\approx\) 6 for a one-tailed test at α = .05)
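A minimal sketch of that test with SciPy, using the slide’s numbers (a fit difference of 10 with 2 extra dfs):

```python
from scipy import stats

diff, df_diff = 10, 2
print(stats.chi2.ppf(0.95, df_diff))   # critical value: ~5.99
print(stats.chi2.sf(diff, df_diff))    # p ~ .007, so significant
```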

The End