Linear Regression Review
and
Testing Models Theoretically

Overview

  • Summary to Date
  • Review of Linear Regression
  • Testing Model Fit

Summary to Date

Descriptives vs. Inferentials

  • Descriptives good
    • Simple & intuitive
      • Can efficiently describe the sample
    • Robust
      • Because they make no assumptions about the population
  • Mean & SD
    • SD as average distance from the mean
    • SD as a standard unit of measurement
      • Standardized (z) scores
        • Why correlation is so popular
        • And covariance isn’t
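A minimal sketch in Python of that last point, with invented numbers: standardizing turns the unit-bound covariance into the unit-free correlation.

```python
import numpy as np

# Hypothetical paired measurements (any units)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 6.0])

# Standardized (z) scores: distance from the mean in SD units
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# Covariance depends on the variables' units, so its size is hard to judge...
print(np.cov(x, y, ddof=1)[0, 1])

# ...but the covariance of z-scores is the correlation, always in [-1, 1]
print(np.cov(zx, zy, ddof=1)[0, 1])  # matches np.corrcoef(x, y)[0, 1]
```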

Descriptives vs. Inferentials (cont.)

  • Inferentials
    • Make assumptions about the population
    • Most importantly the distribution
      • Often assume it approximates a normal distribution
        • But we know we’re wrong
      • Assumptions are often most robust against:
        • Kurtosis (& skew a bit)
        • Non-independence of measures (“multicollinearity”)
        • Changes in variance over time (“heteroscedasticity”)

Descriptives vs. Inferentials (cont.)

  • Inferentials (cont.)
    • Assumptions not robust against:
      • Non-independence of participants
      • Multi-modality (more than one “hump” in the distribution)
  • Sample stats approximating population stats
    • Accuracy of sample stats improves when:
      • Larger sample sizes
      • More representative sampling
      • Multiple “draws” of samples

Central Limit Theorem

  • “Multiple ‘draws’ of samples”
    • Sample stats never equal population stats
      • A sample stat is assumed to always have some error in its measurement
        • Observed = True + Error
    • But! (assuming consistent sampling techniques)
      • The error of measurement of sample stats:
        • Tends to be normally distributed
        • Has a mean of zero (modulo bias)
      • This leads to the Central Limit Theorem
        • Which undergirds—even allows for—nearly every statistic you’ll use
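A minimal simulation sketch of the CLT (the exponential population and all numbers here are invented for illustration): even from a skewed population, repeated sample means pile up in a roughly normal heap around the population mean.

```python
import numpy as np

rng = np.random.default_rng(42)

# A skewed, decidedly non-normal "population": exponential with mean 2.0
draws = rng.exponential(scale=2.0, size=(10_000, 50))  # 10,000 samples of n = 50

# One mean per sample: the sampling distribution of the mean
sample_means = draws.mean(axis=1)

print(sample_means.mean())       # ~2.0: the errors of measurement average out
print(sample_means.std(ddof=1))  # ~2.0 / sqrt(50) ≈ 0.28: the standard error
```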

Variance & Covariance

  • Variance = Information
  • We seek to understand that information
    • The more we understand, the better
    • Often quantify “how much we understand” as a signal-to-noise ratio
      • \(\text{Variance understood} = \frac{\text{Variance accounted for}}{\text{Variance }not \text{ accounted for}}\)
  • So, if “accounting” for the effect of one variable on another:
    • \(\text{Variance understood} = \frac{\text{Covariance}}{\text{Unshared variance}}\)
    • Which is a correlation
      • (When it’s standardized)
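In standard notation, that standardized ratio is the Pearson correlation: the covariance divided by the two standard deviations, which is the same as the covariance of the z-scores.

\[ r_{XY} = \frac{\mathrm{cov}(X, Y)}{s_X \, s_Y} = \mathrm{cov}(z_X, z_Y) \]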

Variance & Covariance (cont.)

  • Variance = Information
  • Seek to understand that information
    • The more we understand, the better (cont.)
  • And if we understand enough, we say we’ve made a “significant” insight
    • When is “enough” enough?
      • Usually when we’re 95% sure we’ve found enough
  • I.e., when we’re 95% sure that our sample stat…

     measures a population stat…

     that is different than the “null” value.

     (“Null” usually being “not different than zero,” “no effect,”
     “no difference,” “no information,” etc.)

Partialing Out Variance

  • Ways to increase the size of the signal relative to the size of the noise:
  • Increase size of signal
    • Bigger effects
    • Greater range of measurements of effects
  • Decrease size of noise
    • Greater precision of measurement
    • Remove the noise
  • Partial out the variance that is unshared between those variables
    • But is accounted for by some third variable
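A minimal Python sketch of that last idea, with simulated data (the effect sizes are invented): regress the third variable out of both measures, then correlate what remains.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a third variable z adds shared noise to both x and y
z = rng.normal(size=200)
x = 0.5 * z + rng.normal(size=200)
y = 0.4 * x + 0.5 * z + rng.normal(size=200)

def residualize(v, covariate):
    """Remove the variance in v accounted for by the covariate (simple OLS)."""
    X = np.column_stack([np.ones_like(covariate), covariate])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta

# Partial z's variance out of both x and y, then correlate the residuals
rx, ry = residualize(x, z), residualize(y, z)
print(np.corrcoef(x, y)[0, 1])    # zero-order r, inflated by z
print(np.corrcoef(rx, ry)[0, 1])  # partial r, with z's noise removed
```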

Ways of Communicating

  • Merging visuals with text
    • What is best described where
    • Visuals as “conversation pieces”
      • Text as highlighting what to focus on in visuals
      • Especially vis-à-vis theory
  • Efficiency & simplicity
    • “Information-to-ink ratio”
  • Strong organization
    • Clear guideposts & structure
  • Common—but not colloquial—language
    • Following writing conventions
    • Avoiding jargon and acronyms

Overview of What’s Next

  • Review of linear regression model
  • Partialing out variance
  • Combining similar sources of variance
  • Model fit
  • Ostensible & non-ostensible variables

Review of Linear Regression

Basic Strategy

  1. Assume the relationships between variables are linear
  2. Find a line that best accounts for all variables in the model
  3. If the line—the linear regression—accounts for enough of the total variance, declare the model significant
    • Measured, e.g., with R²
  4. And/or look at the individual predictors to see which of them are significant contributors to that linear regression
    • Amount of variance accounted for by those variables vs. total variance
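A minimal sketch of steps 2–4 in Python (statsmodels assumed; the data are invented):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=80)
y = 2.0 + 0.6 * x + rng.normal(size=80)    # step 1: a truly linear relationship

fit = sm.OLS(y, sm.add_constant(x)).fit()  # step 2: the best-fitting line
print(fit.rsquared, fit.f_pvalue)          # step 3: R² & overall significance
print(fit.pvalues)                         # step 4: each predictor's significance
```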

Partialing Out Variance

  • At minimum, we separate out the variance associated with our predictor(s) from “error”
    • And, perhaps, the “intercept,” the starting, pre-intervention values for each participant

\[Y = b_{0} + b_{1}X_{1} + e\]

  • Adding other predictors separates out—partials out—the variance associated with each:

\[Y = b_{0} + b_{1}X_{1} + b_{2}X_{2} + e\]
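A minimal continuation of that model in Python (simulated data; the coefficients are invented): with both predictors in the model, each slope is estimated with the other predictor's variance partialed out.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x1, x2 = rng.normal(size=(2, 100))
y = 1.0 + 0.8 * x1 + 0.3 * x2 + rng.normal(size=100)

# Y = b0 + b1*X1 + b2*X2 + e
X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(fit.params)     # b0, b1, b2: each b partialed for the other predictor
print(fit.resid[:5])  # e: the variance left unaccounted for
```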

Partialing Out Variance (cont.)

  • N.b., error can also be separated
    • E.g., if we know the sources of those errors (same hospital, neighborhood, etc.)

\[Y = b_{0} + b_{1}X_{1} + b_{2}X_{2} + e_{1} + e_{2}\]

  • We can also separate out error and effects of predictors over time
    • Assume that events will be more similar to each other at one time point
    • And that values within a person will tend to be more similar than between people
  • But more on that later.

For now, let’s focus on …

Combining Similar Sources of Variance

Common Sources of Variance

  • Similar predictors may share too much variance
    • If left unaddressed, can lead to “multicollinearity”
    • Which leads to unstable model terms
      • E.g., terms will flip from being significant to not & back depending on what other terms are added to the model
  • Usually addressed by removing one of the multicollinear terms
  • But we can also combine or group those variables…

Combining Sources of Variance (cont.)

  • We do this all the time, in fact
    • Adding up responses to items on the same survey
    • Taking the average of results from a blood draw
  • But what if we know two variables are related, but not really easily combined?
    • E.g., ZIP code and salary
      • 10010 + $75,000 \(\ne\) 85,010

Combining Sources of Variance (end)

  • We can group them into “families” of variables within the model…

\[Y = b_{0} + b_{1}X_{1} + ( b_{ZIP}X_{ZIP} + b_{Salary}X_{Salary} ) + e \]

  • We can test this by looking at the difference in model fit:

\(\text{1st Model:}\ \ Y = b_{0} + b_{1}X_{1} + e \quad \Rightarrow \quad R^2_{\text{1st Model}}\)

\(\text{2nd Model:}\ \ Y = b_{0} + b_{1}X_{1} + ( b_{ZIP}X_{ZIP} + b_{Salary}X_{Salary} ) + e \quad \Rightarrow \quad R^2_{\text{2nd Model}}\)

\(\text{Difference} = R^2_{\text{2nd Model}} - R^2_{\text{1st Model}}\)
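A minimal sketch of that comparison (statsmodels again; `zip_d` is a hypothetical dummy-coded ZIP variable and the data are invented):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x1, zip_d, salary = rng.normal(size=(3, 100))  # hypothetical predictors
y = 1.0 + 0.8 * x1 + 0.4 * zip_d + 0.3 * salary + rng.normal(size=100)

# 1st model: X1 alone; 2nd model adds the ZIP/salary "family"
fit1 = sm.OLS(y, sm.add_constant(x1)).fit()
fit2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, zip_d, salary]))).fit()

print(fit2.rsquared - fit1.rsquared)  # the difference in variance accounted for
print(fit2.compare_f_test(fit1))      # F test of that change: (F, p, df_diff)
```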

A Brief Aside about Ostensible & Non-Ostensible Variables

Ostensible & Non-Ostensible

  • Some Things We Can See…
    • Neighborhoods & paychecks
    • Blood pressure & adipose tissue
    • Smiles and cortisol levels
  • Some Things We Can’t See…
    • “Socio-economic status”
    • “Health”
    • “Stress”

Ostensible & Non-Ostensible (cont.)

  • Things we can observe empirically are sometimes called ostensible
  • While the underlying “constructs” they are manifestations of are non-ostensible

Ostensible & Non-Ostensible (end)

  • It is quite common to use ostensible variables to represent non-ostensible constructs
    • Even if it is less common to realize (or at least acknowledge) that this is what one is doing
  • However, it is indeed important that the ostensible variable(s) well represent the construct of actual interest
  • This can be tested statistically (viz., psychometrically)
    • But is ultimately an issue that is decided theoretically

Model Testing

Comparing Model Fits

\(\text{Difference} = R^2_{\text{2nd Model}} - R^2_{\text{1st Model}}\)

  • If that difference is significant, then that “family” of variables in the second model significantly improves our understanding of the outcome
  • This is sometimes not done using R²,
    • But instead with an information criterion (see the sketch below)
    • Determined with maximum likelihood estimation (MLE)
      • MLE requires a computer since it is computed iteratively (& the math is less approachable)
      • But it is more robust than ordinary least squares
      • AIC & BIC are measures of misfit
        • So larger numbers are worse
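For instance (a minimal sketch with invented data; statsmodels reports these criteria for an OLS fit directly, alongside the log-likelihood):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x1 = rng.normal(size=100)
y = 1.0 + 0.8 * x1 + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(x1)).fit()
print(fit.llf)           # log-likelihood, so -2LL = -2 * fit.llf
print(fit.aic, fit.bic)  # measures of misfit: smaller is better
```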

Comparing Model Fits (cont.)

  • Common information criteria:
    • -2 \(\times\) log-likelihood (-2LL)
      • This is a “raw” information criterion from which others are derived
      • Usually not itself used
        • But is sometimes reported, so it’s good to know
    • Akaike Information Criterion (AIC)
    • Bayesian Information Criterion (BIC)
  • AIC & BIC both adjust for number of terms in the model
    • More than does R², another reason to use them
    • But BIC adjusts more aggressively
      • So use BIC when there are a lot of terms in the model
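In formula form (the standard definitions), with \(k\) estimated parameters and \(n\) observations; BIC's penalty outgrows AIC's once \(\ln(n) > 2\), i.e., for \(n\) of about 8 or more:

\[ \mathrm{AIC} = -2LL + 2k \qquad \qquad \mathrm{BIC} = -2LL + k\,\ln(n) \]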

Comparing Model Fits (cont.)

  • Thus, we want to see that the model with the “family” of interest has a smaller information criterion
    • I.e.:

\[\text{Difference in Model Fit} = AIC_{\text{Model without Family}} - AIC_{\text{Model with Family}}\]

  • Information criteria can be large, so this may be, e.g.:

\[\text{Diff. in Model Fit} = 2020 - 2000 = 20\]

Model Testing (cont.)

  • We test this difference against a χ² with degrees of freedom equal to the difference in dfs between the models
  • E.g.:

\(\ \ \ \ b_{1}X_{1}\)

\(\ \ \ \ b_{1}X_{1} + ( b_{ZIP}X_{ZIP} + b_{Salary}X_{Salary} )\)

  • The second model has 2 more dfs than the first¹
    • So we test a χ² = 20 to see if it is significant
      • Against a χ² distribution based on 2 df
    • Which would be significant, since the critical χ² \(\approx\) 6 (see the sketch below)
      • (For a χ² dist., the mean = df and the variance = 2 \(\times\) df; with 2 df, the .05 critical value is 5.99)

¹ ZIP could have more dfs, depending on how many ZIP codes are in that variable, but let’s assume it’s 1 df (thus two levels for ZIP, either a dummy code or just two ZIP codes).
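A minimal sketch of that test in Python (scipy assumed; the difference of 20 and the 2 dfs are the slide's example):

```python
from scipy import stats

diff = 20.0   # difference in model fit from the slide, e.g., 2020 - 2000
df_diff = 2   # the "family" adds two terms, so 2 more dfs

print(stats.chi2.ppf(0.95, df=df_diff))  # critical value ≈ 5.99
print(stats.chi2.sf(diff, df=df_diff))   # p ≈ .00005, so the family helps
```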

\(The\ End\)