Intro to
Linear Regression

Overview

  • Review of Underlying Concepts in Inferential Statistics
  • Understanding Linear Models
  • The Flexibility of Linear Models
  • An Example
  • Further Considerations

Review of Underlying Concepts

Review of Underlying Concepts

  • Variance & Covariance
    • Importance in statistical analyses
  • Covariance & Correlation
    • Relationship between them
    • Why use one or the other?
    • Both are descriptive statistics
      • Even though tests can be run on them

Review of Underlying Concepts (cont.)

  • Assumptions made in computing correlations
    • Measurement level is correctly conceived (ordinal, interval, ratio, etc.)
    • Relationship is linear
  • Assumptions made in testing their significance
    • Monotonicity
    • No big outliers

Review of Underlying Concepts (cont.)

  • Partial & Semipartial Correlations
    • Semipartial correlations remove the effect of another variable from one of the correlated pair
    • Could remove the effect of several other variables from one of the pair
      • Which is conceptually the same thing we do in separating out the effects of various predictors on the same outcome

Review of Underlying Concepts (cont.)

  • Correlations & Error
    • Correlations separate dispersion into variance & covariance
      • But make no assumptions about where error comes from
    • However, when testing significance of Pearson’s r, error is assumed to be normally distributed

Review of Underlying Concepts (end)

  • Correlations & Error (cont.)
    • The (unshared) variances of both variables comprise the denominator
      • (This will be different for linear regression models)

Understanding Linear Models

Building the Equation

  • Simplest form of a linear relationship is \(Y = bX\)
    • Where:
      • \(Y\) = Outcome / response / criterion / DV
      • \(X\) = Predictor / input / IV
      • \(b\) = Slope of \(X\)
        • The typical null hypothesis (H0) of “no effect” is expressed here as:
          \(b\) = 0

Building the Equation (cont.)

\[Y = bX\]

Building the Equation (cont.)

  • However, we typically add more terms and notations to the equation
    • The first term we add is the y-axis intercept
      • And subscripts to differentiate that from the slope:
        \(Y = b_{0} + b_{1}X_{1}\)
        • \(Y\) = Outcome
        • \(b_{0}\) = Value of \(Y\) at the y-axis intercept (i.e., when \(X_{1} = 0\))
        • \(b_{1}\) = Slope of \(X_{1}\)
        • \(X_{1}\) = Predictor \(X_{1}\)

Building the Equation (cont.)

\[Y = b_{0} + b_{1}X_{1}\]

Building the Equation (cont.)

\[Y = b_{0} + b_{1}X_{1}\]

  • We are using the data to estimate:
    • \(b_{0}\): The beginning/baseline value (when \(X_{1}\) = 0)
    • \(b_{1}\): The effect of \(X_{1}\) on \(Y\)
      • How much a change in \(X_{1}\) corresponds to a change in \(Y\)

Building the Equation (cont.)

  • The equation so far is the model: \(Y = b_{0} + b_{1}X_{1}\)
    • We are modeling a linear relationship between \(X_{1}\) and \(Y\)
    • Assuming linearity, \(b_{0}\) & \(b_{1}\) contain all of the information about that \(X_{1}\)-\(Y\) relationship
  • We can test the significance and effect size of the whole model
    • How much of the relationship is accounted for by that linear model

Building the Equation (cont.)

  • But the model mis-estimates the actual data
    • There is error in the model’s estimations

Building the Equation (end)

  • We add this error to the equation:
    \(Y = b_{0} + b_{1}X_{1} + e\)
    • \(Y\) = Outcome
    • \(b_{0}\) = Value of \(Y\) at the y-axis intercept
    • \(b_{1}\) = Slope of \(X_{1}\)
    • \(X_{1}\) = Predictor \(X_{1}\)
    • \(e\) = Error
  • “Error” is thus information about the \(X_{1}\)-\(Y\) relationship not explained by the model
    • But is still the “noise” against which we test the model’s “signals” (\(b_{0}\) & \(b_{1}\))

More About the Equation

\[Y = b_{0} + b_{1}X_{1} + e\]

  • Because error is separated out,
    • The value of \(X_{1}\) is assumed to be measured without error
  • Separating the intercept, slope, & error means each can be estimated & modified separately
  • Note that Kim, Mallory, & Valerio (2022) present this equation as: \(y = a + b \times x + \sigma\)

More About the Equation (cont.)

  • Adding more specificity to the equation:
    \(Y_{i} = b_{0} + b_{1}X_{i1} + e_{i}\)
    • \(Y_{i}\) = Actual value of \(Y\) for participant \(i\) (the predicted value is \(\hat{Y}_{i} = b_{0} + b_{1}X_{i1}\))
    • \(b_{1}\) = Slope for variable \(X_{1}\)
    • \(X_{i1}\) = Value on \(X_{1}\) for participant \(i\)
    • \(e_{i}\) = Error: the difference between participant \(i\)’s actual & predicted outcomes

More About the Equation (end)

  • Participant \(i\)’s score on \(X_{1}\) here is 2.1
  • The predicted value of \(Y_{i}\) for participant \(i\) is:
    • \(\hat{Y}_{i} = 6 + (2 \times 2.1) = 10.2\)

  • The actual value of \(Y_{i}\) for participant \(i\) also includes the error:
    • \({Y}_{i} = 6 + (2 \times 2.1) + 4.3 = 14.5\)
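
As a quick check, here is that arithmetic as a minimal Python sketch (all numbers are the example values above, not real data):

```python
# Example values from the slide above (not from a real dataset)
b0, b1 = 6.0, 2.0         # intercept & slope
x_i1 = 2.1                # participant i's score on X1
e_i = 4.3                 # participant i's error (residual)

y_hat_i = b0 + b1 * x_i1  # predicted value: 10.2
y_i = y_hat_i + e_i       # actual value: 14.5
print(y_hat_i, y_i)       # -> 10.2 14.5 (up to float rounding)
```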

Linear Models vs. ANOVAs

  • ANOVA (and ANCOVA, MANOVA, etc.)
    • Is a type of linear regression
    • Results focus on significance of variables
      • When all are present in the model together
  • Linear Regression
    • Is a more flexible framework
    • Can model complex relationships & data structures
      • E.g., non-linear relationships & nested data
    • Can test whole models
      • And effects on the whole model when variables are added or removed

Questions Best Addressed by ANOVAs vs. Linear Models

  • ANOVAs (and ANCOVAs, MANOVAs, etc.) can ask:
    • Which variable is significant?
    • Is there an interaction between variables?
  • Linear regressions can also ask:
    • What is the best combination of variables?
    • Does a given variable—or set of variables—significantly contribute to what we already know?

Signal-to-Noise in Linear Models

\(Y_{i} = b_{0} + b_{1}X_{i1} + \cdots + b_{k}X_{ik} + e_{i}\)

  • The variance in \(Y_{i}\) is divided into:
    • Changes due to the predictors—the “signals”
    • Changes due to “other things” (relegated to the error / noise term(s))
    • (N.b., the intercept, \(b_{0}\), is a constant, and so is not included in this partitioning of the variance in \(Y_{i}\))

Signal-to-Noise in Linear Models (cont.)

\(Y_{i} = b_{0} + b_{1}X_{i1} + \cdots + b_{k}X_{ik} + e_{i}\)

  • The sum of squares representation of this partition into predictors & error looks like:

\[\sum\limits_{i=1}^{n} (Y_{i} - \overline{Y})^{2} = \sum\limits_{i} (\hat{Y}_{i} - \overline{Y})^{2} + \sum\limits_{i} ({Y}_{i} - \hat{Y}_{i})^{2}\]

Signal-to-Noise in Linear Models (cont.)

\(\sum\limits_{i=1}^{n} (Y_{i} - \overline{Y})^{2} = \sum\limits_{i} (\hat{Y}_{i} - \overline{Y})^{2} + \sum\limits_{i} ({Y}_{i} - \hat{Y}_{i})^{2}\)

  • I.e., the sum of the squared differences of each instance (\(Y_{i}\)) from the mean (\(\overline{Y}\)) equals:
    • The sum of the squared differences of each predicted value (\(\hat{Y}_{i}\)) from the mean
    • Plus the sum of the squared differences of the actual values (\(Y_{i}\)s) from their respective predicted values

Signal-to-Noise in Linear Models (cont.)

  • Another way of saying this:

\(\sum\limits_{i=1}^{n} (Y_{i} - \overline{Y})^{2} = \sum\limits_{i} (\hat{Y}_{i} - \overline{Y})^{2} + \sum\limits_{i} ({Y}_{i} - \hat{Y}_{i})^{2}\)

  • Is to say this:
    \(\text{Total SS = SS from Regression + SS from Error}\)
    • Or, further condensed as:
      • \(SS_{Total} = SS_{Reg.} + SS_{Error}\)

Signal-to-Noise in Linear Models (end)

  • Using \(SS_{Total} = SS_{Reg.} + SS_{Error}\),
    • We can compute the proportion of the total variance accounted for by the model:
      Proportion of Variance Predicted = \(\frac{SS_{Reg.}}{SS_{Total}}\)
      • Or, equivalently, as \(1 - \frac{SS_{Error}}{SS_{Total}}\)
  • We typically represent this ratio as R²:

\[R^{2} = \frac{SS_{Reg.}}{SS_{Total}} = 1 - \frac{SS_{Error}}{SS_{Total}}\]

Yep, that’s what \(R^{2}\) means in ANOVAs ☻
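
A small numeric sketch (made-up data) that verifies the sum-of-squares partition and both forms of R²:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 1.5 * x + rng.normal(size=100)  # a known linear signal plus noise

b1, b0 = np.polyfit(x, y, 1)              # OLS slope & intercept
y_hat = b0 + b1 * x                       # predicted values

ss_total = np.sum((y - y.mean()) ** 2)
ss_reg = np.sum((y_hat - y.mean()) ** 2)
ss_error = np.sum((y - y_hat) ** 2)

print(np.isclose(ss_total, ss_reg + ss_error))     # True: SS_Total = SS_Reg + SS_Error
print(ss_reg / ss_total, 1 - ss_error / ss_total)  # the two equivalent forms of R²
```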

The Flexibility of
Linear Models

Adding More Terms to Models

  • Adding another variable to the equation:
    \(Y_{i} = b_{0} + b_{1}X_{i1} + b_{2}X_{i2} + e_{i}\)
    • \(X_{i2}\) = Participant \(i\)’s value on the other variable (\(X_{2}\)) added to the model
    • \(b_{2}\) = Slope for \(X_{2}\)
  • Since there are multiple predictors (\(X\)s) in this model,
    • This is called a multiple linear regression

Adding More Terms to Models (cont.)

  • We can continue to add still more variables to the model, e.g., \(X_{3}\) and \(X_{4}\):

\[Y_{i} = b_{0} + b_{1}X_{i1} + b_{2}X_{i2} + b_{3}X_{i3} + b_{4}X_{i4} + e_{i}\]

  • When there are a lot of variables in the model, we usually abbreviate the equation to:
    \[Y_{i} = b_{0} + b_{1}X_{i1} + \cdots + b_{k}X_{ik} + e_{i}\]
    • Where \(k\) is the number of predictors

Adding More Terms to Models (cont.)

\(Y_{i} = b_{0} + b_{1}X_{i1} + \cdots + b_{k}X_{ik} + e_{i}\)

  • We can test interactions by adding additional terms
    • E.g., \(\ldots + b_{1}X_{i1} + b_{2}X_{i2} + \mathbf{b_{3}(X_{i1}X_{i2})} + \ldots\)
  • Or test non-linear effects, also by adding terms (see the sketch below)
    • E.g., \(\ldots + b_{1}X_{i1} + \mathbf{b_{2}X_{i1}^{2}} + \ldots\)
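
In practice, both are literally just extra terms in the model; a hedged sketch with made-up data, using statsmodels’ formula interface:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = 1 + 0.5 * df.x1 + 0.3 * df.x2 + 0.8 * df.x1 * df.x2 + rng.normal(size=200)

m_inter = smf.ols("y ~ x1 + x2 + x1:x2", data=df).fit()  # x1:x2 adds the interaction term
m_quad = smf.ols("y ~ x1 + I(x1**2)", data=df).fit()     # I(x1**2) adds a quadratic term
print(m_inter.params)                                    # b3 should land near 0.8
```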

Adding More Terms to Models (end)

  • Just as we separated out the effects of the predictors,
    • We can separate out sources of error
      • E.g., per predictor/term in the model
  • We can also combine error terms
    • E.g., we can “nest” one variable into another
      • Patients nested within hospital units
      • Wave (time) nested within patient
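
A minimal sketch of such nesting as a multilevel (mixed) model in statsmodels; `df` and its columns are hypothetical names, not from a real dataset:

```python
import statsmodels.formula.api as smf

# Patients nested within hospital units: a random intercept per unit.
# `df`, `bmi`, `x1`, and `unit` are hypothetical names.
model = smf.mixedlm("bmi ~ x1", data=df, groups=df["unit"])
result = model.fit()
print(result.summary())
```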

Modeling Linear & Non-Linear Relationships

\(Y_{i} = b_{0} + b_{1}X_{i1} + \cdots + b_{k}X_{ik} + e_{i}\)

  • \(Y\) is assumed to follow a certain distribution
    • This determines how error is modeled
    • E.g., error is usually assumed to be normally distributed
    • But both distributions can be assumed to be something else
      • E.g., log-normal

Modeling Linear & Non-Linear Relationships (cont.)

\(Y_{i} = b_{0} + b_{1}X_{i1} + \cdots + b_{k}X_{ik} + e_{i}\)

  • \(X\)s can be nominal, ordinal, interval, or ratio
    • This affects how those variables are modeled
    • As well as the error related to them
  • We could transform the terms on the right
    • E.g., raise them to a power or take their log

Modeling Linear & Non-Linear Relationships (cont.)

\(Y_{i} = b_{0} + b_{1}X_{i1} + \cdots + b_{k}X_{ik} + e_{i}\)

  • For an ANOVA (and t-tests):
    • \(Y\) is assumed to be normally distributed
    • The \(X\)s are nominal
    • The model terms are not transformed
      • Leaving their relationship with \(Y\) linear
  • Technically, this is called an “identity transformation”
    • Meaning the terms are multiplied by 1:
      \(Y_{i} = 1 \times (b_{0} + b_{1}X_{i1} + \cdots + b_{k}X_{ik} + e_{i})\)

Modeling Linear & Non-Linear Relationships (end)

  • The terms can be transformed in other models
    • This transformation is called a Link Function
      • Since it “links” the terms on the right to the predicted value of \(Y\) on the left
  • E.g., logistic regression uses a logit (log-odds) link:

\[\hat{Y}_i = \frac{e^{b_{0} + b_{1}X_{i1} + \cdots + b_{k}X_{ik}}}{1 + e^{b_{0} + b_{1}X_{i1} + \cdots + b_{k}X_{ik}}}\]

    which is more often written as:

\[\ln\!\frac{\hat{Y}_i}{1-\hat{Y}_i} = b_{0}+b_{1}X_{i1}+\cdots+b_{k}X_{ik}\]
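
For instance, statsmodels fits this as a GLM with a binomial family, whose default link is the logit; a sketch with hypothetical names:

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# `df`, `passed`, and `x1` are hypothetical; `passed` is a 0/1 outcome.
model = smf.glm("passed ~ x1", data=df,
                family=sm.families.Binomial())  # Binomial's default link is the logit
result = model.fit()                            # fit via maximum likelihood
print(result.summary())
```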

Generalized Linear Models

  • That family of models is referred to as generalized linear models
    • I.e., we can generalize that linear model into other types of models
  • ANOVAs, t-tests, and logistic regressions are types of generalized linear models
  • Generalized linear models typically use maximum likelihood estimation (MLE) to compute terms
    • The ordinary least squares (OLS) of ANOVAs, etc. is itself a special case of MLE (given normally distributed errors)

Generalized Linear Models (cont.)

  • N.b., confusingly, in addition to generalized linear models,
    • There are general linear models
    • “General linear model” simply refers to models you already know
      • I.e., those with:
        • Normally distributed, i.i.d. errors &
        • Identity link functions
      • Like ANOVAs & multiple linear regressions

Generalized Linear Models (cont.)

  • Assumptions of generalized linear models:
    • Relationship between response and predictors must be expressible as a linear function
    • Cases must be independent of each other
      • Or that relationship should be part of the model
        • Often as nesting

Generalized Linear Models (end)

  • Assumptions of generalized linear models (cont.):
    • Predictors should not be too inter-correlated (lack of multicollinearity)
      • Linear regression simply cannot produce accurate estimates of two effects if they cannot be easily separated
    • The assumed error distribution & link function should approximate the actual ones

An Example

Predicting BMI from Sex & Neighborhood Safety

  • Predicting body mass index levels among adolescents from:
    1. Whether an adolescent is biologically female
    2. Whether they feel their neighborhood is safe
  • via SPSS (v. 29 & 31)

Data Used

  • From the National Longitudinal Study of Adolescent to Adult Health (Add Health)
    • Using the prepared add_health.sav dataset
    • Since data are longitudinal, only the first instance (wave) of data collection was used
      • Selected via:
        1. Data > Select Cases...
        2. Under If condition is satisfied, added Wave = 1 to select only the first wave
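
For reference, a rough Python equivalent of those SPSS steps (assuming the `Wave` variable named above; pyreadstat is one way to read .sav files):

```python
import pyreadstat

# Read the prepared SPSS dataset & keep only the first wave,
# mirroring SPSS's Data > Select Cases... with Wave = 1
df, meta = pyreadstat.read_sav("add_health.sav")
wave1 = df[df["Wave"] == 1]
```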

Descriptives

  • The mean BMI (29.144) was nearly in the obese range (i.e., ≥ 30)
  • Since Bio_Sex was coded 0 = Male & 1 = Female, 52% of the participants were biologically female
  • And 90% reported feeling safe in their neighborhood
  • About 77% (5025/6503) of cases had data on all three variables

Descriptives (cont.)

  • BMI was positively skewed
  • So, those with exceptionally high BMIs affect the results more

Descriptives: Q-Q Plot

  • Analyze > Explore... > Plots > Normality plots with tests
  • That skew—and a limited lower range—caused some deviations from normality

Descriptives: 2 × 2 Table

  • Analyze > Descriptive Statistics > Crosstabs...
  • The number of adolescents who felt safe in their neighborhood is not significantly different between the sexes

Correlations

  • The correlations also reflect the weak relationship between Bio_Sex & Feel_Safe_in_Nghbrhd
  • Feeling safe—but not sex—significantly correlated with BMI

Correlations with CIs

  • The 95% confidence intervals (and correlations themselves) are slightly different when using Fisher’s r-to-z transformation versus bootstrapping
  • Given the deviations from normality, bootstrapping is preferable here

95% confidence intervals generated from Fisher’s r-to-z transformation:

95% confidence intervals generated from bootstrapping:
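
A minimal sketch of that bootstrapping approach for a correlation’s 95% CI (x & y stand in for any two of the variables here, as 1-D NumPy arrays):

```python
import numpy as np

def bootstrap_r_ci(x, y, n_boot=5000, seed=0):
    """Percentile 95% CI for Pearson's r via case resampling."""
    rng = np.random.default_rng(seed)
    n = len(x)
    rs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample cases with replacement
        rs[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    return np.percentile(rs, [2.5, 97.5])
```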

Linear Regression

  • Conducted via:
    1. Analyze > Regression > Linear
    2. BMI as Dependent
    3. Bio_Sex & Feel_Safe_in_Nghbrhd as predictors in Block 1 of 1
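
A rough statsmodels equivalent of those steps (variable names as in the slides; assumes the `wave1` cases selected earlier):

```python
import statsmodels.formula.api as smf

# OLS regression of BMI on both predictors at once (one block)
model = smf.ols("BMI ~ Bio_Sex + Feel_Safe_in_Nghbrhd", data=wave1).fit()
print(model.summary())  # reports R², the overall F test, & the coefficients
```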

Linear Regression (cont.)

  • The combination of Bio_Sex & Feel_Safe_in_Nghbrhd did not explain much of the variance in BMI scores
    • The R² was .004; adjusted for number of terms in the model, it was .003
    • This combination of variables thus only accounted for about 0.3% – 0.4% of the total variance in BMIs
      • The high standard error, however, indicates that replications may find rather different R²s

Linear Regression (cont.)

  • Nonetheless, the model was significant
    • The intercept & sample size both surely helped
  • This ANOVA source table presents the effect of the overall model
    • Like first testing a variable in an ANOVA before conducting post hocs, this helps protect against over-interpreting

Linear Regression (cont.)

  • Biological sex & feeling safe in one’s neighborhood both significantly predicted BMI
  • The standardized β for sex means its effect size was close to “small”
    • It is “medium” for feeling safe (q.v. η² criteria in this table)
  • The positive effect for sex means those identifying as female (1s) tended to have higher BMIs than those identifying as male (0s)
  • The negative value for feeling safe means those who felt safe (1s) tended to have lower BMIs than those who didn’t (0s)

Interpreting the Effects

  • Writing these results in linear equation form:

\(\hat{\text{BMI}} = 30.205 + (0.325 \times \text{Sex}) + (-1.383 \times \text{Feeling Safe})\)

Interpreting the Effects (cont.)

\[\hat{\text{BMI}} = 30.205 + (0.325 \times \text{Sex}) + (-1.383 \times \text{Feeling Safe})\]

  • Since Bio_Sex was coded 0 = Male & 1 = Female
    • And Feel_Safe_in_Nghbrhd as 0 = No & 1 = Yes,
  • We predict that the BMI
    • For a male (0)
    • Who does not feel safe (0)
    • Is 30.205:

\[\begin{align*} \hat{\text{BMI}} & = 30.205 + (0.325 \times 0) + (-1.383 \times 0) \\ & = 30.205 + 0 + 0 \\ & = 30.205 \end{align*}\]

Interpreting the Effects (cont.)

\[\hat{\text{BMI}} = 30.205 + (0.325 \times \text{Sex}) + (-1.383 \times \text{Feeling Safe})\]

  • The predicted BMI for
    • A female (1)
    • Who does not feel safe (0)
    • Is 30.530:

\[\begin{align*} \hat{\text{BMI}} & = 30.205 + (0.325 \times 1) + (-1.383 \times 0) \\ & = 30.205 + 0.325 + 0 \\ & = 30.530 \end{align*}\]

Interpreting the Effects (cont.)

\[\hat{\text{BMI}} = 30.205 + (0.325 \times \text{Sex}) + (-1.383 \times \text{Feeling Safe})\]

  • The predicted BMI for
    • A male (0)
    • Who does feel safe (1)
    • Is 28.822:

\[\begin{align*} \hat{\text{BMI}} & = 30.205 + (0.325 \times 0) + (-1.383 \times 1) \\ & = 30.205 + 0 - 1.383 \\ & = 28.822 \end{align*}\]

  • Etc.
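
All four predicted cells can be generated at once from the fitted coefficients; a small sketch using the values reported above:

```python
# Coefficients reported in the slides above
def predicted_bmi(sex, feels_safe):
    return 30.205 + (0.325 * sex) + (-1.383 * feels_safe)

for sex in (0, 1):        # 0 = male, 1 = female
    for safe in (0, 1):   # 0 = does not feel safe, 1 = feels safe
        print(sex, safe, round(predicted_bmi(sex, safe), 3))
# -> (0,0) 30.205; (0,1) 28.822; (1,0) 30.530; (1,1) 29.147
```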

Further Considerations

Multicollinearity

  • When two or more predictors share too much variance
    • I.e., are strongly correlated
      • E.g., r ≥ .8 (Dormann et al., 2013)
  • Two general sources:
    1. Structural: Caused by how the model was constructed
       • E.g., adding interaction terms
    2. Data: Caused by variables that are inherently correlated

Multicollinearity (cont.)

  • Problems caused by multicollinearity:
    • Parameter estimates of multicollinear terms can be unstable
      • And even reverse sign
    • Reduces the power of the whole model
      • Because the parameter estimates are less precise

Multicollinearity (cont.)

  • Multicollinearity doesn’t typically affect the whole model’s R²
    • Or the model’s goodness-of-fit statistics
  • It mostly impairs interpretation of individual predictors
  • Can be tested with variance inflation factor (VIF)
    • VIF ranges from 1 to \(\infty\)
    • Where values > 10 may indicate problems (e.g., Salmerón et al., 2016)
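
A sketch of computing VIFs in Python (`X_predictors` is a hypothetical DataFrame holding only the predictor columns):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(X_predictors)  # VIF assumes a model with an intercept
vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vifs.drop("const"))          # values > 10 may signal trouble
```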

Multicollinearity (end)

  • Multicollinearity is typically not a concern if the variables with high multicollinearity are:
    • Control variables
    • Intentional products of other variables
      • E.g., interaction terms, raised to a power, etc.
    • Dummy variables

Independence of Cases

  • When one case (participant, round of tests, etc.) is correlated with another case
  • Can also produce unstable parameter estimates
    • Thus affecting significance tests
      • Through both false positives (Type I) & false negatives (Type II)
  • May also affect model goodness of fit
    • And not isolated to a few predictors

Independence of Cases (cont.)

  • Addressing non-independence
    • Best is through research design
    • Can also model inter-dependence
      • E.g., nesting cases
        • As is done explicitly in multilevel (hierarchical) models

The Games