Association & Causation






Overview

  • Individual Differences and Correlations
  • Partial and Semipartial Correlations
  • Types of Correlation Statistics
  • Concerning Causality

Individual Differences and Correlations

Three Building Blocks of Analysis

  1. Variability: The degree of differences within a set of scores
  2. Covariability: The degree to which variability in one set of scores corresponds with variability in another set
  3. Interpretation: Making sense of the arbitrary scores of tests

The Nature of Variability

  • We assume that people differ (or might differ) with respect to their behaviors, genetics, attitudes/beliefs, etc.
    • Inter-individual differences: differences between people (e.g., in their levels of an attribute)
    • Intra-individual differences: differences emerging in one person over time or in different circumstances (e.g., change)

Importance of Individual Differences

  • For health & social science research:
    • We seek to understand differences among people (causes, consequences, etc.)
  • For applied health & social science practice:
    • Important decisions/interventions are based upon differences among people

Variability and Distributions of Scores

  • A set of test scores (from different people) is a “distribution” of scores.
  • The differences within that distribution are called “variability”
  • How can we quantitatively describe a distribution of scores, including its variability?
    • At least three kinds of information

Describing a Distribution:
An Example Distribution

A set of IQ scores: 110, 120, 100, 90, 130, 110

Describing a Distribution:
Central Tendency

  • What is the typical score in the distribution? That is, which score is most representative of the entire distribution?

\(\text{Mean} = \overline{X} = \frac{\Sigma X}{N}\)

\(\overline{X} = \frac{110 + 120 + 100 + 90 + 130 + 110}{6} = \frac{660}{6} = 110\)

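As a minimal Python sketch of the same computation (the scores are those from the slide above):

```python
# Mean of the example IQ scores from the slide above
scores = [110, 120, 100, 90, 130, 110]

mean = sum(scores) / len(scores)
print(mean)  # 110.0
```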

Describing a Distribution:
Variability

  • To what degree do the scores differ from each other?
    • Variance
    • Standard deviation
    • In terms of “the degree to which scores deviate (differ) from the mean of the distribution”

\(\text{Variance} = s^{2} = \frac{\sum{(X - \overline{X})^{2}}}{N}\)

Describing a Distribution:
Variability (cont.)

\(s^{2} = \frac{\sum{(X - \overline{X})^{2}}}{N}\)

\(\ \ \ \ = \frac{(110 - 110)^{2} + (120 - 110)^{2} + (100 - 110)^{2} + (90 - 110)^{2} + (130 - 110)^{2} + (110 - 110)^{2}}{6}\)

\(\ \ \ \ = \frac{(0)^{2} + (10)^{2} + (-10)^{2} + (-20)^{2} + (20)^{2} + (0)^{2}}{6}\)

\(\ \ \ \ = \frac{0 + 100 + 100 + 400 + 400 + 0}{6}\)

\(\ \ \ \ = \frac{1000}{6}\)

\(\ \ \ \ = 166.67\)
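The same arithmetic as a minimal Python sketch (population formula, dividing by N as on the slide):

```python
# Population variance: mean squared deviation from the mean
scores = [110, 120, 100, 90, 130, 110]
mean = sum(scores) / len(scores)  # 110.0

variance = sum((x - mean) ** 2 for x in scores) / len(scores)
print(round(variance, 2))  # 166.67
```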

Describing a Distribution:
Standard Deviation


\(\text{Standard Deviation} = s = \sqrt{s^{2}} = \sqrt{\frac{\sum(X - \overline{X})^{2}}{N}}\)

\(s = \sqrt{s^{2}} = \sqrt{166.67} = 12.91\)


  • Standard deviation is (simply) a measure of how far scores are—on average—from the sample mean
    • Roughly, the average distance of scores from the mean
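A quick check of the variance and standard deviation with Python’s statistics module; the p-prefixed functions use the divide-by-N (population) formulas shown on these slides:

```python
import statistics

scores = [110, 120, 100, 90, 130, 110]

# pvariance / pstdev divide by N; variance / stdev would divide by N - 1
print(round(statistics.pvariance(scores), 2))  # 166.67
print(round(statistics.pstdev(scores), 2))     # 12.91
```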

Association Between Distributions

  • Covariability: The degree to which two distributions of scores (e.g., X and Y) vary in a corresponding manner
  • Two types of information about covariability:
    1. Direction: positive/direct or negative/inverse
    2. Magnitude: strength of association

Association Between Distributions:
Covariance

  • Covariance (Cxy): a statistical index of covariability
  • Direction of association
    • Cxy > 0 Positive association, high scores on X tend to go with high scores on Y, and low scores on X tend to go with low scores on Y
    • Cxy < 0 Negative association, high scores on X tend to go with low scores on Y, and low scores on X tend to go with high scores on Y
    • Cxy = 0 No association, high scores on X tend to go with high scores on Y just as often as they go with low scores on Y
  • But covariance does not provide clear information about the magnitude of association
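A minimal sketch of the covariance computation; the X scores are the IQ example from earlier, and the Y scores are invented for illustration:

```python
# Covariance: average product of paired deviations from the two means
X = [110, 120, 100, 90, 130, 110]   # IQ example from earlier slides
Y = [30, 35, 28, 25, 40, 31]        # invented paired scores

mx, my = sum(X) / len(X), sum(Y) / len(Y)
cov_xy = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / len(X)
print(round(cov_xy, 2))  # positive here, so high X tends to go with high Y
```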

Association Between Distributions:
Correlation

  • Correlation (e.g., rxy, or just r): a standardized index of covariability
  • Direction of association is the same as Cxy
    • r > 0 Positive association
    • r < 0 Negative association
    • r = 0 No association
  • Magnitude of association
    • -1 ≤ r ≤ 1
    • The closer |r| is to 1, the stronger the association
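And r as standardized covariance (the covariance divided by the two standard deviations), reusing the same invented pairing as the covariance sketch:

```python
import statistics

X = [110, 120, 100, 90, 130, 110]
Y = [30, 35, 28, 25, 40, 31]  # same invented pairing as the covariance sketch

mx, my = statistics.mean(X), statistics.mean(Y)
cov_xy = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / len(X)

# Standardizing the covariance bounds the result between -1 and 1
r = cov_xy / (statistics.pstdev(X) * statistics.pstdev(Y))
print(round(r, 3))  # about 0.98: a strong positive association
```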

Partial and Semipartial Correlations

Partial Correlation

  • A partial correlation allows us to see the correlation between two variables
    • With the influence of a shared third variable (a covariate) removed from both
    • E.g., the correlation between
      • General weight (e.g., BMI) &
      • Maximum rate of oxygen consumption (VO2 max)
      • While removing the effect of age
    • Or the correlation between
      • Time to burnout among ICU nurses &
      • Nurse-to-patient ratio
      • While removing the effect of number of hours worked
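A hedged sketch of the standard first-order partial-correlation formula; the three pairwise correlations below are invented, and the labels only echo the BMI / VO2 max / age example:

```python
import math

# Partial correlation of x and y with z removed from both:
#   r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
# Invented pairwise correlations, loosely echoing BMI (x), VO2 max (y), age (z)
r_xy = -0.45
r_xz = 0.30
r_yz = -0.40

r_xy_z = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
print(round(r_xy_z, 3))  # association between x and y once z is held constant
```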

Semipartial Correlation

  • Also called a “part” correlation
  • Removes the effect of a third variable from one of the two correlated variables


  • Why do such a thing? E.g., to isolate the part of an outcome’s variance that one variable uniquely explains beyond a covariate
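The semipartial (“part”) version removes the covariate from only one of the two variables; a sketch with the same invented correlations as above:

```python
import math

# Semipartial correlation of x with y, removing z from y only:
#   r_x(y.z) = (r_xy - r_xz * r_yz) / sqrt(1 - r_yz^2)
# Same invented pairwise correlations as the partial-correlation sketch
r_xy, r_xz, r_yz = -0.45, 0.30, -0.40

sr = (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz**2)
print(round(sr, 3))  # x's association with the part of y not shared with z
```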

Types of Correlation Statistics

Pearson Product-Moment Correlation

  • The one you know about
  • The correlation between two continuous (interval or ratio) variables
  • Abbreviated as r
  • The same formula also gives the correlation between a continuous variable & a dichotomous variable
    • Called a point-biserial correlation
    • Abbreviated rpb
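A quick SciPy check of that equivalence, with invented data and the dichotomous variable coded 0/1:

```python
from scipy import stats

# Invented data: one continuous variable, one dichotomous (0/1) variable
continuous = [12.0, 15.5, 9.8, 14.2, 11.1, 16.3, 10.4, 13.7]
dichotomous = [0, 1, 0, 1, 0, 1, 0, 1]

r, _ = stats.pearsonr(dichotomous, continuous)
r_pb, _ = stats.pointbiserialr(dichotomous, continuous)
print(round(r, 3), round(r_pb, 3))  # the two values are identical
```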

Spearman’s \(\rho\) (“rho”)

  • Spearman’s rank correlation (ρ)
    • Summarizes correspondences between ranks
    • Used for ordinal-ordinal comparisons
    • And non-normal data
  • Often, however, people use Pearson’s r, assuming the ordinal data are in fact interval
  • Admittedly, the results are often similar to r
    • But ρ is more robust than r
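A minimal SciPy sketch with invented ordinal ratings, showing ρ alongside r:

```python
from scipy import stats

# Invented ordinal ratings (e.g., two 1-5 scales)
x = [1, 2, 2, 3, 4, 5, 5, 3]
y = [2, 1, 3, 3, 5, 4, 5, 2]

rho, _ = stats.spearmanr(x, y)  # correlation of the ranks
r, _ = stats.pearsonr(x, y)     # treats the ratings as interval
print(round(rho, 3), round(r, 3))  # often similar, as noted above
```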

Kendall’s \(\tau\) (“tau”)

  • Used when data severely violate the assumptions of the Pearson r (e.g., normality)
  • Its formula, based on concordant and discordant pairs of ranks, does not assume normality:


\(\tau = \frac{\text{(Number of Concordant Pairs)} - (\text{Number of Discordant Pairs})}{\text{Total Number of Pairs}}\)


  • But it is inefficient: it does not use all of the available information
    • And so may be less accurate
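The pair-counting formula above, sketched directly and checked against SciPy; the data are invented and tie-free (ties would call for the tau-b adjustment):

```python
from itertools import combinations
from scipy import stats

# Invented, tie-free paired ranks
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

pairs = list(combinations(range(len(x)), 2))
concordant = sum((x[i] - x[j]) * (y[i] - y[j]) > 0 for i, j in pairs)
discordant = sum((x[i] - x[j]) * (y[i] - y[j]) < 0 for i, j in pairs)

tau = (concordant - discordant) / len(pairs)
tau_scipy, _ = stats.kendalltau(x, y)
print(round(tau, 3), round(tau_scipy, 3))  # 0.6 from both
```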

Concerning Causality

The Problem of Causality

  • Proving that one thing caused another
  • Traditionally viewed as nearly impossible to establish
  • Requires evidence for “counterfactuals”
    • Proof of what would have happened in the same situation
      • But with different influences
    • I.e., saying X caused Y,
      • Means Y would not have happened without X
    • “I would not have bought that coffee if it wasn’t so cheap.”

The Problem of Causality (cont.)

  • We can get close to studying counterfactuals in science
  • E.g.:
    • Closely-matched participants in experimental designs (e.g., with controls)
    • Longitudinal designs
      • Especially following a “cohort” through time
      • And alternating which cohort undergoes which events
Group      Time 1    Time 2
Cohort A   Treated   Nothing
Cohort B   Nothing   Treated

The Problem of Causality (cont.)

  • We can get close to studying counterfactuals in science (cont.)
  • E.g.:
    • Mediated effects
      • Comparing outcomes with and without accounting for a possibly-mediating variable

The Problem of Causality (cont.)

  • Tests of mediated effects are currently considered among the best measures of causality
    • Since they are closest to being able to test a counterfactual
  • But there is also a general trend towards accepting causal explanations in some areas of research
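A minimal sketch of the two-regression logic behind an indirect (mediated) effect, using invented data and variable names; this is a simplified illustration, not the analysis from any cited study, and a real test would add confidence intervals (e.g., via bootstrapping):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Invented data in which X influences Y partly through a mediator M
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)             # path a: X -> M
y = 0.3 * m + 0.2 * x + rng.normal(size=n)   # path b: M -> Y, plus direct X -> Y

def slopes(predictors, outcome):
    """Least-squares slopes (intercept dropped) via numpy."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    return np.linalg.lstsq(design, outcome, rcond=None)[0][1:]

a = slopes([x], m)[0]            # X -> M
b, c_prime = slopes([m, x], y)   # M -> Y controlling for X, and direct X -> Y
print(round(a * b, 3), round(c_prime, 3))  # indirect (a*b) vs. direct effect
```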

The Problem of Causality (end)

  • E.g., Bulfone et al. (2022)
    • Found that there is a direct effect of academic self-efficacy on academic success
    • And that burn-out mediated part of that relationship
    • (The effects were similar among males & females)