Measuring
& Testing
Differences

Overview

  • Review of Assumptions in Inferential Statistics
  • Hypothesis Testing
  • Effect Size
  • Signal-to-Noise Ratio
  • Common Tests
    • χ²
    • t & F

Review of Assumptions in
Inferential Statistics

Assumptions in Inferential Statistics:
Representativeness

  • Three general types of assumptions:
    1. That the sample represents the population
    2. That each data point (“datum”) is independent of the others
    3. That the population’s data are normally distributed
  • There are more/other assumptions that can be made, e.g.:
    • Other distribution shapes
    • Nature of any missing data
    • Whether data are continuous or discrete

The Normal Distribution

[Figure: the normal curve]

Characteristics of the Normal Distribution

  • Most importantly, it is only a function of the mean & standard deviation
  • The mean, median and mode are all equal
  • The total area under the curve equals 1
  • It’s symmetric
  • The curve approaches—but never touches—the x-axis
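
These properties can be checked numerically. A quick sketch, assuming SciPy is available; the mean & SD below are an arbitrary IQ-like choice, since the curve is fully determined by those two parameters:

```python
from scipy.stats import norm

mu, sd = 100, 15              # any mean & SD fully determine the curve
dist = norm(mu, sd)

# Total area under the curve is 1 (the CDF runs from 0 to 1)
area = dist.cdf(float("inf")) - dist.cdf(float("-inf"))
# The mean and median coincide
same_center = dist.mean() == dist.median()
# Symmetry: equal density at equal distances from the mean
symmetric = dist.pdf(mu - 10) == dist.pdf(mu + 10)
print(area, same_center, symmetric)
```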

Hypothesis Testing

Hypothesis Testing (cont.)

  • The “null” hypothesis is that there is no effect/difference
    • The p-value is technically the probability of finding data at least as extreme as ours if the null is true
    • It’s couched this way mainly for philosophical reasons
      • I.e., we can’t prove there is an effect,
        • But simply that there doesn’t seem to be
          nothing
      • Kind of like in criminal court
        • We don’t say that someone is “innocent,”
        • But that they are “not guilty”—that there
          isn’t enough evidence to prove guilt

Odds of Being Diagnosed with Various Comorbidities Among Older Adults with Opioid Use Disorders

Baumann, S. & Samuels, W. E. (in review). Comparing comorbidities of older adults with opiate use disorder by race and ethnicity. Journal of the American Association of Nurse Practitioners.

Effect Size

Effect Size: General Concepts

  • “Effect size” is indeed a measure of the size of an effect
    • I.e., not if there is an effect, but how much of an effect
    • E.g.:
      • How much better a patient is after a given treatment
      • How much better one treatment is than another
      • How much age affects a patient’s post-op recovery
      • How much more common a disorder is among one racial group vs. another

Effect Size: General Concepts (cont.)

  • There are several measures of effect size
    • Different ones for different types of data & analyses
    • E.g.:
      • How much better a patient is after a given treatment
        • Cohen’s d \(\left(= \frac{\text{Mean Health Before T}_x\ -\ \text{Mean Health After T}_x}{\text{Pooled }SD}\right)\)
      • How much better one treatment is than another
        • Also Cohen’s d
      • How much age affects a patient’s post-op recovery
        • r (correlation)
      • How much more common a disorder is among one racial group vs. another
        • Odds ratio
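
As a sketch, Cohen’s d can be computed directly from two groups of scores; the pre/post health scores below are invented for illustration:

```python
import math

def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled SD
    (pooled sample SDs, n - 1 denominators)."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical health scores before & after a treatment
before = [52, 48, 50, 47, 53]
after  = [58, 55, 60, 54, 57]
print(round(cohens_d(after, before), 2))   # ≈ 2.75 for this made-up data
```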

Effect Size: General Concepts (cont.)

  • Most effect size statistics are standardized
    • But not all in the same ways, so they’re not all on comparable scales
    • So we generally can’t compare the values of different types of effect size measures
      • E.g., we can’t directly compare r = .5 with OR = .5
  • However, we can convert between most types of effect size statistics
  • And one type of effect size measure can be compared against that same type in other studies
    • This generally allows us to compare results between studies
    • Which we cannot do with significance tests
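
For example, d and r can be interconverted with the standard formulas \(r = d/\sqrt{d^2 + 4}\) (which assumes two equal-sized groups) and \(d = 2r/\sqrt{1 - r^2}\). A minimal Python sketch:

```python
import math

def d_to_r(d):
    # Standard conversion, assuming two equal-sized groups
    return d / math.sqrt(d ** 2 + 4)

def r_to_d(r):
    # Inverse of the conversion above
    return 2 * r / math.sqrt(1 - r ** 2)

print(round(d_to_r(0.8), 2))   # a "large" d corresponds to r ≈ .37
print(round(r_to_d(0.5), 2))   # a "large" r corresponds to d ≈ 1.15
```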

Common Effect Size Statistics (cont.)

  • Many were created by Cohen (1988)
    • Who gave the famous recommendations for what would be a “small,” “medium,” and “large” effect
    • But Kraft (2020) argued for more modest
      expectations for education interventions
      • I.e., settings where even a small effect may compound over time

Interpreting Effect Sizes

  • A “small” effect is an effect that accounts for about 1% of the total variance, e.g.:
    • The mean difference in IQs between twin and non-twin siblings
    • The difference in heights between 15- and 16-YO girls
  • A “medium” effect is detectable by the naked eye, e.g.:
    • The mean difference in IQs between members of professional and managerial occupations
    • The difference in heights between 14- and 18-YO girls
  • A “large” effect is, well, large, e.g.:
    • The mean difference in IQs between college graduates and those who have a 50% chance of graduating high school
    • The difference in heights between 13- and 18-YO girls

Differences in IQs Across Professions

From: Hauser, R. M. (2002). Meritocracy, cognitive ability, and the sources of occupational success. CDE Working Paper 98-07 (rev). Center for Demography and Ecology, The University of Wisconsin-Madison, Madison, Wisconsin

Interpreting Effect Sizes (cont.)

Statistic | Notes & Refs | Small | Medium | Large
----------|--------------|-------|--------|------
Cohen’s d | Cohen, 1988, p. 25 | .2 | .5 | .8
 | For ed. interventions (Kraft, 2020) | .05 | < .2 | ≥ .2
h | Difference btwn proportions; p. 184 | .2 | .5 | .8
r | The correlation coefficient, p. 83 | .1 | .3 | .5
q | Difference btwn correlations; p. 115 | .1 | .3 | .5
w | For χ² goodness of fit & contingency tables; p. 227 | .1 | .3 | .5
η² | For (M)AN(C)OVAs | .01 | .06 | .14
f & β | Also for (M)AN(C)OVAs; p. 285 & p. 355 | .1 | .25 | .4
 | For ed. interventions (Kraft, 2020) | .025 | < .1 | ≥ .1
f² & β² | For multiple regression/correlation, p. 413; multivariate linear regression & multivariate R², p. 477 | .02 | .15 | .35

Example of Effect Sizes

Correlations between nurses’:

  • Compassion fatigue &
    compassion satisfaction
  • Burnout &
    compassion satisfaction
  • Burnout &
    compassion fatigue

From Zhang et al. (2018)

The Signal-to-Noise Ratio &
Its Use in Hypothesis Tests

Effect Size Redux

  • The “effect” in effect size can be:
    • Difference (between group means)
    • Association / relationship (between variables)
    • Prevalence / rates (differences in frequencies, odds, odds ratios)
  • Alone, it is a descriptive statistic
    • It measures the magnitude
    • But not how likely we are to find a similar effect size in another sample
      • I.e., how generalizable it is
    • Or how much it matters relative to other information
      • I.e., how big this explained “signal” is compared to the unexplained “noise” in the data

Signal-to-Noise Ratio

  • Generally, information in a sample of data is placed into
    two categories:
    • “Signal,” e.g.:
      • Difference between group means,
      • Magnitude of change over time, or
      • Amount two variables co-vary/co-relate
    • “Noise,” e.g.:
      • Differences within a group
      • “Error”—anything not directly measured

Signal-to-Noise Ratio (cont.)

  • Many statistics & tests are such signal-to-noise ratios
    • Often investigating multiple signals & even multiple sources of noise
  • And if there is more signal than noise,
    • We can then test if there is enough of a signal to “matter”
    • I.e., be “significant”
  • E.g., the F-test in an ANOVA
    • A ratio of “mean square variance” between groups/levels vs. “mean square error” within each group
    • If F > 1, then use sample size to determine if the value is big enough to be significant

Signal-to-Noise Ratio (cont.)

Table 7.4: Tests of Between-Subject Effects on Health Literacy Knowledge

Source | Sum of Squares | df | Mean Square | F | p | Partial \(\eta^2\)
-------|----------------|----|-------------|---|---|-------------------
Information Intervention | 4.991 | 1 | 4.991 | 5.077 | .025 | 0.028
Telesimulation Intervention | 0.349 | 1 | 0.349 | 0.355 | .552 | 0.022
Error | 172.061 | 175 | 0.983 | | |

  • \(\frac{\text{Signal}}{\text{Noise}}=\frac{\text{Mean Square Between}}{\text{Mean Square Error}}\)
  • E.g., for the Information Intervention: \(\frac{4.991}{0.983} = 5.077\)
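
The same arithmetic in a few lines of Python, using the mean squares reported in the table:

```python
# Recomputing the F ratios from Table 7.4 as signal / noise
ms_error = 0.983                  # Mean Square Error = 172.061 / 175
f_information = 4.991 / ms_error  # Information Intervention
f_telesim = 0.349 / ms_error      # Telesimulation Intervention
print(round(f_information, 3))    # 5.077 — matches the table
print(round(f_telesim, 3))        # 0.355 — matches the table
```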

Patton, S. (2022). Effects of telesimulation on the health literacy knowledge, confidence, and application of nursing students. Doctoral dissertation, The Graduate Center, CUNY.

Inferential Statistics:
Common Tests

The \(\chi^2\) Distribution & Test

  • Background
    • Invented by Karl Pearson (in an abstruse 1900 article)
    • Originally used to test “goodness of fit”
      • If two sets of data follow the same distribution or frequencies of events
        • E.g., if the number of patients with hospital-acquired pressure injuries differs between diabetic and non-diabetic patients
      • Or how well a set of data fit a theoretical distribution
        • E.g., if a sample’s distribution is the same as a normal distribution

Characteristics of the \(\chi^2\) Distribution

  • The distribution’s shape, location, etc. are all determined by the degrees of freedom
    • I.e.:
      • The mean = df
      • The variance = 2df
        • SD = \(\sqrt{2df}\)
      • The peak (mode) of the curve occurs at df – 2
        (when df > 2)
  • As the degrees of freedom increase:
    • The χ² curve approaches a normal distribution &
    • The curve becomes more symmetrical
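
A quick numerical check of these properties, assuming SciPy is available:

```python
from scipy.stats import chi2

# Check mean = df, variance = 2·df, and the peak at df − 2, for a few dfs
for df in (3, 5, 10):
    assert float(chi2.mean(df)) == df          # mean equals df
    assert float(chi2.var(df)) == 2 * df       # variance equals 2·df
    peak = df - 2                              # the density peaks at df − 2
    assert chi2.pdf(peak, df) > chi2.pdf(peak + 0.5, df)
    assert chi2.pdf(peak, df) > chi2.pdf(peak - 0.5, df)
print("all checks pass")
```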

Characteristics of the \(\chi^2\) Distribution (cont.)

Plots of Several χ² Distributions


Uses of the \(\chi^2\) Distribution

  • Because it only depends on df,
    • and resembles a normal distribution,
  • It is useful for testing if data follow a normal distribution
    • Or if the total number of deviations from normality is greater than expected
  • It can do this for discrete values—like counts
    • Since it depends only on counts

Uses of the \(\chi^2\) Distribution (cont.)

  • The χ² distribution has many uses, including:
    1. Estimating parameters of a population with an unknown distribution
    2. Checking the relationships between categorical variables
    3. Checking independence of two criteria of classification of multiple qualitative variables
    4. Testing deviations of differences between expected and observed frequencies
    5. Conducting “goodness of fit” tests
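
Use #3 (independence of classifications) can be sketched with SciPy’s `chi2_contingency`; the 2×2 counts below are invented for illustration, echoing the earlier pressure-injury example:

```python
from scipy.stats import chi2_contingency

# Rows: diabetic vs. non-diabetic; columns: injury vs. no injury
# (made-up counts)
observed = [[30, 70],
            [15, 85]]
stat, p, df, expected = chi2_contingency(observed)
print(df)   # 1 degree of freedom for a 2×2 table
print(round(stat, 2), round(p, 4))
```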

Example of a \(\chi^2\) Test

Notes: “AA” = African Americans; p-values are from tests of χ²s

Zhang, A. Y., Koroukian, S., Owusu, C., Moore, S. E., & Gairola, R. (2022). Socioeconomic correlates of health outcomes and mental health disparity in a sample of cancer patients during the COVID-19 pandemic. Journal of Clinical Nursing. https://doi.org/10.1111/jocn.16266

t and F Statistics

  • Very common tests of differences in means
    • These are signal-to-noise ratios
    • Cannot be significant if there is more noise than signal
      • I.e., if |t| < 1 or if F < 1
    • If >1, then can be significant if the sample is big enough
  • t is used to test the mean difference between two groups
    (“t for two”)
    • F is used for three or more groups
  • Mathematically:
    • The distributions of each strongly resemble
      normal distributions
    • t² = F (when comparing two groups)
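
The t²-equals-F relationship for two groups is easy to verify numerically; a sketch assuming SciPy, with made-up data:

```python
from scipy.stats import ttest_ind, f_oneway

# Two invented groups of scores
g1 = [5.1, 4.9, 5.6, 5.0, 5.4]
g2 = [4.2, 4.5, 4.1, 4.8, 4.3]
t, p_t = ttest_ind(g1, g2)   # two-sample t-test
f, p_f = f_oneway(g1, g2)    # one-way ANOVA on the same two groups
print(abs(t**2 - f) < 1e-9)  # True: t² equals F with two groups
print(abs(p_t - p_f) < 1e-9) # and the p-values agree
```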

Example of t-Tests

β-weights are tested via t- or F-tests.

Associations between:

  • Nurse staffing & skill mix and
  • Hospital consumer assessment of health care providers & systems (HCAHPS) measures
  • In pooled cross-sectional and longitudinal regression models

From Martsolf et al. (2016)

Thank you