Overview

  • Variability & Randomness
  • Levels of Measurement
  • Descriptive & Inferential Statistics
  • Sources of Variance and the Signal-to-Noise Ratio
  • Designing and Answering Questions
  • Hypothesis Testing

Variability & Randomness

Variance

  • Variance = Information
  • The extent to which things differ is the extent to which things require understanding
  • Much of statistics centers on investigating sources of variance
    • Often through detecting shared variance between things
    • And measuring the magnitude of that shared variance
      • Relative to unshared variance

Randomness

  • Randomness = Unpredictability
    • I.e., variation due to chance alone
    • Pure randomness cannot be predicted
  • Random influences on data are assumed to follow a normal distribution
    • With a mean of zero
      • I.e., to eventually cancel out
  • Enables generalization from samples to populations
    • As long as the sample was truly drawn at random

Levels of Measurement

Methods of Scaling

  • Scaling per se is simply:
    • “representing quantities of attributes numerically”
  • Methods of scaling, though, are not so straightforward…

Methods of Scaling (cont.)

  • Scaling involves applying a set of rules to measurements
    • Each “level” of measurement follows different rules for operationalizing variables
    • These, in turn, affect the permissible analyses
    • Stevens’ seminal work in the 1940s and ’50s established a framework still in use

Summary of Common Measurement Scales

Scale      Basic Operation              Permissible Transformations    Permissible Statistics
Nominal    Equality vs. Inequality      Any one-to-one                 Counts, Frequencies, Mode
Ordinal    Greater than vs. Less than   Monotonically Increasing       Median, Percents, Order Statistics, Kendall’s τ
Interval   Equality of Intervals        General Linear: x′ = bx + a    Difference Scores
Ratio      Equality of Ratios           Multiplicative: x′ = bx        Geometric Mean (dampens outliers)

Nominal

  • Pretty straightforward
  • Any non-duplicative value can represent any category
    • Can be numeric
      • Just be careful how a stat program treats it
      • Using 0 for one category can be useful
        • And 1 for another

Nominal (cont.)

  • Permissible statistics are limited
    • Even in operational views of them
  • But don’t discount counts
    • Counts are ostensible & strongly valid
    • And are in fact on a ratio scale

Classification / Categorization

  • Even with nominal scales, accuracy of categorization matters
  • As, of course, does the utility of the measure

Ordinal

  • Dichotomous variables
    • I.e., presence/absence of a trait
  • Likert-scale items are often subsumed here
    • But asking participants to rank can be a useful alternative

Interval

  • Assumes equal intervals between levels
  • Ratios of absolute interval values are not meaningful
    • Since ratios change with transformations
      • E.g., 100°F : 50°F vs. 100°C : 50°C (see the worked example below)
    • But ratios of difference scores within a given transformation are meaningful
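
A quick worked version of that example (the Celsius values are just the same two temperatures converted): an identical pair of temperatures yields different ratios under two equally valid interval scalings, so the ratio itself carries no meaning.

\[ \frac{100^{\circ}\mathrm{F}}{50^{\circ}\mathrm{F}} = 2 \qquad \text{vs.} \qquad \frac{37.8^{\circ}\mathrm{C}}{10^{\circ}\mathrm{C}} \approx 3.8 \]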

Interval (cont.)

  • Often as precise as one can achieve for most non-ostensible measurements
    • Since it can be difficult to measure precisely enough to ensure total absence of a trait
    • Or even define what its absence would mean
      • E.g., Satisfaction with care, level of pain, motoric development, bone density

Ratio

  • Most restrictive in transformations
    • And thus most permissive in analyses & mathematical operations
      • Including the four basic algebraic operations:
        • Add, subtract, multiply, & divide

Ratio (cont.)

  • Allows computation of the geometric mean (sketched below):
    \[ x_{G} = (x_{1}x_{2}x_{3} \cdots x_{n})^{\frac{1}{n}} \]
  • I.e., multiply the item values, then take the nth root of the product
    • Where n = number of items
  • Useful for percents
    • And as a way to account for outliers
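
A minimal sketch of the computation in Python (NumPy/SciPy assumed; the data values are invented for illustration):

```python
import numpy as np
from scipy import stats

x = np.array([2.0, 8.0, 4.0, 16.0])  # hypothetical ratio-scale values

# Literal form of the formula: multiply the values, then take the nth root
g_direct = np.prod(x) ** (1 / len(x))

# Equivalent, more numerically stable form: exponentiate the mean of the logs
g_logs = np.exp(np.mean(np.log(x)))

print(g_direct, g_logs, stats.gmean(x))  # all ~5.66 (vs. an arithmetic mean of 7.5)
```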

Do These Levels Matter?

  • Stevens and other “representationalists” argue they sure do
    • Scaling is an inherent trait of the measure
    • Using operations available to a “higher” level assumes the traits of that level
      • And thus implies that your stated scaling is inaccurate
      • E.g., computing the mean of an “ordinal” measure implies it’s not actually ordinal, so don’t pretend it is

Do These Levels Matter? (cont.)

  • Others argue that scaling can instead be considered a theory
    • We assume a relationship between the ostensible & non-ostensible traits
      • Then use conventions established for that scale to guide analyses
      • Essentially an “operationalist” view of measures

Do These Levels Matter? (cont.)

  • The operational view assumes all measurement contains error:
    • Observed = Actual + error

\[ O = A + e \]

  • And thus that true ratio measures, at least, are rare

So, Do They Really Matter or What?

  • Arguably, it depends on how confident you are in your scaling
    • More valid & precise relationships between an ostensible scaling & the underlying non-ostensible trait suggest stronger adherence to the rules of that scale
    • Such confidence is typically used to justify “higher” levels

So They Kinda Matter

  • In practice:
    • Nominal requires only good classification
    • Ordinal per se tends to be avoided
    • Interval is sought & used most
    • Ratio usually not worth trying for

“Lumpers” vs. “Splitters”

  • “Reducing” data to a lower scaling level is conservative
    • But very rarely justified in practice
  • I believe it is best to use the highest defensible level
    • Data are expensive
      • It’s arguably ethical—and of course simply sensible—to use all of the information available

Descriptive & Inferential Statistics

Descriptives vs. Inferentials

  • Descriptives make no assumptions about the population from which the sample was drawn
    • They simply describe the sample of data
      • Number, percents, ratios/odds
    • Can also describe the distribution of the sample
      • Central tendency (mode, median, mean)
      • Dispersion (standard deviation, skewness, kurtosis)

Descriptives vs. Inferentials (cont.)

  • Inferentials make assumptions—inferences—about the nature of the entire population of data
  • The assumptions made can vary
    • Aren’t always the same
    • And sometimes the assumptions can be tested
  • Making assumptions allows us to conduct hypothesis tests
    • Hypothesis testing doesn’t define inferential stats
    • But is the most common reason to make the assumptions

Review of Descriptive Statistics

  • Not to be underestimated
    • In addition to being informative, they are inherently robust
  • Robust statistics are tolerant of violations of assumptions/inferences made about the population
    • And since descriptives don’t make any assumptions about it…
  • N.b., however, that the line between descriptive & inferential is blurry
    • And inferential tests can be made on descriptives
      • E.g., testing whether the counts/frequencies of occurrence are the same between two groups

Review of Descriptive Statistics (cont.)

  • Central tendency
    • Yes, where the “center” of the data “tends” to be
    • “Center,” of course, can be differentially defined
    • Mode: the most common value; very robust to outliers
    • Median: the value with the same number of other values on either side
      • Often used instead of mean when there are many outliers
    • Mean: average value; least robust to outliers
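
A small illustration in Python of how the three “centers” respond to an outlier (the values are invented):

```python
import statistics

scores = [2, 3, 3, 4, 5, 6, 40]  # hypothetical data with one outlier (40)

print(statistics.mode(scores))    # 3 -- unaffected by the outlier
print(statistics.median(scores))  # 4 -- barely affected
print(statistics.mean(scores))    # 9 -- pulled strongly toward the outlier
```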

Review of Descriptive Statistics (cont.)

  • Dispersion
    • How spread out the data are
    • Standard deviation
      • “On average, how far a given score is from the mean”
      • Equivalent for the median is the median absolute deviation (MAD)
        • “The median distance of scores from the median”
    • Standard deviation is also related to variance (discussed a bit later)
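
Continuing the same invented data, a sketch of the SD vs. the MAD (NumPy assumed):

```python
import numpy as np

scores = np.array([2, 3, 3, 4, 5, 6, 40], dtype=float)  # same hypothetical data

sd = scores.std(ddof=1)                              # sample standard deviation
mad = np.median(np.abs(scores - np.median(scores)))  # median absolute deviation

print(round(float(sd), 2), mad)  # SD ~13.73 is inflated by the outlier; MAD = 1.0 is not
```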

Review of Descriptive Statistics (cont.)

  • Dispersion (cont.)
    • Skew & kurtosis
      • Skew: if one of the distribution tails is longer than the other
      • Kurtosis: how peaked vs. flat the distribution is
        • “Leptokurtic”: A very tight distribution; small SD
        • “Platykurtic”: A very spread-out distribution; large SD
    • Skew is more problematic than kurtosis
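
A quick sketch of both measures using SciPy (the simulated data are invented; note that scipy.stats.kurtosis reports excess kurtosis, so a normal distribution scores near 0):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
symmetric = rng.normal(size=10_000)    # roughly normal data
skewed = rng.exponential(size=10_000)  # long right tail

print(stats.skew(symmetric), stats.kurtosis(symmetric))  # both near 0
print(stats.skew(skewed), stats.kurtosis(skewed))        # skew near 2, excess kurtosis roughly 6
```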

Assumptions in Inferential Statistics

  • Three general types of assumptions:
    1. That the sample represents the population
    2. Each data point (“datum”) is independent of the others
    3. That the population’s data are normally distributed
  • There are more/other assumptions that can be made, e.g.:
    • Other distribution shapes
    • Nature of any missing data
    • Whether data values are continuous or discrete

Assumptions in Inferential Statistics (cont.)

  • For hypothesis testing, some assumptions matter more than others
    • I.e., hypothesis tests tend to be robust against some violations of our assumptions
    • But not others
  • Understanding assumptions & their effects can
    help interpret results
  • Next:
    • Robustness of common assumptions for
      “ordinary least squares” (t-tests, ANOVAs)

Assumptions in Inferential Statistics:
Representativeness

  • That the sample represents the population
    • Robust with larger samples (also called “asymptotically robust”)
  • This is a manifestation of “regression to the mean”
    • I.e., that sample values tend to resemble population values when:
      • The sample size gets bigger
      • Repeated samples are drawn
    • “Regression to the mean” would probably be better called “convergence to the population mean”

Assumptions in Inferential Statistics:
Representativeness (cont.)

  • Standard error of the mean (SEM, or just “standard error,” SE)
    • A measure for how well the sample mean represents the population mean
    • Like standard deviation for the sample mean
      • I.e., the population mean should have a ~68% chance of being within 1 SEM of the sample mean
    • The SEM gets smaller as the sample size gets larger (see the sketch below)
  • The SEM is an inferential statistic
    • But is robust to violations of normality—even for relatively non-normal population distributions
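
A minimal sketch of that shrinkage (NumPy assumed; the exponential “population” and the sample sizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
population = rng.exponential(scale=2.0, size=1_000_000)  # a decidedly non-normal "population"

for n in (10, 100, 1_000):
    sample = rng.choice(population, size=n, replace=False)
    sem = sample.std(ddof=1) / np.sqrt(n)  # SEM = sample SD / sqrt(n)
    print(n, round(float(sem), 3))         # shrinks by roughly sqrt(10) at each step
```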

Assumptions in Inferential Statistics:
Representativeness (cont.)

  • In part because the sampling distribution of the mean tends to be symmetrical
  • I.e., the population mean has the same chance of being greater than or less than the sample mean
    • As long as participants/samples are independent & unbiased (and people can possibly be resampled)
  • And if sample means tend to be symmetrically distributed,
    • Then the distribution of sample means tends to be normally distributed
  • This, my friends, is the Central Limit Theorem
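
A small simulation sketch of the CLT (NumPy/SciPy assumed; the population, sample size, and number of samples are all invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
population = rng.exponential(scale=2.0, size=1_000_000)  # a clearly skewed population

# Draw many independent samples and keep only each sample's mean
sample_means = [rng.choice(population, size=50).mean() for _ in range(5_000)]

print(stats.skew(population))    # ~2: the raw data are far from normal
print(stats.skew(sample_means))  # much smaller (~0.3): the means are far more symmetric/normal
```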

Assumptions in Inferential Statistics:
Sample Independence

  • That each participant is independent of the others
    • Not robust! This assumption matters!
    • If one participant’s values are affected by other participants, this can introduce several types of bias
      • Can create false positives and/or false negatives
        • I.e., increase Type 1 and/or Type 2 errors
    • Can sometimes be addressed by properly “nesting” participants
      • E.g., patients in units, units in hospitals, etc.

Assumptions in Inferential Statistics:
Sample Independence (cont.)

  • That each data point is independent of the others (cont.)
    • Related assumption is that terms in a model are unrelated
      • E.g., that the independent variables (IVs) in an ANOVA are unrelated
        • And unrelated to the “error” term
      • Can manifest as multicollinearity
        • I.e., that 2+ IVs are highly correlated with each other
      • OLS tests are generally robust against multicollinearity
        • Unless it’s extreme (e.g., r > .8)
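
One simple screen for multicollinearity is the correlation matrix of the IVs; a sketch with invented variables (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)

x1 = rng.normal(size=500)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=500)  # nearly a copy of x1 -> multicollinear
x3 = rng.normal(size=500)                   # unrelated to either

print(np.corrcoef([x1, x2, x3]).round(2))   # r(x1, x2) ~ .99, past the ~.8 danger zone; r's with x3 near 0
```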

Assumptions in Inferential Statistics:
Normality

  • That the population’s data are normally distributed
    • Robust against some deviations from normality
  • Robust against kurtosis
    • Rarely actually matters
  • Moderately/asymptotically robust against skew
    • Especially from outliers
  • Not robust against “multimodality”
    • I.e., having more than one “hump” in the data

Assumptions in Inferential Statistics:
Normality (cont.)

  • Outliers have an out-sized effect on results
    • I.e., descriptions and inferences made are based more on them than on other data
      • Kinda like voters in Wyoming vs. New York
    • Researchers should somehow address outliers
      • Often simply removing them (e.g., trimmed means & Winsorized variance; see the sketch below)
      • Better, though, is investigating their influence (and/or bootstrapping)
  • Their effect is lessened as sample size increases
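
A sketch of the trimming/Winsorizing mentioned above, reusing the invented outlier data (SciPy assumed):

```python
import numpy as np
from scipy import stats

scores = np.array([2, 3, 3, 4, 5, 6, 40], dtype=float)  # hypothetical data with one outlier

print(scores.mean())                 # 9.0 -- dominated by the outlier
print(stats.trim_mean(scores, 0.2))  # 4.2 -- drops 20% of values from each tail before averaging

# Winsorizing caps extreme values at a percentile instead of dropping them
winsorized = stats.mstats.winsorize(scores, limits=[0.2, 0.2])
print(winsorized.mean())             # much closer to the bulk of the data
```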

Assumptions in Inferential Statistics:
Normality (cont.)

  • Multimodal data
    • Can indicate two subsamples
      • I.e., that sample should be split, or “stratified”
      • Or look at “localized” measures that focus on only part of the range
    • Must be addressed somehow since measures of central tendency are inaccurate

The Signal-to-Noise Ratio /
Variance & Covariance

Signal-to-Noise Ratio

  • Generally, information in a sample of data is placed into
    two categories:
    • “Signal,” e.g.:
      • Difference between group means
      • Magnitude of change over time
      • Amount two variables co-vary/co-relate
    • “Noise,” e.g.:
      • Difference within a group
      • “Error”—anything not directly measured

Signal-to-Noise Ratio (cont.)

  • Many statistics & tests are just such ratios
    • Some investigate multiple signals & even multiple sources of noise
  • And if there is more signal than noise,
    • We can then test if there is enough of a signal to “matter”
    • I.e., be “significant”
  • E.g., the F-test in an ANOVA
    • A ratio of “mean square variance” between groups/levels vs. “mean square error” within each group
    • If F > 1, then use sample size to determine if the value is big enough to be significant

Signal-to-Noise Ratio (cont.)

Source                        Sum of Squares   df    Mean Square   F       p      Partial η2
Information Intervention      4.991            1     4.991         5.077   .025   0.028
Telesimulation Intervention   0.349            1     0.349         0.355   .552   0.022
Error                         172.061          175   0.983
  • \(\frac{Signal}{Noise}=\frac{Mean\ Square\ Between}{Mean\ Square\ Error}\)
  • E.g., for the Information Intervention: \(\frac{4.991}{0.983} = 5.077\)
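
A quick check of the table’s first row in Python (SciPy assumed; only the four numbers below come from the table):

```python
from scipy import stats

ms_between = 4.991  # "signal": mean square for the Information Intervention
ms_error = 0.983    # "noise": mean square error
df_between, df_error = 1, 175

F = ms_between / ms_error                # ~5.08, matching the table's 5.077
p = stats.f.sf(F, df_between, df_error)  # upper-tail probability of that F, ~.025 as in the table

print(F, p)
```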

Designing and Answering Questions

Statistics as a Way of Thinking

  • Arguably a fundamental way quantitative research differs from qualitative
  • In part—yes—it’s knowing the sorts of questions to ask
    • As Fisher said, “[t]o call in the statistician after the experiment is done may be no more than asking [them] to perform a post-mortem examination: [they] may be able to say what the experiment died of.”
  • But hopefully it’s more centered on an understanding of science and the philosophical Zeitgeist within which it operates
    • Stats embodies science’s parsimony, objectivity,
      systematicity, precision, & probabilistic nature

Theory and Observation

  • Understanding the nature of scientific theory
    • And its (preeminent) relationship to research design & interpretation
  • Deciding what & how to measure
    • Levels of measurement, bias, roles of
      assumptions
  • Study design, analysis, & interpretation
    • E.g., possible mechanisms & causal
      relationships
  • Model building

Hypothesis Testing

Basic Assumptions in Hypothesis Testing

  • Null hypothesis:
    • That there is no difference between the groups
      • (Or zero effect of treatment, etc.)
  • Significance test:
    • “What is the probability of obtaining the sample data if the null hypothesis is true?”
    • E.g., p = .02 is the probability of finding
      the effect if the null is true

Basic Assumptions in Hypothesis Testing (cont.)

  • But the null is rarely—if ever—true
    • There is likely some effect
  • And the “noise” of most significance tests is reduced by larger samples
    • This is related to the idea of regression to the mean
      • And larger samples having smaller standard errors of the mean (SEMs)
    • I.e., with a large sample, the population distribution can be estimated with little error
  • Therefore, with a large enough sample, even small differences can be detected
    • “[T]he p-value is a measure of sample size” –Andy Gelman
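
A small simulation sketch of that point (the 0.1-SD effect, the seed, and the sample sizes are all invented; NumPy/SciPy assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Two groups whose true means differ by only 0.1 SD: a tiny, fixed effect
for n in (20, 200, 2_000, 20_000):
    a = rng.normal(loc=0.0, scale=1.0, size=n)
    b = rng.normal(loc=0.1, scale=1.0, size=n)
    t, p = stats.ttest_ind(a, b)
    print(n, round(p, 4))  # the same tiny effect becomes "significant" once n is large enough
```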

Basic Assumptions in Hypothesis Testing (cont.)

  • That’s not bad or wrong
    • Or good and right
  • It’s simply the result of decisions made about how to make decisions
  • It’s the nature of the tools we use
    • And thus simply informs how we should use them.

Summary

Assumptions & Robustness

  • It’s important to understand what assumptions are made
    • And which matter & how
  • Generally, richer stats are less robust
    • Descriptives are inherently robust
    • Non-parametric statistics are more robust
      than parametric ones

Representativeness

  • Regression to the mean
    • “Convergence to population values”
    • For the mean, but also other parameters (SD, skew, etc.)
    • Not to be confused with Central Limit Theorem (CLT)
  • Standard error of the mean
    • Like a SD for the sample mean
    • Asymptotically robust

Sample Independence

  • Matters greatly
    • Can increase both false positives and false negatives
    • Can not only affect studies
      • But also entire lines/areas of research
  • Multicollinearity
  • Hierarchical / Multilevel / Mixed Models

Normality

  • Few—if any—distributions of real data are normal
    • And most stats are robust to violations
      • But multimodality & non-independence are dangerous
    • Especially across independent samplings (via the CLT)
  • Outliers, though, should not be ignored

Signal-to-Noise Ratio

  • The foundation of most inferential statistics
    • Inherently assumes—categorizes—some information as one or more “signals” & the rest as “noise”
    • Thus, it’s important to ensure this categorization is well done
  • It’s also as much part of study design as it is of study analysis
    • Indeed, along with theory, it’s a main driver of design

Hypothesis Testing

  • p-Value as
    • Chance to find effects assuming the null is true
    • Not just a measure of signal vs. noise
      • It’s also a measure of sample size

   The End