Overview

  • Variability & Randomness
  • Levels of Measurement
  • Descriptive & Inferential Statistics
  • Sources of Variance and the Signal-to-Noise Ratio
  • Designing and Answering Questions
  • Hypothesis Testing

Variability & Randomness

Variance

  • Variance = Information
  • The extent to which things differ is the extent to which things require understanding
  • Much of statistics centers on investigating sources of variance
    • Often through detecting shared variance between things
    • And measuring the magnitude of that shared variance
      • Relative to unshared variance

Randomness

  • Randomness = Unpredictability
    • I.e., variation due to chance alone
    • Pure randomness cannot be predicted
  • Random influences on data are assumed to follow a normal distribution
    • With a mean of zero
      • I.e., to eventually cancel out
  • Enables generalization from samples to populations
    • As long as the sample was truly drawn at random

Levels of Measurement

Methods of Scaling

  • Scaling per se is simply:
    • “representing quantities of attributes numerically”
  • Methods of scaling, though, are not so straightforward…

Methods of Scaling (cont.)

  • Scaling involves applying a set of rules to measurements
    • Each “level” of measurement follows different rules for operationalizing variables
    • These, in turn, affect the permissible analyses
    • Stevens’ seminal work in the 1940s and ’50s established a framework still in use

Summary of Common Measurement Scales

Scale      Basic Operation              Permissible Transformations    Permissible Statistics
Nominal    Equality vs. Inequality      Any one-to-one                 Counts, Frequencies, Mode
Ordinal    Greater than vs. Less than   Monotonically Increasing       Median, Percents, Order Statistics, Kendall’s τ
Interval   Equality of Intervals        General Linear: x′ = bx + a    Difference Scores
Ratio      Equality of Ratios           Multiplicative: x′ = bx        Geometric Mean (dampens outliers)

Nominal

  • Pretty straightforward
  • Any non-duplicative value can represent any category
    • Can be numeric
      • Just be careful how a stat program treats it
      • Using 0 for one category can be useful
        • And 1 for another

Nominal (cont.)

  • Permissible statistics are limited
    • Even in operational views of them
  • But don’t discount counts
    • Counts are ostensible & strongly valid
    • And are in fact on a ratio scale

Classification / Categorization

  • Even with nominal scales, accuracy of categorization matters
  • As, of course, does the utility of the measure

Ordinal

  • Dichotomous variables
    • I.e., presence/absence of a trait
  • Likert-scale items are often subsumed here
    • But asking participants to rank can be a useful alternative

Interval

  • Assumes equal intervals between levels
  • Ratios of absolute interval values are not meaningful
    • Since ratios change with transformations
      • E.g., 100°F : 50°F vs. 100°C : 50°C (see the worked example below)
    • But ratios of difference scores within a given transformation are meaningful
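
A quick worked version of that example (the Celsius values are just the same two temperatures converted): an identical pair of temperatures yields different ratios under two equally valid interval scalings, so the ratio itself carries no meaning.

\[ \frac{100^{\circ}\mathrm{F}}{50^{\circ}\mathrm{F}} = 2 \qquad \text{vs.} \qquad \frac{37.8^{\circ}\mathrm{C}}{10^{\circ}\mathrm{C}} \approx 3.8 \]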

Interval (cont.)

  • Often as precise as one can achieve for most non-ostensible measurements
    • Since it can be difficult to measure precisely enough to ensure total absence of a trait
    • Or even define what its absence would mean
      • E.g., Satisfaction with care, level of pain, motoric development, bone density

Ratio

  • Most restrictive in transformations
    • And thus most permissive in analyses & mathematical operations
      • Including the four basic algebraic operations:
        • Add, subtract, multiply, & divide

Ratio (cont.)

  • Allows computation of the geometric mean (sketched below):
    \[ x_{G} = (x_{1}x_{2}x_{3} \cdots x_{n})^{\frac{1}{n}} \]
  • I.e., multiply the item values, then take the nth root of the product
    • Where n = number of items
  • Useful for percents
    • And as a way to account for outliers
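
A minimal sketch of the computation in Python (NumPy/SciPy assumed; the data values are invented for illustration):

```python
import numpy as np
from scipy import stats

x = np.array([2.0, 8.0, 4.0, 16.0])  # hypothetical ratio-scale values

# Literal form of the formula: multiply the values, then take the nth root
g_direct = np.prod(x) ** (1 / len(x))

# Equivalent, more numerically stable form: exponentiate the mean of the logs
g_logs = np.exp(np.mean(np.log(x)))

print(g_direct, g_logs, stats.gmean(x))  # all ~5.66 (vs. an arithmetic mean of 7.5)
```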

Do These Levels Matter?

  • Stevens and other “representationalists” argue they sure do
    • Scaling is an inherent trait of the measure
    • Using operations available to a “higher” level assumes the traits of that level
      • And thus implies that your stated scaling is inaccurate
      • E.g., computing the mean of an “ordinal” measure implies it’s not actually ordinal, so don’t pretend it is

Do These Levels Matter? (cont.)

  • Others argue that scaling can instead be considered a theory
    • We assume a relationship between the ostensible & non-ostensible traits
      • Then use conventions established for that scale to guide analyses
      • Essentially an “operationalist” view of measures

Do These Levels Matter? (cont.)

  • The operational view assumes all measurement contains error:
    • Observed = Actual + error

\[ O = A + e \]

  • And thus that true ratio measures, at least, are rare

So, Do They Really Matter or What?

  • Arguably, it depends on how confident you are in your scaling
    • More valid & precise relationships between an ostensible scaling & the underlying non-ostensible trait suggest stronger adherence to the rules of that scale
    • Such confidence is typically used to justify “higher” levels

So They Kinda Matter

  • In practice:
    • Nominal requires only good classification
    • Ordinal per se tends to be avoided
    • Interval is sought & used most
    • Ratio usually not worth trying for

“Lumpers” vs. “Splitters”

  • “Reducing” data to a lower scaling level is conservative
    • But very rarely justified in practice
  • I believe it is best to use the highest defensible level
    • Data are expensive
      • It’s arguably ethical—and of course simply sensible—to use all of the information available

Descriptive & Inferential Statistics

Descriptives vs. Inferentials

  • Descriptives make no assumptions about the population from which the sample was drawn
    • They simply describe the sample of data
      • Number, percents, ratios/odds
    • Can also describe the distribution of the sample
      • Central tendency (mode, median, mean)
      • Dispersion (standard deviation, skewness, kurtosis)

Descriptives vs. Inferentials (cont.)

  • Inferentials make assumptions—inferences—about the nature of the entire population of data
  • The assumptions made can vary
    • Aren’t always the same
    • And sometimes the assumptions can be tested
  • Making assumptions allows us to conduct hypothesis tests
    • Hypothesis testing doesn’t define inferential stats
    • But is the most common reason to make the assumptions

Review of Descriptive Statistics

  • Not to be underestimated
    • In addition to being informative, they are inherently robust
  • Robust statistics are tolerant of violations of assumptions/inferences made about the population
    • And since descriptives don’t make any assumptions about it…
  • N.b., however, that the line between descriptive & inferential is blurry
    • And inferential tests can be made on descriptives
      • E.g., testing whether the counts/frequencies of occurrence are the same between two groups

Review of Descriptive Statistics (cont.)

  • Central tendency
    • Yes, where the “center” of the data “tends” to be
    • “Center,” of course, can be differentially defined
    • Mode: the most common value; very robust to outliers
    • Median: the value with the same number of other values on either side
      • Often used instead of mean when there are many outliers
    • Mean: average value; least robust to outliers
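
A small illustration in Python of how the three “centers” respond to an outlier (the values are invented):

```python
import statistics

scores = [2, 3, 3, 4, 5, 6, 40]  # hypothetical data with one outlier (40)

print(statistics.mode(scores))    # 3 -- unaffected by the outlier
print(statistics.median(scores))  # 4 -- barely affected
print(statistics.mean(scores))    # 9 -- pulled strongly toward the outlier
```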

Review of Descriptive Statistics (cont.)

  • Dispersion
    • How spread out the data are
    • Standard deviation
      • “On average, how far a given score is from the mean”
      • Equivalent for the median is the median absolute deviation (MAD)
        • “The median distance of scores from the median”
    • Standard deviation is also related to variance (discussed a bit later)
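
Continuing the same invented data, a sketch of the SD vs. the MAD (NumPy assumed):

```python
import numpy as np

scores = np.array([2, 3, 3, 4, 5, 6, 40], dtype=float)  # same hypothetical data

sd = scores.std(ddof=1)                              # sample standard deviation
mad = np.median(np.abs(scores - np.median(scores)))  # median absolute deviation

print(round(float(sd), 2), mad)  # SD ~13.73 is inflated by the outlier; MAD = 1.0 is not
```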

Review of Descriptive Statistics (cont.)

  • Dispersion (cont.)
    • Skew & kurtosis
      • Skew: if one of the distribution tails is longer than the other
      • Kurtosis: how peaked vs. flat the distribution is
        • “Leptokurtic”: A very tight distribution; small SD
        • “Platykurtic”: A very spread-out distribution; large SD
    • Skew is more problematic than kurtosis
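
A quick sketch of both measures using SciPy (the simulated data are invented; note that scipy.stats.kurtosis reports excess kurtosis, so a normal distribution scores near 0):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
symmetric = rng.normal(size=10_000)    # roughly normal data
skewed = rng.exponential(size=10_000)  # long right tail

print(stats.skew(symmetric), stats.kurtosis(symmetric))  # both near 0
print(stats.skew(skewed), stats.kurtosis(skewed))        # skew near 2, excess kurtosis roughly 6
```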

Assumptions in Inferential Statistics

  • Three general types of assumptions:
    1. That the sample represents the population
    2. Each data point (“datum”) is independent of the others
    3. That the population’s data are normally distributed
  • There are more/other assumptions that can be made, e.g.:
    • Other distribution shapes
    • Nature of any missing data
    • Whether data values are continuous or discrete

Assumptions in Inferential Statistics (cont.)

  • For hypothesis testing, some assumptions matter more than others
    • I.e., hypothesis tests tend to be robust against some violations of our assumptions
    • But not others
  • Understanding assumptions & their effects can
    help interpret results
  • Next:
    • Robustness of common assumptions for
      “ordinary least squares” (t-tests, ANOVAs)

Assumptions in Inferential Statistics:
Representativeness

  • That the sample represents the population
    • Robust with larger samples (also called “asymptotically robust”)
  • This is a manifestation of “regression to the mean”
    • I.e., that sample values tend to resemble population values when:
      • The sample size gets bigger
      • Repeated samples are drawn
    • “Regression to the mean” would probably be better called “convergence to the population mean”

Assumptions in Inferential Statistics:
Representativeness (cont.)

  • Standard error of the mean (SEM, or just “standard error,” SE)
    • A measure for how well the sample mean represents the population mean
    • Like standard deviation for the sample mean
      • I.e., the population mean should have a ~68% chance of being within 1 SEM of the sample mean
    • The SEM gets smaller as the sample size gets larger (see the sketch below)
  • The SEM is an inferential statistic
    • But is robust to violations of normality—even for relatively non-normal population distributions
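
A minimal sketch of that shrinkage (NumPy assumed; the exponential “population” and the sample sizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
population = rng.exponential(scale=2.0, size=1_000_000)  # a decidedly non-normal "population"

for n in (10, 100, 1_000):
    sample = rng.choice(population, size=n, replace=False)
    sem = sample.std(ddof=1) / np.sqrt(n)  # SEM = sample SD / sqrt(n)
    print(n, round(float(sem), 3))         # shrinks by roughly sqrt(10) at each step
```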

Assumptions in Inferential Statistics:
Representativeness (cont.)

  • In part because the sampling distribution of the mean tends to be symmetrical
  • I.e., the population mean has the same chance of being greater than or less than the sample mean
    • As long as participants/samples are independent & unbiased (and people can possibly be resampled)
  • And if sample means tend to be symmetrically distributed,
    • Then the distribution of sample means tends to be normally distributed
  • This, my friends, is the Central Limit Theorem
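
A small simulation sketch of the CLT (NumPy/SciPy assumed; the population, sample size, and number of samples are all invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
population = rng.exponential(scale=2.0, size=1_000_000)  # a clearly skewed population

# Draw many independent samples and keep only each sample's mean
sample_means = [rng.choice(population, size=50).mean() for _ in range(5_000)]

print(stats.skew(population))    # ~2: the raw data are far from normal
print(stats.skew(sample_means))  # much smaller (~0.3): the means are far more symmetric/normal
```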

Assumptions in Inferential Statistics:
Sample Independence

  • That each participant is independent of the others
    • Not robust! This assumption matters!
    • If one participant’s values are affected by other participants, this can introduce several types of bias
      • Can create false positives and/or false negatives
        • I.e., increase Type 1 and/or Type 2 errors
    • Can sometimes be addressed by properly “nesting” participants
      • E.g., patients in units, units in hospitals, etc.

Assumptions in Inferential Statistics:
Sample Independence (cont.)

  • That each data point is independent of the others (cont.)
    • Related assumption is that terms in a model are unrelated
      • E.g., that the independent variables (IVs) in an ANOVA are unrelated
        • And unrelated to the “error” term
      • Can manifest as multicollinearity
        • I.e., that 2+ IVs are highly correlated with each other
      • OLS tests are generally robust against multicollinearity
        • Unless it’s extreme (e.g., r > .8)
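
One simple screen for multicollinearity is the correlation matrix of the IVs; a sketch with invented variables (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)

x1 = rng.normal(size=500)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=500)  # nearly a copy of x1 -> multicollinear
x3 = rng.normal(size=500)                   # unrelated to either

print(np.corrcoef([x1, x2, x3]).round(2))   # r(x1, x2) ~ .99, past the ~.8 danger zone; r's with x3 near 0
```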

Assumptions in Inferential Statistics:
Normality

  • That the population’s data are normally distributed
    • Robust against some deviations from normality
  • Robust against kurtosis
    • Rarely actually matters
  • Moderately/asymptotically robust against skew
    • Especially from outliers
  • Not robust against “multimodality”
    • I.e., having more than one “hump” in the data

Assumptions in Inferential Statistics:
Normality (cont.)

  • Outliers have an out-sized effect on results
    • I.e., descriptions and inferences made are based more on them than on other data
      • Kinda like voters in Wyoming vs. New York
    • Researchers should somehow address outliers
      • Often simply removing them (e.g., trimmed means & Winsorized variance; see the sketch below)
      • Better, though, is investigating their influence (and/or bootstrapping)
  • Their effect is lessened as sample size increases
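
A sketch of the trimming/Winsorizing mentioned above, reusing the invented outlier data (SciPy assumed):

```python
import numpy as np
from scipy import stats

scores = np.array([2, 3, 3, 4, 5, 6, 40], dtype=float)  # hypothetical data with one outlier

print(scores.mean())                 # 9.0 -- dominated by the outlier
print(stats.trim_mean(scores, 0.2))  # 4.2 -- drops 20% of values from each tail before averaging

# Winsorizing caps extreme values at a percentile instead of dropping them
winsorized = stats.mstats.winsorize(scores, limits=[0.2, 0.2])
print(winsorized.mean())             # much closer to the bulk of the data
```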

Assumptions in Inferential Statistics:
Normality (cont.)

  • Multimodal data
    • Can indicate two subsamples
      • I.e., that sample should be split, or “stratified”
      • Or look at “localized” measures that focus on only part of the range
    • Must be addressed somehow since measures of central tendency are inaccurate

The Signal-to-Noise Ratio /
Variance & Covariance

Signal-to-Noise Ratio

  • Generally, information in a sample of data is placed into
    two categories:
    • “Signal,” e.g.:
      • Difference between group means
      • Magnitude of change over time
      • Amount two variables co-vary/co-relate
    • “Noise,” e.g.:
      • Difference within a group
      • “Error”—anything not directly measured

Signal-to-Noise Ratio (cont.)

  • Many statistics & tests are just such ratios
    • Some investigate multiple signals & even multiple sources of noise
  • And if there is more signal than noise,
    • We can then test if there is enough of a signal to “matter”
    • I.e., be “significant”
  • E.g., the F-test in an ANOVA
    • A ratio of “mean square variance” between groups/levels vs. “mean square error” within each group
    • If F > 1, then use sample size to determine if the value is big enough to be significant

Signal-to-Noise Ratio (cont.)

Source                        Sum of Squares   df    Mean Square   F       p      Partial η2
Information Intervention      4.991            1     4.991         5.077   .025   0.028
Telesimulation Intervention   0.349            1     0.349         0.355   .552   0.022
Error                         172.061          175   0.983
  • \(\frac{Signal}{Noise}=\frac{Mean\ Square\ Between}{Mean\ Square\ Error}\)
  • E.g., for the Information Intervention: \(\frac{4.991}{0.983} = 5.077\)
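
A quick check of the table’s first row in Python (SciPy assumed; only the four numbers below come from the table):

```python
from scipy import stats

ms_between = 4.991  # "signal": mean square for the Information Intervention
ms_error = 0.983    # "noise": mean square error
df_between, df_error = 1, 175

F = ms_between / ms_error                # ~5.08, matching the table's 5.077
p = stats.f.sf(F, df_between, df_error)  # upper-tail probability of that F, ~.025 as in the table

print(F, p)
```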

Designing and Answering Questions

Statistics as a Way of Thinking

  • Arguably a fundamental way quantitative research differs from qualitative
  • In part—yes—it’s knowing the sorts of questions to ask
    • As Fisher said, “[t]o call in the statistician after the experiment is done may be no more than asking [them] to perform a post-mortem examination: [they] may be able to say what the experiment died of.”
  • But hopefully it’s more centered on an understanding of science and the philosophical Zeitgeist within which it operates
    • Stats embodies science’s parsimony, objectivity,
      systematicity, precision, & probabilistic nature

Theory and Observation

  • Understanding the nature of scientific theory
    • And its (preeminent) relationship to research design & interpretation
  • Deciding what & how to measure
    • Levels of measurement, bias, roles of
      assumptions
  • Study design, analysis, & interpretation
    • E.g., possible mechanisms & causal
      relationships
  • Model building

Hypothesis Testing

Basic Assumptions in Hypothesis Testing

  • Null hypothesis:
    • That there is no difference between the groups
      • (Or zero effect of treatment, etc.)
  • Significance test:
    • “What is the probability of obtaining the sample data if the null hypothesis is true?”
    • E.g., p = .02 is the probability of finding
      the effect if the null is true

Basic Assumptions in Hypothesis Testing (cont.)

  • But the null is rarely—if ever—true
    • There is likely some effect
  • And the “noise” of most significance tests is reduced by larger samples
    • This is related to the idea of regression to the mean
      • And larger samples having smaller standard errors of the mean (SEMs)
    • I.e., with a large sample, the population distribution can be estimated with little error
  • Therefore, with a large enough sample, even small differences can be detected
    • “[T]he p-value is a measure of sample size” –Andy Gelman
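
A small simulation sketch of that point (the 0.1-SD effect, the seed, and the sample sizes are all invented; NumPy/SciPy assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Two groups whose true means differ by only 0.1 SD: a tiny, fixed effect
for n in (20, 200, 2_000, 20_000):
    a = rng.normal(loc=0.0, scale=1.0, size=n)
    b = rng.normal(loc=0.1, scale=1.0, size=n)
    t, p = stats.ttest_ind(a, b)
    print(n, round(p, 4))  # the same tiny effect becomes "significant" once n is large enough
```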

Basic Assumptions in Hypothesis Testing (cont.)

  • That’s not bad or wrong
    • Or good and right
  • It’s simply the result of decisions made about how to make decisions
  • It’s the nature of the tools we use
    • And thus simply informs how we should use them.

Summary

Assumptions & Robustness

  • It’s important to understand what assumptions are made
    • And which matter & how
  • Generally, richer stats are less robust
    • Descriptives are inherently robust
    • Non-parametric statistics are more robust
      than parametric ones

Representativeness

  • Regression to the mean
    • “Convergence to population values”
    • For the mean, but also other parameters (SD, skew, etc.)
    • Not to be confused with Central Limit Theorem (CLT)
  • Standard error of the mean
    • Like a SD for the sample mean
    • Asymptotically robust

Sample Independence

  • Matters greatly
    • Can increase both false positives and false negatives
    • Can not only affect studies
      • But also entire lines/areas of research
  • Multicollinearity
  • Hierarchical / Multilevel / Mixed Models

Normality

  • Few—if any—distributions of real data are normal
    • And most stats are robust to violations
      • But multimodality & non-independence are dangerous
    • Especially across independent samplings (via the CLT)
  • Outliers, though, should not be ignored

Signal-to-Noise Ratio

  • The foundation of most inferential statistics
    • Inherently assumes—categorizes—some information as one or more “signals” & the rest as “noise”
    • Thus, it’s important to ensure this categorization is well done
  • It’s also as much part of study design as it is of study analysis
    • Indeed, along with theory, it’s a main driver of design

Hypothesis Testing

  • p-Value as
    • Chance to find effects assuming the null is true
    • Not just a measure of signal vs. noise
      • It’s also a measure of sample size

   The End