The Normal & χ² Distribution

Overview

  • Normal distribution
    • What it is
    • Why it’s important
  • \(\chi^2\) distribution
    • What it is
    • Why it’s important




The Normal Distribution

Formula for a normal distribution:

\[f(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}e^{\frac{-(x - \mu)^2}{2\sigma^2}}\]

  • x is some random variable
  • μ is the mean
  • σ is the standard deviation
  • There, now you can say you learned this
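
  • As a quick sanity check of the formula, here is a minimal R sketch (the function name normal_pdf is just illustrative) comparing a hand-rolled density to base R's dnorm():

# Hand-rolled normal density, straight from the formula above
normal_pdf <- function(x, mu, sigma) {
  (1 / (sigma * sqrt(2 * pi))) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# Should match base R's dnorm() to machine precision
normal_pdf(x = 1.5, mu = 0, sigma = 1)
## [1] 0.1295176
dnorm(1.5, mean = 0, sd = 1)
## [1] 0.1295176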

Characteristics of the Normal Distribution

Normal Curve

Characteristics of the Normal Distribution (cont.)

  • Most importantly:
    • It is well-understood
    • It is only a function of the mean & standard deviation
    • It roughly corresponds to many real-world distributions
  • The mean, median and mode are all equal
  • The total area under the curve equals 1
  • It’s symmetric
  • The curve approaches, but never touches, the x-axis

Q-Q Plots

  • “Quantile-quantile” plots
    • Plot the quantiles of a sample against the corresponding quantiles of another (e.g., normal) distribution
  • More simply, they show how & where two distributions deviate from each other
  • Frequently used to test if & how a sample deviates from normality
    • Or how residuals deviate from normality
  • They’re easily created in SPSS or R (a minimal R sketch follows below)
  • And a few examples may help in understanding them…
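
  • Before the graphical examples, a minimal sketch of drawing a normal Q-Q plot in base R (the simulated sample here is only a placeholder for real data):

set.seed(42)                         # for reproducibility
x <- rnorm(200, mean = 50, sd = 10)  # placeholder sample; swap in real data

qqnorm(x)   # sample quantiles vs. theoretical normal quantiles
qqline(x)   # reference line; points hugging it suggest approximate normality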

Normally-Distributed

Short Tails (Looks like an S)

Long Tails

Long Right Tail (Positive Skew)

Long Left Tail (Negative Skew)

More Figures, More Views on Skew

From StackExchange

Q-Q Plots (cont.)

Alternatives to Q-Q Plots

  • Can also/instead formally test normality of sample data with, e.g., the Shapiro-Wilk (S-W), Anderson-Darling (A-D), or Kolmogorov-Smirnov (K-S) tests (sketched in R after this list)
  • S-W and A-D are better than K-S, but all are strongly affected by sample size
    • Under-powered with small N
      • Can’t detect non-normality when we need to
    • Over-powered with large N
      • Overly sensitive to deviations when we don’t need to know
      • (I.e., we already have enough data to approximate the population distribution without assuming normality)
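
  • A minimal sketch of these tests in R; shapiro.test() and ks.test() ship with base R, while the Anderson-Darling test assumes the nortest package is installed:

set.seed(42)
x <- rnorm(100)   # placeholder sample; swap in real data

shapiro.test(x)                       # Shapiro-Wilk (S-W)
ks.test(x, "pnorm", mean(x), sd(x))   # Kolmogorov-Smirnov (K-S) vs. a fitted normal
# (strictly, K-S assumes the mean & sd were not estimated from the same data)

# install.packages("nortest")         # if not already installed
# nortest::ad.test(x)                 # Anderson-Darling (A-D)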




The \(\chi^2\) Distribution

Background

  • Invented by Karl Pearson (in an abstruse 1900 article)
  • Originally to test “goodness of fit”
    • How well a set of data fit a theoretical distribution
    • Or if two sets of data follow the same distribution

  • Technically, \(\chi^2\) is a special case of the gamma distribution that arises as the distribution of a sum of squares of independent standard normal random variables (simulated in the sketch below)
    • A lot like the sums of squares we compute via ordinary least squares for t-tests & ANOVAs
    • And it is closely related to the t and F distributions
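
  • A minimal simulation sketch of that idea: summing k squared standard normal draws behaves like draws from a \(\chi^2\) with k degrees of freedom (k and n here are arbitrary choices):

set.seed(42)
k <- 3        # degrees of freedom
n <- 100000   # number of simulated sums

# Sum of k squared standard normals, repeated n times
sums_of_squares <- replicate(n, sum(rnorm(k)^2))

# Compare to direct chi-square draws with k df
mean(sums_of_squares);  mean(rchisq(n, df = k))   # both near k
var(sums_of_squares);   var(rchisq(n, df = k))    # both near 2 * k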

Characteristics

  • The distribution’s shape, location, etc. are all determined by the degrees of freedom
    • The mean = df
    • The variance = 2df
    • The curve peaks (i.e., has its mode) at \(\chi^2\) = df – 2
      (when df ≥ 2; see the sketch after this list)
  • As the degrees of freedom increase:
    • The \(\chi^2\) curve approaches a normal distribution
    • The curve becomes more symmetrical
  • It has no negative values
    • Since it is based on squared values
    • Making it good to test variances
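
  • A minimal sketch checking those properties with simulated draws, then overlaying density curves for increasing df (the specific df values are arbitrary):

set.seed(42)
df <- 5
draws <- rchisq(100000, df = df)

mean(draws)   # close to df (= 5)
var(draws)    # close to 2 * df (= 10)

# Density curves: each peaks near df - 2, and the shape
# grows more symmetric (more normal-looking) as df increases
curve(dchisq(x, df = 5),  from = 0, to = 60, ylab = "density")
curve(dchisq(x, df = 20), from = 0, to = 60, add = TRUE, lty = 2)
curve(dchisq(x, df = 40), from = 0, to = 60, add = TRUE, lty = 3)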

Characteristics (cont.)

Uses of the \(\chi^2\)

  • Because it only depends on df
    • And resembles a normal distribution
  • It is useful for testing whether data follow a normal distribution
    • Or, more often, whether the total set of deviations from normality
      • (Or from any other set of expected values)
    • Is greater than expected by chance

  • It can do this for discrete values—like counts
    • t and F distributions technically can’t do this

Computing \(\chi^2\)

  • Formula for \(\chi^2\) value:

\[ \chi^2 = \sum{\frac{(Observed - Expected)^2}{Expected}}\]

  • So:

    1. Compute the difference between each observed value and its expected value
    2. Square each of those differences & divide by the expected value
      •  Kinda like computing the odds
    3. Sum up those squared-difference “odds” across the groups
    4. Check that summed value against a \(\chi^2\) distribution
      •  Where dfs = (Nrows – 1) \(\times\) (Ncolumns – 1)
    5. If the summed value is really far from the center of the distribution
      •  Then those actual-expected differences are significant
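
  • A minimal R sketch of those steps (the function names are just illustrative):

# Steps 1-3: the chi-square statistic from observed & expected counts
chisq_stat <- function(observed, expected) {
  sum((observed - expected)^2 / expected)
}

# Steps 4-5: compare the statistic to the chi-square distribution
# 'df' would be (Nrows - 1) * (Ncolumns - 1) for a contingency table
chisq_p_value <- function(statistic, df) {
  pchisq(statistic, df = df, lower.tail = FALSE)
}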

Example of Using a \(\chi^2\)

  • Consider these opioid use disorder data:
Ethnicity   Diagnosed   Not Diagnosed   Total
Latin       72          128             200


  • Presenting that a little differently:
Value Type   Diagnosed   Not Diagnosed   Total
Observed     72          128             200
Expected     100         100             200

Example of Using a \(\chi^2\) (cont.)

  1. Take the difference between observed & expected
  2. Square those differences & divide by the expected value
  3. Add up those values
Value Type            Diagnosed   Not Diagnosed
Observed              72          128
Expected              100         100
Observed - Expected   -28         28
  • \((-28)^2 = 28^2 = 784\)
  • \(\frac{784}{100} = 7.84\)
  • \(7.84 + 7.84 = 15.68\)
  • df = \((2 - 1)\times(2 - 1) = 1\times1 = 1\)

Example of Using a \(\chi^2\) (cont.)

  • The critical \(\chi^2\) value
    • For 1 df
    • And α = .05:
qchisq(df = 1, p = .05, lower.tail = FALSE)
## [1] 3.841459
  • Which is smaller than our 15.68
  • So our observed values are significantly different from our expected ones
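
  • The hand calculation can also be checked with R's built-in chisq.test(), passing the observed counts and the expected proportions (here, 50/50):

# Goodness-of-fit test on the observed counts vs. equal expected proportions
chisq.test(x = c(72, 128), p = c(0.5, 0.5))

# The reported statistic is X-squared = 15.68 on df = 1, matching the
# hand computation above, and its p-value falls well below .05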

Uses of the \(\chi^2\) (cont.)

  • The \(\chi^2\) distribution has many uses, including:
    1. Estimating parameters of a population with an unknown distribution
    2. Checking relationships between categorical variables
    3. Checking the independence of two classification criteria for qualitative variables
    4. Testing deviations of differences between expected and observed frequencies
    5. Conducting goodness of fit tests

The End