Appendix C — Statistical Analysis Decision Trees and Guides

One of my goals for this curriculum is to empower you to use models, especially generalized linear models, flexibly: to think of the research questions you want answered, to operationalize those questions, and then to design analyses around them. One simple (or simplistic) way to approach this is to identify the level of measurement each of your variables represents and then choose the analysis recommended for that combination.

That simpler approach is well served by a statistical analysis decision tree or a dedicated guide, and this appendix presents one. Or rather, it presents several, since this appendix is, and may long remain, little more than a collection of links to others who have already done this. Even though I've curated and culled from what's out there, this appendix is still less a venerable decision tree and more a brambly new-growth forest. I will nonetheless try to forge a path through these trees as best I can.

Table B.2 presents a complementary reference by describing some common analyses and the types of outcome and predictor variables used in each.

One final note: As you might expect given the varied and ad hoc nature of naming in statistics, "decision tree" also denotes a type of analysis (actually an analytic strategy) that is beyond the scope of this curriculum.

C.1 References and Guides

The following sources offer guidelines for choosing which statistic to use.

C.1.1 Correlations & Associations

Khamis (2008) clearly presents which measure of association or correlation to use with various types of data, along with some guidance on interpreting the strength of these measures. These recommendations are summarized in Table C.1, reproduced from the article's Summary section:

Table C.1: Types of Correlation Statistics (rows: Variable Y; columns: Variable X)
Variable Y \ Variable X	Nominal	Ordinal	Continuous
Nominal	\(\phi\) coefficient or Goodman & Kruskal's \(\lambda\)	Rank biserial ¹	Point biserial
Ordinal	Rank biserial	Kendall's \(\tau_{b}\) or Spearman's \(\rho\)	Kendall's \(\tau_{b}\) or Spearman's \(\rho\)
Continuous	Point biserial	Kendall's \(\tau_{b}\) or Spearman's \(\rho\)	Pearson's \(r\) or Spearman's \(\rho\)

C.2 Simple Graphics

The following trees are simple files that organize the analyses commonly used for straightforward hypothesis tests among small sets of variables, e.g., one outcome and one or two predictors. They all top out at ANOVAs, and so they effectively fill the gap I left for the material that precedes the model building our curriculum focuses on.

Howell (2008) covers analyses from correlations to ANOVAs. The benefit of this tree is its simplicity; the drawback is its failure to distinguish between parametric and non-parametric analyses.
Corston & Colman's (2000) tree is also a simple "cheat sheet" file like Howell's, but it contains more information for distinguishing between parametric and non-parametric tests (based on the levels of measurement of the given variables).

C.3 Online Trees

These are websites that let you choose an analysis by answering a series of questions. They tend to be more thorough than the simple graphics but require a more involved process to reach a recommendation.

MicrOsiris’s decision tree allows one to step through questions to determine what analysis to conduct; it also provides a nice summary page that indicates which function to use to conduct a given analysis in SPSS, SAS, and their own freeware stat program, MicrOsiris.
NIST's Decision Tree for Key Comparisons is more than just that. As the About page says, the tree "guides users through a series of hypothesis tests intended to help them in deciding upon an appropriate statistical model for their particular data." One can enter data directly or upload them as a .csv file, and then see which analyses the tree recommends for those data. Pretty cool, huh?
Statistical Test Flowchart doesn't take as many steps as, e.g., MicrOsiris's tree. It thus gives less specific results at the end, but it explains its recommendations more fully and links to instructions for conducting the given analysis in R, SPSS, and Stata.

¹ Weisburd and Britt (2007) give good further coverage of analyzing associations between nominal and ordinal data.