=if(A2 = 1, "Female", "Not Female")
6 Introduction to Excel
There are times when I still prefer to use a spreadsheet program instead of R (or SPSS). Sure, it’s easier for me in part because I’ve been using spreadsheets, well, pretty much since they existed1, but mostly because I like to look closely at the data, sometimes cell by cell, and to have that very precise and close-up control over it. I think you’ll appreciate why it’s such an enduring way to examine—and even analyze—your hard-earned data, especially after you master using the keyboard to navigate and sort data.
We will cover some of the most common functions and keyboard shortcuts. More keyboard shortcuts are given in Microsoft’s page dedicated to that. Their site also has an overview of formulas.
6.1 Overview
This activity is intended to review some of the main ways that you may use a spreadsheet program (viz., MS Excel) to manipulate, review, and even analyze data.
6.2 Filling in Series
- Enter the first few values of a series in a row or column
- Highlight those cells
- Move cursor to bottom right corner of the last cell
- The cursor should change shape (e.g., from a fat, white cursor to a skinny, black one)
- Hold down the (left) mouse button
- Drag the cursor right/down to fill in more values of that sequence
Excel (and LO Calc, Gnumeric, etc.) are pretty good at recognizing what the series should be. E.g., counting by even numbers, counting by tens.
6.3 Pasting Special & Transposing
- Data can be pasted with various levels of formatting retained or removed
- This is especially useful in two situations:
1. Converting formulas to raw numbers
- (After creating some range of cells that contains formulas), select the range of cells you’d like to convert from formulas to raw numbers
- Copy that range (
Control
+C
) - Move to the range of cells you’d like to paste it to
- You can paste right back into the same range if you’d just like to convert those formulas to values there
- Open the
Paste Special
dialogue either by typingControl
2 +Alt
3 +V
. Or:- Click on the
File
tab, and then the little down-pointing arrowhead under thePaste
button - Either choose the type of
Special
pasting you’d like to do, or clicking on thePaste Special
button
- Click on the
- Select to paste
Values
(or just typeV
)
2. Transposing Data
Transposing means to flip down from, e.g., going from top to bottom instead to be going from left to right. To do this:
- Select & copy the range of cells to transpose
- Move to where you’d like them transposed
- Access the
Paste Special
dialogue (e.g,Control
+Alt
+V
) - Type
T
to select transpose, or select that option from the dialogue box.
- More at, e.g., ablebits.com or extendoffice.com
6.5 Formulas
Formulas in Excel and Calc allow you to quickly conduct many. useful operations on your data. It’s even sometimes easier to use these here than to do themin SPSS, R, or an other stats program. Using formulas in a spreadsheet program helps you get up close and personal with your data before analyzing them in those stats programs, too.
I will cover a few that I use most often in this section. Among the additional resources to consider are:
- ExcelGPT (aka ExcelBrew), which uses AI to create formulas based on your description of the action(s) you’d like to do
6.5.1 General Steps for Entering Formulas (and Common Formulas to Use)
To begin to enter a formula in a cell:
- Type
=
within a cell to enter a formula Alt
,M
: Go to formula tab- Common Formulas:
Function | Excel Formula |
---|---|
Sum of range | =sum( cell range) |
Minimum value of range | =min( cell range) |
Maximum value of range | =max( cell range) |
Number of cells with numbers in a range | =count( cell range) |
Number of cells with words in a range | =counta( cell range) |
Mean of range | =average( cell range) |
Standard deviation of range4 | =stdev.s( cell range) |
Generate a random number5 from 0 – 1 | =rand() |
Generate a random whole number between a range | =randbetween( lower value, upper value) |
6.5.2 if
The if
function evaluates a cell and returns different values if the contents of that cell do or do not match some criterion.
For example, I could have gender coded in one cell as a 1
if that person identifies as female or a 0
if they identify otherwise6. I may then want to have an other cell that actually spells out what that 1
and 0
mean in case I forget. I could use if
to do this.
The general format for an if
function is:
=if(
Some statement,
What to output it the statement is true,
What to output it the statement is false)
To continue with that example, let’s imagine that the cell with the dummy-code for female
is in cell A2
and I want to put the phrase Female
or Not Female
in cell B2
, like this in Excel:
Row/Column | A | B |
---|---|---|
1 | Dummy Variable | Category Label |
2 | 1 |
Female |
To do this, I would type the following into cell B2
, i.e., the cell where I want the result to go:
To break down the parts of an if
statement doing this:
=if(
- This first part starts with an equal sign (
=
) letting Excel know that you’ll be entering a function - The
if(
lets it know what the function is you’ll be using
= 1, A2
- This tells Excel where to look to evaluate an argument (we’re telling it to look in cell
A2
) - It then tells Excel what the formula is to evaluate. Here, we’re asking whether the value in cell
A2
is equal to 1 or not (= 1
)
"Female",
- This is the value we’re asking Excel to return if indeed the value in cell
A2
equals 1. IfA2
equals one, we want Excel to print out the wordFemale
.- Since we’re asking Excel to print out a word, we have to put it in quotes.
- If we’d asked Excel to print out a number, we would not put it in quotes7.
"Not Female")
- This is what we want Excel to print out if our formula (
A2 = 1
) is incorrect, i.e., if the value in cellA2
is anything besides a1
.
If we had more cells in that first column—other gender identities for other participants, it could look like the following table—noting that I’ve added in a column to show what the function would look like right before what Excel would produce in that column:
Row/Column | A | B | |
---|---|---|---|
1 | Dummy Variable | Function Typed into Cell in Column B |
Excel Output in Column B |
2 | 1 |
=if(A2 = 1, "Female", "Not Female") |
Female |
3 | 0 |
=if(A3 = 1, "Female", "Not Female") |
Not Female |
4 | 3 |
=if(A4 = 1, "Female", "Not Female") |
Not Female |
5 | NA |
=if(A5 = 1, "Female", "Not Female") |
Not Female |
6 | =if(A6 = 1, "Female", "Not Female") |
Not Female |
- Note that cell
A6
is empty, i.e., row6
has an empty cell in columnA
. Excel interprets an empty cell as not meeting the evaluated condition (here thatA6 = 1
).
if
statements are very useful. They can evaluate several operations:
Operator | Meaning | Example | Result |
---|---|---|---|
= |
Equal to | =if(A2 = B2, "TRUE", "FALSE") |
Returns TRUE the value in A2 is the same as the value in B2 |
<> |
Not equal to | =if(A2 <> B2, "TRUE", "FALSE") |
Returns TRUE the value in A2 is different than the value in B2 |
> |
Greater than | =if(A2 > B2, "TRUE", "FALSE") |
Returns TRUE the value in A2 is greater than the value in B2 , e.g., if A2 = 2 and B2 = 1 |
>= |
Greater than or equal to | =if(A2 >= B2, "TRUE", "FALSE") |
Returns TRUE the value in A2 is greater than or equal to the value in B2 , e.g., if A2 = 2 and B2 = 1 or of B2 = 2 |
< |
Less than | =if(A2 < B2, "TRUE", "FALSE") |
Returns TRUE the value in A2 is less than the value in B2 , e.g., if A2 = 5 and B2 = 10 |
<= |
Less than or equal to | =if(A2 <= B2, "TRUE", "FALSE") |
Returns TRUE the value in A2 is less than the value in B2 , e.g., if A2 = 5 and B2 = 10 or if B2 = 5 |
In addition to recoding, you can use it to test if cells are blank (=if(A2 = "", "Blank", "Not Blank")
), compute different formulas based on different values (e.g., return the absolute value if a cell is negative: =if(A2 < 0, abs(A2), A2)
), etc.
Nesting if
Statements
if
statements can be nested. For example, I use the following formula to compute letter grades from a percent grade given in cell A1
. I can just drop this formula into, say, cell B1
and it will automatically give me the letter grade using CUNY’s conversion standards8:
=IF(A1>=97.5,"A+", IF(A1>=92.5,"A", IF(A1>=90,"A-", IF(A1>=87.5,"B+", IF(A1>=82.5,"B", IF(A1>=80,"B-", IF(A1>=77.5,"C+", IF(A1>=70,"C", IF(A1>=60,"D", IF(A1<60,"F"))))))))))
Note that there is an other Excel function, ifs
that makes the syntax for nesting if
statements a bit cleaner, but I personally prefer seeing it all spelled out—even if it means having a bunch of parentheses at the end.
The ifs
statement for that same coding of percents into letter grades is:
=IFS(A1>=97.5,"A+", A1>=92.5,"A", A1>=90,"A-", A1>=87.5,"B+", A1>=82.5,"B", A1>=80,"B-", A1>=77.5,"C+", A1>=70,"C", A1>=60,"D", A1<60,"F")
6.5.3 vlookup
I love vlookup
. It’s such a powerful way to recode variables based on some criterion. For example, I could recode all males
to 0
and females
to 1
in just a few steps.
vlookup
is a v
ertical lookup that you use to fill in values down a column. You can use hlookup
to look up values to a row (not a column) of data with values also looked up in rows of data. (Apparently xlookup
is a new Excel command that knows which way the data are being read and looked up, so can be used instead of both vlookup
and hlookup
, but I’ve not gotten it to work.)
- Formula:
=vlookup(
cell to look up,
range of cells to find replacement value,
column in range for value to return,
match mode)
- Match modes:
Value to Enter | Description |
---|---|
0 |
Exact match. If none found, returns #N/A . This is the default |
-1 |
Exact match. If none found, return the next smaller item |
1 |
Exact match. If none found, return the next larger item |
2 |
A wildcard match where * , ? , and ~ are “wildcards”, MS’s pathetic attempt at regular expressions |
- To “lock” part of a cell reference, add
$
right before it - E.g.:
- To keep the reference for replacement values “locked” to cells
A1
throughB12
(written in the formula asA1:B12
) when filling in cells below with that formula, - Add a
$
before the1
and the12
, making itA$1:B$12
- To keep the reference for replacement values “locked” to cells
Index
and Match
You can get similar results by instead using the index
and match
commands. Since I prefer vlookup
, for now, I will simply link to this ExcelJet page that covers them well.
6.6 Pivot Tables & Charts
Pivot tables are a great way to quickly generate descriptive statistics for categories. They are also flexible so that you can look at subgroups or “cross tables” that show, e.g., the stats for two variables that are crossed with each other (like looking at, say, the hip-waist ratios of genders crossed with ethnicities/races).
6.6.1 Inserting/Creating a Pivot Table or Chart
Table:
- Insert tables using:
Alt
,N
,V
- GUI:
- In the
Tables
group, click on theInsert
tab - Click on
PivotTable
- Click
OK
- In the
Chart:
- Insert charts using:
Alt
,N
,S
,C
- GUI:
- In the
Tables
group, click on theInsert
tab - Click on
PivotChart
- Click
OK
- In the
6.7 Basic Statistics
- Most of the statistical analyses are within the
Data
tab under theData Analysis
button
6.7.1 Correlations
Function | Formula |
---|---|
Return a correlation matrix | =correl( cell range 1, cell range 2) |
Creating correlations (and many other descriptive & inferential stats) is also accessible via a GUI in the Analysis ToolPak
described under ANOVAs, below.
6.7.2 t-Test
This formula returns the p-value for the t-test. It does not return the actual t-score.
Formula:
=t.test(
group 1 column range,
group 1 column range,
tails,
type)
- Tails:
Value to Enter | Description |
---|---|
1 |
One-tailed testI.e., that Group 1 values are larger than Group 2 values |
2 |
Two-tailed testI.e., that Group 1’s values are either larger or smaller than Group 2’s values |
- Type:
Value to Enter | Description |
---|---|
1 |
Paired t-testI.e., that the values in a given row are from the same participant.E.g., Group 1 is a person’s pretest score and Group 2 is that same person’s posttest score |
2 |
Unpaired t-test assuming homoscedasticityI.e., that the variance in the populations from which Group 1 is sample from is the same as the variance in the population from which Group 2 is sampled |
3 |
Unpaired t-test assuming heteroscedasticityI.e., that the variances in the populations from which Groups 1 and 2 are sampled are different |
N.b., OLS tests (e.g., t-tests & ANOVAs) are rather robust to heteroscedasticity, so choosing 2
for unpaired t-tests is generally fine. If variances are very different, then standardizing the variables usually suffices to address the issue.
Steps to Conduct a t-Test on the China Posttest Data
As an example for computing the p-value for a t-test in Excel, let us use the posttest data from the study of elementary students in China.
- Open the
CFL_Posttest_Data.csv
file in Excel - Sort all of the data by
Population
. To do this:- Type
Control + Home
to go to the top-left cell Shift + Control + End
to select all of the data- Type
Alt
(Option
on a Mac), thenD
, and thenS
to open theSort
dialogue box (it’s not easy to see how to get the combination, but it’s one I use often enough to have just memorized, If it helps to know, that’sD
for theData
menu and thenS
forSort
. - Sort by
Population
inA to Z
order
- Type
- Now go to somewhere outside of the data to create the formula. Me, I created a new sheet and went to that instead of doing things on the sheet with the data—don’t want to accidentally overwrite data and not realize it. Believe me.
- In essence, the
t.test
formula has you type=t.test(
9 and then to follow that with a selection of the first set of data and then follow that with a selection of the second set of data. Plus some other, almost-random stuff at the end of the command.- You can do all of this by typing
=t.test(
into a cell and then going back to the sheet with all of the data left-mouse-clicking in the first cell withtoca.pro
data (cellM2
), and then holding the mouse button down while you scroll down until you see in the Population column that it’s changed from Migrant to Non-Migrant to select the first group of data. Then repeating that for selecting the non-Migrant data in below that in the same column. - Or you can paste this into a cell:
=T.TEST(CFL_Posttest_Data!M2:M849,CFL_Posttest_Data!M850:M1130,2,2)
What that does is say:- The first set of data are in the
CFL_Posttest_Data
sheet in cellsM2
throughM849
; these are the Migrant students’ raw prosociality scores - The second set of data are also in the
CFL_Posttest_Data
sheet, these in cellsM850
down through cellM1130
; these are the Non-Migrant students’ raw prosociality scores - That first
2
after indicating the second set of data tells Excel that it’s a two-tailed test (meaning we’re testing whether the Migrant scores are higher or lower than the non-Migrant scores; a one-tailed test would test only if they were higher but not lower; one-tailed tests are more powerful but only answer that one question—as is common in stats, there’s a trade off between power and precision) - That second
2
tells Excel it’s an unpaired t-test (meaning the people in one group are not the same people as in the second group; they could be the same people, e.g., if we were following the same people in a pre-post design). It also tells Excel to assume that the variance of prosociality scores is the same in both groups; again, this is an assumption that’s (a) usually wrong to some degree and (b) O.K. to make as long as they’re not that different.
- The first set of data are in the
- After doing all this mountain of work, you should be gifted with a rather anticlimactic result of a single number appearing in that cell. That modest number should be
0.196848
(depending on how many digits you see). That is the p-value for the t-test; if this number were less than, say, .05, we would say that there is a significant difference between the two groups. However, it is not, so we would not say there is a significant difference.
- You can do all of this by typing
6.7.3 ANOVAs
Installing the Anlaysis
Add-in
The functionality to conduct most of the stats that Excel can are not loaded by default. (These are, however, available by default in LO Calc). Fortunately, they are (at least so far) available through an easy installation. To do this:
- Click on
File
- In the menu that opens, click on
Options
thenAdd-ins
- Select the
Analysis ToolPak
option near the top - At bottom of dialogue box that opens, under
Manage
, selectExel Add-ins
and clickGo
- Select
Analysis ToolPak
(and whichever other ones you want)- (
VBA
is for using MS’s visual basic functionality—a holdover from when Windows & Office were still figuring themselves out)
- (
- An
Anlaysis
tab will now appear in under theData
tab
Conducting an ANOVA
E.g., a “Single Factor” ANOVA, which is otherwise known (confusingly) as a one-way ANOVA. This is simply an ANOVA that has only one predictor (or independent variable, IV) and one outcomes variable (or dependent variable, DV).
- Under the
Data
tab and in theAnalysis
group, click on theData Analysis
button - In the dialogue box that opens, click on
Anova: Single factor
- Click on the icon next to the
Input Range
field to minimize the next dialogue box that opens - Select the range of cells (here, two columns, one for the IV & one for the DV), and then click on the icon to maximize that dialogue box
- Choose whether data are grouped by columns of rows—most likley by columns, with one column for each variable
- Choose where the results will be posted; I tend to choose
New Workseet Ply:
which I give an apposite name to, e.g., “ANOVA Source Table” - Click
OK
I miss Quattro Pro nearly as much as Word Perfect. Then again, I also miss making mixed tapes on my boom box and decorating the cassette cases, so take it for what it is.↩︎
This is usually the
Command
key on a Mac.↩︎Alt
usually translates toOption
on Macs↩︎The
.s
at the end of it denotes that it is to generate the standard deviation of a sample—not the population. (That’s=stdev.p
although you’ll likely never use it: We rarely have all of the data for a population. That’s for qualitative research to say that their participants are that entire population.)↩︎Remember that computers can’t do random things. (So. there’s no need for a Voight-Kampff test; all you need is to detect non-randomness.) So, the values generated by Excel cannot be considered random for, e.g., randomizing within a study. True randomization can be achieved best through old-fashioned ways: throwing dice, pulling pieces of paper from a bag, flipping coins, etc.↩︎
This is “dummy coding” gender into a yes/no variable about whether the person identifies as female. If the person identified as female, we code that as a
1
; any other response (except missing data) we code as a0
; missing data are coded as missing data. We could also have an other variable that dummy-codes whether they identify as male (there,1
for male and0
for anything else.). And yes, this allows us to have times when someone identifies as both female and male by letting that person have a1
for both variables. One of the advantages of dummy coding then is that it allows for multiple responses/categories.↩︎Unless we instead wanted Excel to treat the output not as a number but as a word, i.e., treating some number as a word and not as a number.↩︎
Notice that the
IF
statements are read by Excel from left to right, so, if a percent is not ≥97.5
then Excel goes to the nextIF
statement to see if it’s ≥92.5
, and if not then goes to the nextIF
, etc.↩︎Typing the left parenthesis let you then fill in the values with mouse selections and/or further typing↩︎