5  Introduction to Excel

There are times when I still prefer to use a spreadsheet program instead of R (or SPSS). Sure, it’s easier for me in part because I’ve been using spreadsheets, well, pretty much since they existed1, but mostly because I like to look closely at the data, sometimes cell by cell, and to have that very precise and close-up control over it. I think you’ll appreciate why it’s such an enduring way to examine—and even analyze—your hard-earned data, especially after you master using the keyboard to navigate and sort data.

We will cover some of the most common functions and keyboard shortcuts. More keyboard shortcuts are given in Microsoft’s page dedicated to that. Their site also has an overview of formulas.

5.1 Overview

This activity is intended to review some of the main ways that you may use a spreadsheet program (viz., MS Excel) to manipulate, review, and even analyze data.

5.2 Filling in Series

  1. Enter the first few values of a series in a row or column
  2. Highlight those cells
  3. Move cursor to bottom right corner of the last cell
    • The cursor should change shape (e.g., from a fat, white cursor to a skinny, black one)
  4. Hold down the (left) mouse button
  5. Drag the cursor right/down to fill in more values of that sequence

Excel (and LO Calc, Gnumeric, etc.) are pretty good at recognizing what the series should be. E.g., counting by even numbers, counting by tens.

5.3 Pasting Special & Transposing

  • Data can be pasted with various levels of formatting retained or removed
  • This is especially useful in two situations:

1. Converting formulas to raw numbers

  1. (After creating some range of cells that contains formulas), select the range of cells you’d like to convert from formulas to raw numbers
  2. Copy that range (Control + C)
  3. Move to the range of cells you’d like to paste it to
    • You can paste right back into the same range if you’d just like to convert those formulas to values there
  4. Open the Paste Special dialogue either by typing Control2 + Alt3 + V. Or:
    1. Click on the File tab, and then the little down-pointing arrowhead under the Paste button
    2. Either choose the type of Special pasting you’d like to do, or clicking on the Paste Special button
  5. Select to paste Values (or just type V)

2. Transposing Data

Transposing means to flip down from, e.g., going from top to bottom instead to be going from left to right. To do this:

  1. Select & copy the range of cells to transpose
  2. Move to where you’d like them transposed
  3. Access the Paste Special dialogue (e.g, Control + Alt + V)
  4. Type T to select transpose, or select that option from the dialogue box.

5.5 Formulas

Formulas in Excel and Calc allow you to quickly conduct many. useful operations on your data. It’s even sometimes easier to use these here than to do themin SPSS, R, or an other stats program. Using formulas in a spreadsheet program helps you get up close and personal with your data before analyzing them in those stats programs, too.

I will cover a few that I use most often in this section. Among the additional resources to consider are:

  • ExcelGPT (aka ExcelBrew), which uses AI to create formulas based on your description of the action(s) you’d like to do

5.5.1 General Steps for Entering Formulas (and Common Formulas to Use)

To begin to enter a formula in a cell:

  • Type = within a cell to enter a formula
  • Alt, M: Go to formula tab
  • Common Formulas:
Table 5.3: Common Formulas in Excel
Function Excel Formula
Sum of range =sum(cell range)
Minimum value of range =min(cell range)
Maximum value of range =max(cell range)
Number of cells with numbers in a range =count(cell range)
Number of cells with words in a range =counta(cell range)
Mean of range =average(cell range)
Standard deviation of range4 =stdev.s(cell range)
Generate a random number5 from 0 – 1 =rand()
Generate a random whole number between a range =randbetween(lower value,upper value)

5.5.2 if

The if function evaluates a cell and returns different values if the contents of that cell do or do not match some criterion.

For example, I could have gender coded in one cell as a 1 if that person identifies as female or a 0 if they identify otherwise6. I may then want to have an other cell that actually spells out what that 1 and 0 mean in case I forget. I could use if to do this.

The general format for an if function is:

=if(Some statement,What to output it the statement is true,What to output it the statement is false)

To continue with that example, let’s imagine that the cell with the dummy-code for female is in cell A2 and I want to put the phrase Female or Not Female in cell B2, like this in Excel:

Row/Column A B
1 Dummy Variable Category Label
2 1 Female

To do this, I would type the following into cell B2, i.e., the cell where I want the result to go:

=if(A2 = 1, "Female", "Not Female")

To break down the parts of an if statement doing this:

=if(
  • This first part starts with an equal sign (=) letting Excel know that you’ll be entering a function
  • The if( lets it know what the function is you’ll be using
A2 = 1,
  • This tells Excel where to look to evaluate an argument (we’re telling it to look in cell A2)
  • It then tells Excel what the formula is to evaluate. Here, we’re asking whether the value in cell A2 is equal to 1 or not (= 1)
"Female",
  • This is the value we’re asking Excel to return if indeed the value in cell A2 equals 1. If A2 equals one, we want Excel to print out the word Female.
    • Since we’re asking Excel to print out a word, we have to put it in quotes.
    • If we’d asked Excel to print out a number, we would not put it in quotes7.
"Not Female")
  • This is what we want Excel to print out if our formula (A2 = 1) is incorrect, i.e., if the value in cell A2 is anything besides a 1.

If we had more cells in that first column—other gender identities for other participants, it could look like the following table—noting that I’ve added in a column to show what the function would look like right before what Excel would produce in that column:

Row/Column A B
1 Dummy Variable Function Typed into Cell in Column B Excel Output in Column B
2 1 =if(A2 = 1, "Female", "Not Female") Female
3 0 =if(A3 = 1, "Female", "Not Female") Not Female
4 3 =if(A4 = 1, "Female", "Not Female") Not Female
5 NA =if(A5 = 1, "Female", "Not Female") Not Female
6 =if(A6 = 1, "Female", "Not Female") Not Female
  • Note that cell A6 is empty, i.e., row 6 has an empty cell in column A. Excel interprets an empty cell as not meeting the evaluated condition (here that A6 = 1).

if statements are very useful. They can evaluate several operations:

Operator Meaning Example Result
= Equal to =if(A2 = B2, "TRUE", "FALSE") Returns TRUE the value in A2 is the same as the value in B2
<> Not equal to =if(A2 <> B2, "TRUE", "FALSE") Returns TRUE the value in A2 is different than the value in B2
> Greater than =if(A2 > B2, "TRUE", "FALSE") Returns TRUE the value in A2 is greater than the value in B2, e.g., if A2 = 2 and B2 = 1
>= Greater than or equal to =if(A2 >= B2, "TRUE", "FALSE") Returns TRUE the value in A2 is greater than or equal to the value in B2, e.g., if A2 = 2 and B2 = 1 or of B2 = 2
< Less than =if(A2 < B2, "TRUE", "FALSE") Returns TRUE the value in A2 is less than the value in B2, e.g., if A2 = 5 and B2 = 10
<= Less than or equal to =if(A2 <= B2, "TRUE", "FALSE") Returns TRUE the value in A2 is less than the value in B2, e.g., if A2 = 5 and B2 = 10 or if B2 = 5

In addition to recoding, you can use it to test if cells are blank (=if(A2 = "", "Blank", "Not Blank")), compute different formulas based on different values (e.g., return the absolute value if a cell is negative: =if(A2 < 0, abs(A2), A2)), etc.

Nesting if Statements

if statements can be nested. For example, I use the following formula to compute letter grades from a percent grade given in cell A1. I can just drop this formula into, say, cell B1 and it will automatically give me the letter grade using CUNY’s conversion standards8:

=IF(A1>=97.5,"A+", IF(A1>=92.5,"A", IF(A1>=90,"A-", IF(A1>=87.5,"B+", IF(A1>=82.5,"B", IF(A1>=80,"B-", IF(A1>=77.5,"C+", IF(A1>=70,"C", IF(A1>=60,"D", IF(A1<60,"F"))))))))))

Note that there is an other Excel function, ifs that makes the syntax for nesting if statements a bit cleaner, but I personally prefer seeing it all spelled out—even if it means having a bunch of parentheses at the end.

The ifs statement for that same coding of percents into letter grades is:

=IFS(A1>=97.5,"A+", A1>=92.5,"A", A1>=90,"A-", A1>=87.5,"B+", A1>=82.5,"B", A1>=80,"B-", A1>=77.5,"C+", A1>=70,"C", A1>=60,"D", A1<60,"F")

5.5.3 vlookup

I love vlookup. It’s such a powerful way to recode variables based on some criterion. For example, I could recode all males to 0 and females to 1 in just a few steps.

vlookup is a vertical lookup that you use to fill in values down a column. You can use hlookup to look up values to a row (not a column) of data with values also looked up in rows of data. (Apparently xlookup is a new Excel command that knows which way the data are being read and looked up, so can be used instead of both vlookup and hlookup, but I’ve not gotten it to work.)

  • Formula:

=vlookup(cell to look up, range of cells to find replacement value, column in range for value to return, match mode)

  • Match modes:
Table 5.4: vlookup Formula Match Modes in Excel
Value to Enter Description
0 Exact match. If none found, returns #N/A. This is the default
-1 Exact match. If none found, return the next smaller item
1 Exact match. If none found, return the next larger item
2 A wildcard match where *, ?, and ~ are “wildcards”, MS’s pathetic attempt at regular expressions
  • To “lock” part of a cell reference, add $ right before it
  • E.g.:
    • To keep the reference for replacement values “locked” to cells A1 through B12 (written in the formula as A1:B12) when filling in cells below with that formula,
    • Add a $ before the 1 and the 12, making it A$1:B$12

Index and Match

You can get similar results by instead using the index and match commands. Since I prefer vlookup, for now, I will simply link to this ExcelJet page that covers them well.

5.6 Pivot Tables & Charts

Pivot tables are a great way to quickly generate descriptive statistics for categories. They are also flexible so that you can look at subgroups or “cross tables” that show, e.g., the stats for two variables that are crossed with each other (like looking at, say, the hip-waist ratios of genders crossed with ethnicities/races).

5.6.1 Inserting/Creating a Pivot Table or Chart

Table:

  • Insert tables using:
    • Alt, N, V
  • GUI:
    1. In the Tables group, click on the Insert tab
    2. Click on PivotTable
    3. Click OK

Chart:

  • Insert charts using:
    • Alt, N, S, C
  • GUI:
    1. In the Tables group, click on the Insert tab
    2. Click on PivotChart
    3. Click OK

5.7 Basic Statistics

  • Most of the statistical analyses are within the Data tab under the Data Analysis button

5.7.1 Correlations

Function Formula
Return a correlation matrix =correl(cell range 1,cell range 2)

Creating correlations (and many other descriptive & inferential stats) is also accessible via a GUI in the Analysis ToolPak described under ANOVAs, below.

5.7.2 t-Test

This formula returns the p-value for the t-test. It does not return the actual t-score.

Formula:

=t.test(group 1 column range,group 1 column range,tails,type)

  • Tails:
Table 5.5: t-Test Tails in Excel
Value to Enter Description
1 One-tailed testI.e., that Group 1 values are larger than Group 2 values
2 Two-tailed testI.e., that Group 1’s values are either larger or smaller than Group 2’s values
  • Type:
Table 5.6: t-Test Types in Excel
Value to Enter Description
1 Paired t-testI.e., that the values in a given row are from the same participant.E.g., Group 1 is a person’s pretest score and Group 2 is that same person’s posttest score
2 Unpaired t-test assuming homoscedasticityI.e., that the variance in the populations from which Group 1 is sample from is the same as the variance in the population from which Group 2 is sampled
3 Unpaired t-test assuming heteroscedasticityI.e., that the variances in the populations from which Groups 1 and 2 are sampled are different

N.b., OLS tests (e.g., t-tests & ANOVAs) are rather robust to heteroscedasticity, so choosing 2 for unpaired t-tests is generally fine. If variances are very different, then standardizing the variables usually suffices to address the issue.

Steps to Conduct a t-Test on the China Posttest Data

As an example for computing the p-value for a t-test in Excel, let us use the posttest data from the study of elementary students in China.

  1. Open the CFL_Posttest_Data.csv file in Excel
  2. Sort all of the data by Population. To do this:
    1. Type Control + Home to go to the top-left cell
    2. Shift + Control + End to select all of the data
    3. Type Alt (Option on a Mac), then D, and then S to open the Sort dialogue box (it’s not easy to see how to get the combination, but it’s one I use often enough to have just memorized, If it helps to know, that’s D for the Data menu and then S for Sort.
    4. Sort by Population in A to Z order
  3. Now go to somewhere outside of the data to create the formula. Me, I created a new sheet and went to that instead of doing things on the sheet with the data—don’t want to accidentally overwrite data and not realize it. Believe me.
  4. In essence, the t.test formula has you type =t.test(9 and then to follow that with a selection of the first set of data and then follow that with a selection of the second set of data. Plus some other, almost-random stuff at the end of the command.
    1. You can do all of this by typing =t.test( into a cell and then going back to the sheet with all of the data left-mouse-clicking in the first cell with toca.pro data (cell M2), and then holding the mouse button down while you scroll down until you see in the Population column that it’s changed from Migrant to Non-Migrant to select the first group of data. Then repeating that for selecting the non-Migrant data in below that in the same column.
    2. Or you can paste this into a cell: =T.TEST(CFL_Posttest_Data!M2:M849,CFL_Posttest_Data!M850:M1130,2,2) What that does is say:
      1. The first set of data are in the CFL_Posttest_Data sheet in cells M2 through M849; these are the Migrant students’ raw prosociality scores
      2. The second set of data are also in the CFL_Posttest_Data sheet, these in cells M850 down through cell M1130; these are the Non-Migrant students’ raw prosociality scores
      3. That first 2 after indicating the second set of data tells Excel that it’s a two-tailed test (meaning we’re testing whether the Migrant scores are higher or lower than the non-Migrant scores; a one-tailed test would test only if they were higher but not lower; one-tailed tests are more powerful but only answer that one question—as is common in stats, there’s a trade off between power and precision)
      4. That second 2 tells Excel it’s an unpaired t-test (meaning the people in one group are not the same people as in the second group; they could be the same people, e.g., if we were following the same people in a pre-post design). It also tells Excel to assume that the variance of prosociality scores is the same in both groups; again, this is an assumption that’s (a) usually wrong to some degree and (b) O.K. to make as long as they’re not that different.
    3. After doing all this mountain of work, you should be gifted with a rather anticlimactic result of a single number appearing in that cell. That modest number should be 0.196848 (depending on how many digits you see). That is the p-value for the t-test; if this number were less than, say, .05, we would say that there is a significant difference between the two groups. However, it is not, so we would not say there is a significant difference.

5.7.3 ANOVAs

Installing the Anlaysis Add-in

The functionality to conduct most of the stats that Excel can are not loaded by default. (These are, however, available by default in LO Calc). Fortunately, they are (at least so far) available through an easy installation. To do this:

  1. Click on File
  2. In the menu that opens, click on Options then Add-ins
  3. Select the Analysis ToolPak option near the top
  4. At bottom of dialogue box that opens, under Manage, select Exel Add-ins and click Go
  5. Select Analysis ToolPak (and whichever other ones you want)
    • (VBA is for using MS’s visual basic functionality—a holdover from when Windows & Office were still figuring themselves out)
  6. An Anlaysis tab will now appear in under the Data tab

Conducting an ANOVA

E.g., a “Single Factor” ANOVA, which is otherwise known (confusingly) as a one-way ANOVA. This is simply an ANOVA that has only one predictor (or independent variable, IV) and one outcomes variable (or dependent variable, DV).

  1. Under the Data tab and in the Analysis group, click on the Data Analysis button
  2. In the dialogue box that opens, click on Anova: Single factor
  3. Click on the icon next to the Input Range field to minimize the next dialogue box that opens
  4. Select the range of cells (here, two columns, one for the IV & one for the DV), and then click on the icon to maximize that dialogue box
  5. Choose whether data are grouped by columns of rows—most likley by columns, with one column for each variable
  6. Choose where the results will be posted; I tend to choose New Workseet Ply: which I give an apposite name to, e.g., “ANOVA Source Table”
  7. Click OK


  1. I miss Quattro Pro nearly as much as Word Perfect. Then again, I also miss making mixed tapes on my boom box and decorating the cassette cases, so take it for what it is.↩︎

  2. This is usually the Command key on a Mac.↩︎

  3. Alt usually translates to Option on Macs↩︎

  4. The .s at the end of it denotes that it is to generate the standard deviation of a sample—not the population. (That’s =stdev.p although you’ll likely never use it: We rarely have all of the data for a population. That’s for qualitative research to say that their participants are that entire population.)↩︎

  5. Remember that computers can’t do random things. (So. there’s no need for a Voight-Kampff test; all you need is to detect non-randomness.) So, the values generated by Excel cannot be considered random for, e.g., randomizing within a study. True randomization can be achieved best through old-fashioned ways: throwing dice, pulling pieces of paper from a bag, flipping coins, etc.↩︎

  6. This is “dummy coding” gender into a yes/no variable about whether the person identifies as female. If the person identified as female, we code that as a 1; any other response (except missing data) we code as a 0; missing data are coded as missing data. We could also have an other variable that dummy-codes whether they identify as male (there, 1 for male and 0 for anything else.). And yes, this allows us to have times when someone identifies as both female and male by letting that person have a 1 for both variables. One of the advantages of dummy coding then is that it allows for multiple responses/categories.↩︎

  7. Unless we instead wanted Excel to treat the output not as a number but as a word, i.e., treating some number as a word and not as a number.↩︎

  8. Notice that the IF statements are read by Excel from left to right, so, if a percent is not ≥ 97.5 then Excel goes to the next IF statement to see if it’s ≥ 92.5, and if not then goes to the next IF, etc.↩︎

  9. Typing the left parenthesis let you then fill in the values with mouse selections and/or further typing↩︎