6  Introduction to SPSS & Data Preparation

6.1 Overview

This document is intended to introduce you to a couple of things. First, we will review SPSS’s GUI[1] and how data are prepared and handled therein. We will import a set of rather clean data that we will use to demonstrate ways to further prepare and manipulate data.

Second, we will then introduce some common data exploration functions to understand these data better. We will use this opportunity to further consider some of the concepts we’re covering through our other class activities, including normality and outliers.

6.2 Orientation to SPSS

6.2.1 Accessing SPSS

SPSS can be accessed online with your CUNY ID through Apporto—as long as “your browser” is Chrome.

To access SPSS through Apporto:

  1. Go to CUNY’s Apporto login page: https://cuny.apporto.com/
  2. Enter your CUNY login credentials (your @login.cuny.edu “email” address)
  3. If you don’t already see an icon for SPSS on the Apporto home page, click on the App Store button in the top-left corner, just below the hamburger icon that opens the left-hand menu.
  4. Click to Launch SPSS and follow any steps to “optimize”[2] and reconnect.

To open data in Apporto, there are two ways:

1. Uploading files via dialogue

  1. Locate the menu bar immediately above the Apporto window:
  2. Click on the File upload button
  3. Follow the dialogue therein

2. Dragging files into the Apporto window

  1. Open up a file manager outside of the Apporto environment (i.e., in a normal window outside the browser in which Apporto is running)
  2. Left click to grab and drag a data file from your file manager into the Apporto window. Apporto will open a notification window letting you know that the file has indeed been imported; it should also now appear in the Apporto window
  3. You can then drag the file from the Apporto window into the SPSS window that is itself inside Apporto[3]

Files you save in Apporto will (at least eventually) appear in either the This PC > Desktop folder (accessible from the Desktop folder under Quick Access in Windows’ native file manager) or in the This PC > Documents folder (Documents under Quick Access).

Clicking on the Settings gear to the right of the Apporto menu bar gives the option to access USB drives, although in my experience this is not reliable on every operating system.

To export files from Apporto:

  1. In that menu bar immediately above the Apporto window:
  2. Click on the File download button
  3. You get the idea

Alternatively, you can open your email from within Apporto and send the file to yourself as an attachment.

6.2.2 Editing Global Options

Before we dive into the windows and workings of SPSS, I’d like to note that there are a few useful options to consider modifying given your needs. There are, in fact, many options for tailoring SPSS’s functioning, output, and performance available throughout its dialogues and within its rather large list of syntax commands. Here, however, we will simply note a few “global” options that can be set to adjust how SPSS acts in general.

To access these, select Edit > Options from the menu (in any window). When that dialogue opens, you will see many choices, including the Variable Lists section in the top right of the General tab. In that section, you can choose to have SPSS default to either Display names or Display labels of variables. As discussed further below, a given variable can be identified by either the shorter, more-restricted name or by the longer label used to describe it. By choosing one of these options you can show either smaller, less-intuitive names or longer, more explanatory labels in (nearly) all of the output SPSS generates. Of course, you can also switch between these as needed.

Some of the other options under the General tab are worth considering (such as whether you want to have SPSS display No scientific notation for small numbers in tables; I mean, we’re doing research here, not science). The Language, Viewer, Data, Currency, Charts, Scripts, and Syntax Editor tabs are less useful for most users, but the items in the Output tab’s Outline Labeling section may also be worth considering. Either of those options can let you choose whether to show only the variable names, labels, or both; I suggest using Labels for output you share with others, but you may want to use Names for your own analyses since it will make for simpler output.

Under the Pivot Tables tab, you may want to consider changing the TableLook to APA_TimesRoman_12pt when you’re ready to produce pivot tables for your dissertation or publishable manuscripts.

File Locations can be nice to change if you store your data and analyses in dedicated folders.

Finally, you may (or may not) wish to change settings in the Privacy tab.

There is more one can do to customize SPSS output and set defaults that allow for automatic APA styling.

6.2.3 SPSS Windows

SPSS is inherently a syntax-driven program, but its popularity is arguably due in large part to its useful GUI. The GUI has three main windows:

  1. The Data Editor, which comprises the Data View and Variable View tabs
  2. The Output window
  3. The Syntax Editor window

The Data Editor Window

The Data Editor window is the one most commonly used to interface with SPSS. I think one reason for this is that it can help to be looking at one’s data while working with it, if nothing else to remember what variables there are and what their names are.

Another reason, though, is that you will have one Data Editor window for each data set you have open; when you access the drop-down menu at the top to, e.g., Analyze your data, SPSS will assume you want to work with the data in whatever window is either currently raised or was last raised. So, if you have more than one data set open, simply cycle through to the one you want to work with and then choose what you want to do from the drop-down menu in the Data Editor, Output, or even Syntax window.

Relatedly, you will notice that the drop-down menu at the top is the same[4] for all of the windows. This indeed means that you don’t have to cycle back to the Data Editor window before you do anything. In fact, it can sometimes be easier to use the menu from the Output window so you can look at the results of one command to know what to do with the next. (Anyway, you can see the whole list of variables accessible to a given command in that command’s dialogue boxes.)

The Data View Tab

The Data View tab[5] presents a spreadsheet of the data. Just like other spreadsheet programs, you can enter, edit, and scroll through your data here. You can use the Page Up and Page Down or the arrow keys to scroll. Holding down the Control/Command button while tapping arrow keys will go to the ends of the data; e.g., Control/Command + \(\Downarrow\) will go to the bottom of the data set; Control/Command + \(\Rightarrow\) will go to the far right of it, etc. One way this works differently from, e.g., Excel, though, is that SPSS will skip over empty cells whereas Excel will stop right before each empty cell instead of going all the way to the end.

Right-clicking on things in the Data View tab lets you do some useful things.

  • Right-clicking on a column header (i.e., the part at the top that lists the variable name) lets you:
    • Sort the entire data set by that variable
    • Copy the variable name or label (more about those things under Variable View)
    • Clear the data set of that variable. This is the command to delete something in SPSS. Right-clicking and then choosing Clear will delete the selected cell, row, or column in either the Data View or Variable View tab.
    • Get Variable Information including the variable’s name, label, type[6], any codes for missing values, and the measurement scale for that or any other variable.
    • Send a command to give a nice set of descriptive statistics to the Output window (and go there automatically to see those results)
  • Right-clicking on a row number lets you:
    • Cut or Copy that row
    • Clear (i.e., delete) that row
    • Insert Cases to manually enter a new row of data (or paste one that you said to cut or copy)
  • Right-clicking on a cell lets you:
    • Cut or Copy the values in that cell
    • Paste values selected from cutting or copying
      • You can also Paste with Variable Names, useful (or confusing) for pasting into a different column
    • Copy the variable name or label
    • Access Variable Information or Descriptive Statistics for that entire variable
    • Clear (i.e., delete) the information in that cell
    • Check the spelling against SPSS’s dictionary
    • Change the font slightly

The Variable View Tab

The Variable View presents what is essentially a codebook: a list of the variables and information about them, including:

  • Name
    • the name that SPSS uses to access that variable. These are best kept short so that you can see the whole thing in some of SPSS’s unnecessarily-small dialogues. They also can only contain letters, numbers, periods, and underscores.
  • Type
    • indicates whether the variable is a String (alphanumeric), Numeric (numbers not specially formatted), or a number with various types of special formatting, such as dates, currency, etc. The Comma and Dot types are for numbers with thousands etc. indicated by commas or dots, respectively[7]. Scientific notation is for numbers formatted like 1 \(\times\) 10\(^3\) to denote 1,000. Clicking on the button with an ellipsis opens a dialogue where you can change the number type (as well as the length of the variable, i.e., how many characters long it can be).
  • Width
    • simply indicates how many characters long or how many digits a variable has left of a decimal. No big deal.
  • Decimals
    • presents how many decimal places a (numeric) variable has been assigned.
  • Label
    • is very useful. In this field you can write a rather long description of what a given variable measures. You can use nearly any characters here to explain it well. To create or change a label, simply left-click inside that field and start typing.
  • Values
    • is also quite useful; for variables that are encoded with numbers, you can use this field to indicate what each level of the variable actually denotes. For example, if you have a Likert-style response encoded as a number from 1 to 5, you can click on the ellipsis button to denote that 1 = Strongly Disagree, etc. When you explore the variable with descriptives, etc., SPSS will use these value labels instead, making output considerably easier to read. We will show an example of doing this below.
  • Missing
    • is yet another useful field. Sometimes a certain character or value will be used to denote a missing value. For example, 99 or NA may be used as placeholders to signify that that datum is actually missing. By clicking on the ellipsis button, you can denote this. We do this below.
  • Columns
    • simply notes how many characters wide a column is. You can change the value here or, under the Data View tab, left-click and drag the border between two column headers to change this.
  • Align
    • just indicates the left, right, or center alignment of a column.
  • Measure
    • is an unexpectedly important attribute of a variable. SPSS is quite finicky about the “measure” type of a variable: You can only perform actions on a variable that match that variable type. For example, you can only run correlations on continuous variables. The measurement types that SPSS allows are:
    • Scale denotes a “scalar” variable, which corresponds to either of Stevens’s “interval” or “ratio” levels. It is indicated by a little ruler.
    • Ordinal denotes a, well, ordinal variable and is indicated by a little histogram.
    • Nominal denotes a nominal variable and is indicated by a cute little Venn diagram.
  • Role
    • is a rather under-utilized field. It can be used to indicate whether a variable is a predictor / independent variable (Input), an outcome / dependent variable (Target), Both, or whether it is used to Partition or Split the data set. We will create a variable that indeed partitions when we subset the data to only include migrant students.

The Output Window

Another reason I think SPSS is so widely used is that, with just a few mouse clicks, it delivers copious amounts of output. As I noted in class, personally I’ve found that some researchers use this output to determine their analyses, assuming that if some stat program spits it out, it must be good. Nonetheless, the output can be good, and there is enough of it that annotating it is certainly worthwhile.

Annotating Output

The Output window is comprised of two sections, an outline and a main window. The information in either can be changed or added to manually. This can be a good idea. First, of course, because SPSS does return a lot of results and sifting through even a few sets of analyses can be tedious.

Second, I strongly recommend taking notes on what you are doing in your analyses and what your thoughts on them are. With data and analyses of any real size and complexity, it can be difficult to jump back in to your analyses even a week or so later; steps that seemed obvious and important at the time can quickly become obscure and lost.

Ways of annotating your output:

  • Insert a heading in the outline by clicking Insert > New Heading. This will create a new heading at the cursor; double-click on this heading to type in a phrase that will remind you of what you are doing in that section of the output.
    • Alternatively, you can simply double-click on an existing heading to change it. For example, if your output contains more than one t-test, you can double-click on the first to change it to t-test of toca.pro by group and the second to t-test of toca.dis by group. You can left-click and drag the spacer between the windows to make the outline section wider, but you’ll still not want to make the headings too long since they’ll quickly become longer than a useful outline window.
  • Insert notes into the output itself by clicking Insert > New Text. This will create a text box in the output section into which you can write pretty much whatever you want. Unlike a heading, this can be as long as you want to give yourself and your colleagues as much information about what you are doing and what it means.
  • You can use the Insert menu to insert other things, too, including whole titles for the output, images, etc.

Note that you can also double-click on any element in the main output section to manipulate that element. This way, you can modify the colors, fonts, or even the text within tables, figures, etc.

Of course, you can then save your output (to a .spv file) as notes on your analyses.

Exporting Output

Right-clicking on an element lets you copy it to then paste it into, e.g., your manuscript (as we will do in Chapter 9: Writing Results).

Alternatively, you can Export an element. When you right-click on an element and choose to do that, you will be able to export it as a .html, .pdf, .ppt, .doc, etc. For importing into, e.g., Word, I suggest exporting as an .html file.

Syntax in Output

SPSS is a powerful stats program, but I personally think that its GUI is a big reason for its success. Nonetheless, SPSS’s GUI is in fact an “overlay” that simply lets us access its most common commands more intuitively; SPSS is in fact running the syntax that those mouse clicks create.

SPSS versions 27 and earlier return the syntax used to generate results in the Output window by default, right above the given results[8]. As of version 28, it does not. We can set SPSS to automatically return the syntax used in the output by going to Edit > Options > Viewer and then checking the Display commands in the log box in the lower-left of that Viewer window[9].

Why do this? Because there are several ways in which the syntax that SPSS posts can be quite useful. First, you can copy that syntax into the Syntax Editor (as noted below) to rerun any analyses. This is useful when you are returning to analyses later on and, e.g., want to generate a smaller set of analyses.

Second, as you learn what SPSS can do, you can use the syntax to learn better how to do it—and how to tweak your analyses to get exactly the output you want. Reviewing existing syntax is a lot easier than learning it from scratch.

Third, once you’ve gained some facility using SPSS, you will find that there are things you want to do that you can’t through the GUI. Instead, you will need to do things directly with the syntax. Although you certainly can type syntax directly into the Syntax Editor, it’s often easier to paste in existing syntax and edit it as needed. In fact, in the long run, that’s also faster.

Fourth, you can annotate syntax a bit like you can annotate output. This way, you can create and save a syntax file (saved as a .sps file) that’s a lot smaller and easier to navigate through than some massive output file, and you can still generate that mountain of results with a few quick keystrokes[10].

The Syntax Window

SPSS doesn’t open a Syntax window automatically, like it does a Data Editor or Output window, but simply clicking File > New > Syntax opens one. We will demonstrate using it below, but the general way to use it is to either paste in or type some syntax command and, with the cursor in some part of that syntax, either click on the big, green play button[11] or type Control/Command + R.

SPSS syntax itself follows a set grammar. Some command is given first; often this is immediately followed by a “statement” that just tells SPSS what variables, etc. to run that command on. This is followed by one or more options, for example whether to print out both figures and tables based on the command. Critically, each command must end with a period.
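As a minimal sketch of that grammar, assuming the pollution variable from the data used in this guide, the following shows a comment, a command, its variable statement, and two options. The particular options chosen here are only illustrative.

```spss
* A comment starts with an asterisk and, like a command, ends with a period.
FREQUENCIES VARIABLES=pollution
  /STATISTICS=MEAN STDDEV
  /BARCHART FREQ.
```

Here FREQUENCIES is the command, VARIABLES=pollution is the statement naming what to run it on, each line beginning with a slash is an option (a "subcommand"), and the single period at the very end closes the command.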

As you might expect, SPSS has many commands to choose from; more are available if you pay them more (and have your own copy of SPSS; this won’t work with the version we have access to through CUNY).

6.3 Data Preparation & Cleaning

In this section, we will now build on this to further clean and prepare the data for analyses. Most of these tasks are here to demonstrate how to clean or improve data in ways that are commonly needed.

6.3.1 Change id to nominal

The id variable is right now a Scale variable. That’s natural since it is a number after all. And it’s not uncommon for SPSS to import IDs as numbers since replacing names with numbers is a very typical way to anonymize participants. Frankly, leaving it as a number (a Scale level Measure) won’t likely create any problems in SPSS[12], but it still presents a good opportunity to demonstrate changing the Measure of a variable. To do this:

  1. Go to the Variable View tab of the Data Editor window.
  2. Left-click on the Measure cell in the id variable’s row. When you do, a drop-down menu will appear listing the three measure levels.
  3. Select to make id a Nominal variable.

Now, SPSS will “understand” that this is in fact a name that identifies each participant and should be treated as such in all analyses.
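For those who prefer syntax, the same change can be sketched in one line. This assumes the variable is named id, as it is in these data.

```spss
* Set the measurement level of id to nominal.
VARIABLE LEVEL id (NOMINAL).
```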

6.3.2 Setting Values labels for pollution

This next task is also, frankly, not necessary here. But it is a very useful one to know since, like giving good variable labels, it can really help make things clearer in output. We will give labels to each of the response values for the CHEAKS Pollution subscale. For a bit of background, all of the CHEAKS subscale scores, including that for pollution, are created from responses to two items. If a respondent indicates in a given item that they have done something to help, e.g., reduce pollution, then that item response is coded as a 1. If they do not report having done that task, then their response to that item is coded as a 0. Since there are two items for each subscale, the scores on a subscale can range from 0 (i.e., didn’t do either task), through 1 (they did one of the two tasks), to 2 (they did both tasks). Let’s code the pollution score to reflect this:

  1. Also in the Variable View of the Data Editor, click on the Values cell in the pollution row.
  2. Click on the ellipsis button that appears.
  3. In the dialogue box that opens, enter a 0 in the Values field.
  4. Type Engaged in neither task in the Label field.
  5. Click the Add button. 0 = "Engaged in neither task" will now appear in the field next to the Add button.
  6. Now, type a 1 in the Values field.
  7. Type Engaged in one task in the Label field.
  8. Again click the Add button to add this association as well.
  9. Do this once more to add 2 = "Engaged in both tasks" to the list.
  10. Click OK

Now when you click on the Values cell for the pollution row, you will see these value labels added. Right-clicking on the pollution row and choosing to look at the Variable Information will show these in addition to the other information there.

Note that we have not actually changed the data. They are still numbers (Scale level measures). Right-click again on that variable (in either the Data View or Variable View tabs) and select Descriptive Statistics. You will see in the output that SPSS generates means, etc. just as it would for any interval/ratio variable.

However, now in the drop-down menu click on Analyze > Descriptive Statistics > Frequencies and you will see that the level values are replaced with the more explanatory value labels, helping us (and our colleagues and readers) more easily see what the responses really meant.
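The labeling steps above, along with the frequency table that shows them off, can be sketched in syntax as follows. It uses the same three labels entered through the dialogue.

```spss
* Attach a label to each possible pollution score.
VALUE LABELS pollution
  0 'Engaged in neither task'
  1 'Engaged in one task'
  2 'Engaged in both tasks'.

* Request a frequency table, which will display the value labels.
FREQUENCIES VARIABLES=pollution.
```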

6.3.3 Setting missing values for wave

As mentioned briefly above, we can set certain values to be recognized as representing missing values. Most of the variables in this set were imported with blank cells denoting missing values, but wave has a few values that are listed as NA. Right now, SPSS doesn’t recognize these as missing; to show this, simply right-click on that variable in the Data Editor and select Descriptive Statistics[13], which shows that there are 1129 Valid cases and 0 Missing, while also treating NA as a factor level.

We can easily fix this, though:

  1. In the Variable View tab of the Data Editor window, click on the ellipsis button in the Missing cell of the wave row
  2. Click on the radio button next to Discrete missing values
  3. In the first field under that, type in NA
  4. Click OK

Had we wanted to flag a range of numeric values as missing, we would have instead entered it into the Range plus one optional discrete missing value fields.
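The same declaration can be sketched in syntax. This assumes, as the NA entries suggest, that wave was imported as a string variable, which is why the value is quoted.

```spss
* Declare the string NA as a user-missing value for wave.
MISSING VALUES wave ('NA').

* Re-check the counts of valid and missing cases.
FREQUENCIES VARIABLES=wave.
```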

Now when we look at the descriptives for wave, we see that SPSS indeed treats NA as a missing value.

6.3.4 Transform city into a numeric variable called city.n

There are pretty often times when it’s useful to create a new variable based on the values of one or more existing variables. One occasion to do this is to make a nominal/ordinal variable into a numeric one[14]. Let’s create a new variable based on the city values[15].

To do this:

  1. In any window, click on Transform > Recode into Different Variables[16]
  2. In the dialogue box that opens, click on city in the list of variables in the left-hand field.
  3. Click on the arrow button next to that list of variables. This adds city to the Input Variable -> Output Variable field.
  4. In the Output Variable section, enter city.n into the Name field and Numeric version of City (or something like that) into the Label field.
  5. Click on the Old and New Values button
  6. A new dialogue opens; in the Old Value section, type in Kunshan in the Value field
  7. In the New Value field, type in 0
  8. Click the Add button to add that to the Old -> New field
  9. Repeat this for transforming Shanghai to 1. Note that since this is the only remaining option, we could instead select All other values at the bottom of the Old Value section while still adding 1 to the New Value field.
  10. Click Continue to return to the previous dialogue and then click OK
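The recode above can be sketched in syntax as follows. It assumes the city values are spelled exactly Kunshan and Shanghai in the data, since string values must match in spelling and capitalization.

```spss
* Recode the city names into a new numeric variable.
RECODE city ('Kunshan'=0) ('Shanghai'=1) INTO city.n.
VARIABLE LABELS city.n 'Numeric version of City'.
EXECUTE.
```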

6.3.5 Create a dummy variable for Population called population.n

I am a pretty strong advocate for using dummy variables. They can make it easier to interpret the effects of each level of a nominal variable without needing to resort to, e.g., post hoc analyses.

  1. Also under the Transform menu, select Create Dummy Variables
  2. Select Population under Variables and then add that to the Create Dummy Variables for: field by again clicking on the arrow
  3. We are going to create a simple dummy variable (not, e.g., one derived from a combination of other variables), so leave Create main-effect dummies selected
  4. We didn’t create any value labels, but it’s fine to leave Use value labels selected under Dummy Variable Labels since neither choice matters for simple “main effect” dummies.
  5. Under Macros, select to Omit first dummy category from macro definitions. We can nearly always select to do this because we usually need one fewer dummy variable than there are values in the original variable. The Population variable has two values (Migrant and Non-Migrant), so we only need one dummy variable (i.e., 2 - 1 = 1) to fully encode the information in the Population variable[17]
  6. In the Root Names field, type population
  7. Click OK

Dummy variables can only take on the values of 0 or 1. For some reason, SPSS gives dummies it creates two decimal places. We clearly don’t need these, so:

  1. In the Variable View tab, click into the Decimals cell of the population_1 variable[18]
  2. Change the value to 0

Note that we could also change the Width to 1 since we only need one digit to the left of the decimal.
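Both of those display changes can be sketched with a single FORMATS command, where F1.0 means a number one digit wide with no decimals. The second half is an alternative, hand-built dummy rather than what the Create Dummy Variables dialogue produces; its name, population_migrant, is hypothetical and chosen only for illustration.

```spss
* Show population_1 with a width of 1 and no decimal places.
FORMATS population_1 (F1.0).

* Alternative sketch: build a migrant dummy by hand (hypothetical name).
COMPUTE population_migrant = (Population = "Migrant").
FORMATS population_migrant (F1.0).
EXECUTE.
```

The logical expression in the COMPUTE command returns 1 when Population equals "Migrant" and 0 otherwise, which is all a dummy variable needs to be.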

Frequencies & Crosstabs

So far, we’ve been mainly cleaning and prepping the data. I want to break this pattern to preface the step after this one in which we subset our data.

A common way to examine nominal (and ordinal) variables is through frequencies and cross tabulations (“crosstabs”) of those frequencies across pairs of variables. Let us do this with our Population and Group variables[19]:

  1. Click on Analyze > Descriptive Statistics > Frequencies
  2. In the dialogue that opens, select Population and Group. You can do this by clicking on them separately or, e.g., holding down the Control/Command key to select them both
  3. Click on the arrow button to move them to the Variable(s) field
  4. SPSS provides many options for displaying variables. Some of these are listed under the Statistics button, but since these are both nominal variables, the only one we can meaningfully select is Mode under Central Tendency
  5. Under Charts we can select to add, e.g., Bar charts for Frequencies. Note that pie charts are often misleading and rarely the best option for displaying data.
  6. For now, there’s not much of interest under Format, Style, or Bootstrap
  7. But do notice on the main dialogue box that we can select whether we want to Create APA style tables, a nice option for preparing pieces for dissertations and manuscripts
  8. Click OK
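The frequencies request above can be sketched in syntax as follows. This is a pared-down version assuming only the bar charts were selected; the syntax SPSS pastes from the dialogue may include further subcommands.

```spss
* Frequency tables and bar charts for the two nominal variables.
FREQUENCIES VARIABLES=Population Group
  /BARCHART FREQ
  /ORDER=ANALYSIS.
```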

We can see from both the tables and bar charts that we have many more migrant than non-migrant students and many more students who participated in the Caring for Life program (CFL) than those who didn’t (No-CFL). Given that these were chosen variables (part of the study’s manipulation), this is curious. Let’s investigate further.

  1. Click on Analyze > Descriptive Statistics > Crosstabs
  2. From the variable list, select Population and use the arrow to move it to the Row(s) field
  3. And move Group to the Column(s) field
  4. SPSS again offers many options. For example, under Statistics we can (but won’t) choose Chi-square to test whether the distribution of counts differs from expected values and, under Cells, whether to present not only the Observed cell counts but also the Expected. Now, though, simply click OK

The table this produces reveals why the counts for both Population and Group were so uneven: There are no non-migrant students in the No-CFL group[20].
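The same crosstab can be sketched in syntax. This minimal version requests only the observed counts, matching the choices made in the dialogue.

```spss
* Cross-tabulate Population (rows) by Group (columns), showing observed counts.
CROSSTABS
  /TABLES=Population BY Group
  /CELLS=COUNT.
```

Changing the last subcommand to /CELLS=COUNT EXPECTED and adding /STATISTICS=CHISQ would request the expected counts and chi-square test mentioned in the steps.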

6.3.6 Subset the data to only include migrant students

Given that there are no non-migrant students in the no-CFL group, it produces bias to include them as part of the CFL group. This is thus one of the few times when it is justified to subset one’s data: Although subsetting literally removes information from your analyses[21], the information that they would include would make CFL vs. no-CFL comparisons like comparing apples to oranges.

Let us thus subset our data to include only migrant students.

Under the Data menu option, there are a few options for subsetting data:

  • Split File
    • This keeps the data together, but produces separate output for the groups they are split into.
    • For example, we could split these data by population and then easily compare descriptives between the migrant and non-migrant students.
    • For those who believe in “segregating” their groups, this is a good option[22]
  • Split into Files
    • In which we can create two or more sets of data separated by values on one or more variables.
    • For example, we could separate these data into files that have first- and second-grade students that are either migrant or non-migrant students. Many do this.
  • Select Cases
    • This selects cases based on some criterion. The unselected cases can then be “hidden” from analyses, deleted, or moved to another file.

To do this:

  1. Click on Data > Select Cases
  2. In the dialogue box that opens, select Population
  3. Make sure If condition is satisfied is selected[23]
  4. Click on the If... button[24]
  5. In the new dialogue, again select Population and move it to the field at the top right.
  6. Either type or click on the = button
  7. Type into that field "Migrant". Do include the quotes so that that field now shows: Population = "Migrant"
  8. Click Continue
  9. Under Output, make sure Filter out unselected cases is checked. This retains the unselected data, but removes it from any analyses. Copy selected cases to a new dataset is tantamount to Split into Files
  10. Click OK

In the Data View of the Data Editor window, you can now see that the rows for non-migrant students have a line crossed through their row number.

Under the Variable View tab, you’ll also see that a new variable called filter_$ has been created. The label for this variable, at least, is informative: Population = "Migrant" (FILTER)

And now all subsequent analyses will exclude the non-migrant students[25]. We can stop filtering out the non-migrant students simply by deleting (Clearing) that filter variable. Being able to now subset our data is why I wanted to do this task a bit out of step with the rest of this guide.
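The filtering steps above can be sketched in syntax as follows. This approximates what the Select Cases dialogue generates; the exact pasted syntax may differ slightly across versions. The last two commands are an assumption-free alternative to deleting the filter variable, and should be run only when you want the non-migrant students back in your analyses.

```spss
* Flag migrant students (1) versus everyone else (0), then filter on the flag.
USE ALL.
COMPUTE filter_$ = (Population = "Migrant").
VARIABLE LABELS filter_$ 'Population = "Migrant" (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (F1.0).
FILTER BY filter_$.
EXECUTE.

* Run only when you want all cases back: turn the filter off.
FILTER OFF.
USE ALL.
```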

6.3.7 Compute CHEAKS Total from Other Variables

This data set doesn’t contain the responses to individual items (or other things), but it’s obviously much more common for data to have things like that—and for us to need to compute scores based on these individual items. Intentionally, though, we can still do that here since the CHEAKS total score is itself comprised of the scores on the various CHEAKS subscales.

  1. Click on Transform > Compute Variable
  2. In the dialogue that opens, type in cheaks.total in the Target Variable field
  3. Just to do it, click on the Type & Label button
  4. Make the label CHEAKS Total Score (although selecting Use expression as label isn’t a bad idea to make a clear history of what you’ve done)
  5. And leave the Type as Numeric
  6. In the list of variables, scroll down to pollution and click on that
  7. Click on the blue arrow to move it to the Numeric Expression field
  8. In that numeric expression field, type + and then move general over before following that with another +.
  9. Continue to do this until the Numeric Expression field shows: pollution + general + water + energy + animals + recycling[26]
  10. Click on OK

Note that if you are regularly importing data into SPSS (e.g., if you are “peeking” at data as they come in[27]), then this would be a great occasion to save the syntax created to rerun upon each data importation.
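As a sketch of what that saved syntax might look like, the computation above comes down to a few lines, using the variable names and label from the steps.

```spss
* Sum the six CHEAKS subscale scores into a total score.
COMPUTE cheaks.total = pollution + general + water + energy + animals + recycling.
VARIABLE LABELS cheaks.total 'CHEAKS Total Score'.
EXECUTE.
```

One design point worth knowing: because the expression uses the + operator, any case that is missing a value on even one subscale will get a missing total.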

6.3.8 Standardize Variables

I’m also a big fan of standardized data. It doesn’t change the shape of the distribution of scores at all but makes values on one variable directly comparable to values on another, even if they’re measured on very different scales28.

SPSS makes it very easy to standardize variables:

  1. Click on Analyze > Descriptive Statistics > Descriptives
  2. Click on toca.pro
  3. Now, holding down the Shift key, either single-(left-)click on recycling or tap the down-arrow key until you have selected all of the variables from toca.pro to recycling
  4. Now click on the blue arrow to move all of those variables to the Variable(s) field
  5. Under the Options dialogue, we might as well check to review, e.g., the Mean, Std. Deviation, Minimum, Maximum, and S. E. Mean (i.e., the standard error of the mean that we covered in our first lecture)
  6. But our real goal here is to check the Save standardized values as variables box before clicking on OK

And that’s all it takes to create standardized variables.
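The equivalent syntax is short. This is a minimal sketch, assuming toca.pro through recycling sit next to each other in the file in that order (which is what the TO keyword relies on):

```spss
* Describe the variables and save z-scores as new variables.
DESCRIPTIVES VARIABLES=toca.pro TO recycling
  /SAVE
  /STATISTICS=MEAN STDDEV MIN MAX SEMEAN.
```

By default, SPSS names each new standardized variable by adding a Z to the front of the original name (e.g., Ztoca.pro). Each value is the original score minus that variable’s mean, divided by its standard deviation.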

6.4 Additional Resources


  1. “Graphical user interface”↩︎

  2. Because following all of the steps they already laid out for you could not be optimal.↩︎

  3. You can load it from the Apporto file system via, e.g., This PC > Desktop, but files don’t immediately appear there (needing connection refreshes?), so simply dragging it into the SPSS Data Editor window seems most reliable to me.↩︎

  4. Well, actually the Syntax window has a few extra menu items related to running syntax and accessing additional extensions.↩︎

  5. The tabs are at the bottom left of the window.↩︎

  6. This “type” is given as either a letter (A or F) or a word (DATE, TIME, PCT (for percent), DOLLAR, etc.) followed by a number. An A means that it is a string variable (i.e., Alphanumeric), and an F means it’s a number (an “F” is used for esoteric reasons). The number before the decimal point gives the total width of the value, and the number after it gives the number of decimal places; if there is nothing after the decimal point (e.g., F4), then that variable has no decimals.↩︎

  7. I.e., Comma is for numbers formatted like 1,000.00 and Dot is for numbers formatted like 1.000,00↩︎

  8. The syntax is posted under Log headings in the outline. This is useful for finding it, but the log is also used by SPSS to report errors and warnings, so it can be a little confusing to find the syntax or even know that errors/warnings were generated.↩︎

  9. We can also turn on outputting syntax with syntax: SET PRINTBACK LISTING. turns it on, and SET PRINTBACK NONE. turns it off.↩︎

  10. Control/Command + A to select all of the syntax in the window, and then Control/Command + R to run it all.↩︎

  11. I.e., this button: ↩︎

  12. As we’ll discuss briefly in the measurement class, interval and ratio variables (those that SPSS calls Scale variables) can be analyzed in more ways than ordinal variables; ordinal, in turn, can be analyzed in more ways than nominal.↩︎

  13. Or go to Analyze > Descriptive Statistics > Descriptives in the drop-down menu.↩︎

  14. Of course, we can simply add value labels to existing (numeric) values in a Scale variable, but here we’re not just adding labels but instead creating a whole new variable with different values.↩︎

  15. I’m doing this as a way of anonymizing city, but since there are only two cities in these data, I’m really creating a dummy variable. There is a better way to create dummy variables that we’ll cover below, but I just wanted to point this out now.↩︎

  16. As you can see, we could also recode into the same variable and indeed anonymize city in one action. Personally, I tend to create new things instead of overwriting existing in case I make a mistake and can’t easily recover what I overwrote.↩︎

  17. Note that SPSS may create two variables anyway. I’m not sure why it does this, but we can simply delete (Clear) the one with the Population=Non-Migrant label since we’ll only work with the migrant students.↩︎

  18. Or whichever is the dummy with the Population=Migrant label that we’ll be keeping.↩︎

  19. We could also do this with these variables’ numeric equivalents, especially if we had given value labels to their numbers.↩︎

  20. This was a consequence of working in the field, i.e., with data collected outside controlled, laboratory-like settings. Schools that participated in the control (i.e., no-CFL) group were essentially asked to give their time and resources to the study without getting anything in return since, that year, they were not getting the CFL program offered at their school. It can be hard to find busy schools with many demands that can afford to do this. To compensate for this, I was forced to use a non-migrant control group from a previous study. Hardly ideal, but necessary.↩︎

  21. Which turns out to be a bad idea.↩︎

  22. As you may guess, I prefer not doing this. It is better to create a factor in one’s analyses that measures the effect of the variable (e.g., Migrant) on outcomes than to essentially double the number of analyses conducted and thus nearly double the chances of errors.↩︎

  23. Random sample of cases would be used, e.g., to split the data into a “training” and “test” set of data to create and test, e.g., the generalizability of a model. Splitting Based on a time or case range would be done if we feel that, e.g., participants differ qualitatively over time or within ranges of cases. Use filter variable is essentially what we’re doing, but we’re first needing to create that filter variable.↩︎

  24. Always a good thing to do in life.↩︎

  25. Don’t worry, they’re relatively privileged otherwise in life compared to the migrant students.↩︎

  26. Yeah, or just copying that line into the Numeric Expression field to save some typing. Note too that we could also have chosen either All or Statistical from the Function group field and then Sum from the Functions and Special Variables field and made the sum of those variables.↩︎

  27. Which, of course, you should never do, and that no one ever does.↩︎

  28. In analytic models—if all variables are either standardized or dummy-coded—it also lets us remove the intercept term, making our analyses a bit more powerful … but more on that in Stat 2.↩︎