Foundations of Modern Research Measurement

Overview

  • Theories and Domain Sampling
  • Types of Measurement
  • A Brief History of Measurement
  • General Concepts of Good Measurement
  • Likert Response Formats

Theories and Domain Sampling

Theories and Domain Sampling:
Overview

  • The Roles of Theories
  • Sampling a Domain
    • Reliability
  • Ostensible / Non-Ostensible Domains
  • Variability
  • Precision & Range

The Roles of Theories

  • A constant touchstone for validity
  • Often couched as models
    • Comprised of constructs
    • And the relationships between them
  • Constructs often represent a conceptual “space”—or domain—that one tries to measure
    • Well-defined constructs are typically unitary
      • Varying in only one way
      • And representable as a dimension

Sampling a Domain

  • Measurement can be seen as attempts to determine where one is in that theoretical space
  • With repeated measurements benefiting from how consistently & accurately each point in that domain is quantified
    • Whence reliability
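A minimal simulation of this idea, with reliability read as the consistency of repeated measurements of the same points in the domain; the true scores and error magnitudes below are hypothetical:

```python
# Sketch: reliability as consistency across repeated measurements.
# True scores and error SDs are hypothetical.
import numpy as np

rng = np.random.default_rng(42)
true_scores = rng.normal(size=500)             # where people "really are" on the dimension

for error_sd in (0.2, 1.0):                    # small vs. large measurement error
    time1 = true_scores + rng.normal(scale=error_sd, size=500)
    time2 = true_scores + rng.normal(scale=error_sd, size=500)
    r = np.corrcoef(time1, time2)[0, 1]        # test-retest reliability estimate
    print(f"error SD {error_sd}: test-retest r ≈ {r:.2f}")
```

Noisier quantification of each point in the domain shows up directly as lower consistency between administrations.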

Ostensible / Non-Ostensible Domains

  • Ostensible Domains
    • Directly measuring a phenomenon
    • E.g., heart rate, height, extent of a rash, severity of a burn
  • Non-Ostensible Domains
    • Indirectly measuring a phenomenon that cannot be directly observed
    • E.g., attitudes, burnout, clinical reasoning ability

Variability

  • Research: Attempting to detect & interpret differences
    • E.g., where one is in that domain
      • More parsimoniously, where on a dimension
    • And why they are there
  • We gain clarity in doing this by increasing interpretable variability
    • E.g., knowing where & why people are at different places on it

Precision & Range

[Figure series: an ostensible instrument mapped onto a target, non-ostensible dimension, illustrating the instrument’s precision and range relative to the target]

Types of Measurement

Types of Measurement:
Overview

  • Process vs. outcome measures
  • Norm- vs. criterion-referenced
  • Some other types of measurements

Process vs. Outcome Measures

  • Process
    • Measure actions, steps, or methods taken
    • E.g., number of patients receiving vaccinations, adherence to treatment protocols, frequency of counseling sessions
  • Outcome
    • Measure results or effects
    • E.g., reduction in infection rates, improvement in patient quality of life, percent of patients recovering
    • Also balance measures, which compare across processes or outcomes (e.g., whether improving one degrades another)

Process vs. Outcome Measures (cont.)

Aspect | Process Measures | Outcome Measures
Focus | Activities, methods, or procedures | Results, impacts, or end goals
Timeframe | Measured during the process | Measured after the process
Purpose | Ensure fidelity and quality of implementation | Assess effectiveness or success
Examples | Vaccination rates, attendance at training | Decreased disease rates, improved survival rates
Directly Influenced By | Inputs and actions | Both process success and external factors
Utility | Helps identify bottlenecks and improve methods | Validates the overall value of the intervention

Norm- vs. Criterion-Referenced

  • Norms:
    • Comparing against a reference population
  • Criteria:
    • Comparing against a standard
  • Norms benefit from maximal discrepancy (spread) between respondents
  • Criteria from maximal precision near the standard
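A minimal sketch of the two readings of a single raw score; the reference sample and cutoff below are hypothetical, not from the source:

```python
# Sketch: one score, two interpretations.
# Norm-referenced: percentile within a reference population.
# Criterion-referenced: pass/fail against a fixed standard.
import numpy as np

rng = np.random.default_rng(1)
reference_sample = rng.normal(loc=75, scale=10, size=1_000)   # hypothetical norm group
score, cutoff = 82.0, 80.0                                    # hypothetical score and standard

percentile = (reference_sample < score).mean() * 100          # norm-referenced reading
meets_criterion = score >= cutoff                             # criterion-referenced reading
print(f"percentile rank ≈ {percentile:.0f}; meets criterion: {meets_criterion}")
```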

Types of Stimuli & Responses

  • Absolute vs. Comparative
    • We’re naturally better at comparative judgements
    • But the “anchors” used for comparison matter

Types of Stimuli & Responses (cont.)

  • Preference vs. Similarity
    • Preference assumes dominance
      • “This” is better/more/etc. than “that”
    • Thus dominance properties (asymmetry, transitivity) can be used to test validity
      • If A > B and B > C
      • Then we can test whether A > C (see the sketch below)
    • Similarity, however, is symmetric: A is as similar to B as B is to A
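A minimal sketch of that transitivity check on hypothetical pairwise preference judgements (the data are invented so that they contain a violation):

```python
# Sketch: testing transitivity of preference judgements.
# prefer[(a, b)] = True means "a" was judged better than "b".
from itertools import permutations

prefer = {
    ("A", "B"): True, ("B", "A"): False,
    ("B", "C"): True, ("C", "B"): False,
    ("A", "C"): False, ("C", "A"): True,   # inconsistent with A > B and B > C
}

# Any ordered triple where a > b and b > c but not a > c is a violation;
# a single intransitive cycle surfaces as several such triples.
violations = [
    (a, b, c)
    for a, b, c in permutations("ABC", 3)
    if prefer.get((a, b)) and prefer.get((b, c)) and not prefer.get((a, c))
]
print("transitivity violations:", violations)
```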

Some Other Types of Measurements

  • Qualitative
    • Good for understanding one event / individual
    • Without primary concern for referencing
  • Quantitative
    • Good for comparisons
      • Esp. against norms or criteria

Some Other Types of Measurements (cont.)

  • Behavioral Competencies
    • Aptitude: Propensity to succeed in a given domain
    • Achievement: Actual success in a specific, goal-directed task
    • Diagnoses: Level of achievement success / failure
      • Moderators / mediators to that success

Some Other Types of Measurements (cont.)

Aspect | Aptitude | Achievement | Diagnosis
Focus | Potential for future learning or performance | Acquired knowledge or skills | Identifying causes of difficulties or conditions
Purpose | Predict future success or capacity | Evaluate what has been learned | Guide intervention or treatment
Timeframe | Future-oriented | Past-oriented | Present or ongoing
Assessment Type | Standardized aptitude tests | Academic tests, skill-based assessments | Specialized diagnostic tools
Examples | IQ test, nursing school entrance tests, critical thinking assessments | NCLEX-RN, specialty certifications, annual skill validation | Burnout inventories, dexterity evaluations, or clinical reasoning diagnostics
Interpretation | Indicates readiness or potential | Reflects accomplishments or mastery | Identifies specific issues or disorders
Utility | Career and education planning | Evaluating progress or meeting standards | Tailoring interventions or accommodations

A Brief History
of Modern Measurement

The Exciting Origins
of Modern Measurement

  • Born most directly from psychophysics
    • Study of relationships between actual and perceived intensities
    • Pioneered by Gustav Fechner
      • Attempting to find a measure of the mind
      • I.e., the basic building block of thought

Exciting Origins (cont.)

  • Fechner (and his protégés) studied:
    • Minimally-noticeable stimuli
      • “Absolute thresholds”
    • Minimally-discriminable stimuli
      • “Just-noticeable differences”

Absolute Threshold

  • Fechner found that constant stimuli were not constantly noticed
    • There was “noise” in the responses
    • This noise is now typically (and, if unbiased, quite accurately) modeled as normally-distributed error
  • Plotting the proportion of times a stimulus is noticed against its intensity yields an “ogive”
    • The S-shaped curve also used to represent a cumulative normal distribution (sketched below)
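A minimal sketch of that ogive as a cumulative-normal psychometric function; the threshold and noise values are hypothetical:

```python
# Sketch: probability of noticing a constant stimulus, modeled as a
# cumulative normal (an ogive) with hypothetical threshold and noise.
import numpy as np
from scipy.stats import norm

mu, sigma = 10.0, 2.0                          # hypothetical absolute threshold and response noise
intensities = np.linspace(4, 16, 7)

p_detect = norm.cdf(intensities, loc=mu, scale=sigma)   # P(stimulus is noticed)
for x, p in zip(intensities, p_detect):
    print(f"intensity {x:5.1f} -> P(noticed) = {p:.2f}")
```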

Just-Noticeable Difference (JND)

  • Tested ability to tell if two stimuli were different
    • E.g., 2 weights as heavy, 2 lights as bright
  • Found that the JND increased with stimulus intensity
    • It becomes harder to detect the same difference as the stimulus intensity increases
    • Typically called the Weber-Fechner law
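One common formalization of that law, sketched here with a hypothetical Weber fraction: the JND grows in proportion to intensity, so perceived magnitude grows roughly with the log of intensity:

```python
# Sketch of the Weber-Fechner relationship (Weber fraction k is hypothetical).
import numpy as np

k = 0.1                                                 # hypothetical Weber fraction
intensities = np.array([10.0, 100.0, 1000.0])

jnd = k * intensities                                   # Weber's law: delta_I = k * I
perceived = np.log(intensities / intensities[0]) / k    # Fechner's log form (arbitrary units)

for stim, d, s in zip(intensities, jnd, perceived):
    print(f"I = {stim:7.1f}  JND ≈ {d:6.1f}  perceived magnitude ≈ {s:5.1f}")
```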

Using Differences to Create a Scale

  • Thurstone’s law of comparative judgement (Nunnally & Bernstein, p. ~26)
    1. Participants rate preferences to a series of stimuli, presented in pairs
    2. Proportion of preference is used to estimate distance between those 2 stimuli
    3. These distances are used to create an interval scale between all stimuli
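A minimal sketch of those three steps (the Case V form of Thurstone’s law), using made-up preference proportions for three stimuli:

```python
# Sketch: Thurstone's law of comparative judgement, Case V.
# p[i, j] = proportion of participants preferring stimulus j over stimulus i
# (the proportions below are invented for illustration).
import numpy as np
from scipy.stats import norm

p = np.array([
    [0.50, 0.70, 0.90],   # vs. A
    [0.30, 0.50, 0.75],   # vs. B
    [0.10, 0.25, 0.50],   # vs. C
])

z = norm.ppf(p)                    # step 2: proportions -> normal-deviate distances
scale = z.mean(axis=0)             # step 3: column means give interval-scale values
scale -= scale.min()               # anchor the lowest stimulus at zero
print({s: round(float(v), 2) for s, v in zip("ABC", scale)})
```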

General Concepts of Good Measurement

“Rules” of Good Measurement

  • Standardization
    • Creating a standard set of rules
      • (Thus related to scaling)
    • That facilitate cross-comparisons & generalizations
      • Between researchers, populations, etc.
    • Issue is usually not accuracy, but validity

“Rules” of Good Measurement (cont.)

  • Good standardization rules are:
    1. Clear & unambiguous
      • Mathematical functions work well
    2. Easy & practical to apply
      • Helps with validity
    3. Objective
      • Rely little on the measurer, also helps validity

“Rules” of Good Measurement (end)

  • Note how those rules apply as well to instrument use & administration
    • Clear & unambiguous instructions
    • Easily-followed response formats
    • That allow as objective a response as possible

Likert Response Formats

How Likert Conceived Them

Hmph!

How Likert Conceived Them (cont.)

  1. Construct a series of Likert response format items measuring positive attitudes towards target
  2. Construct an equal number of items measuring negative attitudes
  3. Q sort both directions to assess measurement of whole range
  4. Use overall scores as interval-level measures

How Likert Conceived Them (end)

  • Individual items were intended to be ordinal
    • But not intended to be interpreted alone
  • Indeed, the concept of an overall Likert scale is almost meaningless
    • Nearly always, it’s items that employ a Likert response format
    • Except for domains easily measured with one item

Likert Response Tendencies

  • Respondents tend to assume equal distancing between categories (Westland, 2022)
    • Unless, e.g.:
      • Distances are strongly & explicitly stated
      • Item prompts or response anchors measure very strong/polarized beliefs
      • Response range is censored or levels binned
      • The scale includes a very wide range
        • Reminiscent of absolute thresholds

Likert Response Tendencies (cont.)

  • Combined (e.g., summed) Likert-scaled items tend to act even more like an interval scale (Carifio & Perla, 2007)
    • Including approximating a normal distribution
    • This also tends to hold for dichotomous items
  • Thus Likert-response data can typically be treated as interval
    • And relevant parametric tests employed
  • And you should avoid item-level analyses anyway

Yes, you!
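A minimal simulation of the summed-items claim above, using an arbitrary number of items and per-item response probabilities (neither is from the source):

```python
# Sketch: summing several 5-point Likert-format items yields a total score
# that behaves much like an interval-level, roughly normal variable.
import numpy as np

rng = np.random.default_rng(0)
probs = [0.05, 0.15, 0.30, 0.30, 0.20]            # hypothetical per-item response distribution
items = rng.choice([1, 2, 3, 4, 5], size=(10_000, 8), p=probs)   # 8 items, 10,000 respondents

totals = items.sum(axis=1)
z = (totals - totals.mean()) / totals.std()
print("mean:", round(float(totals.mean()), 2), "SD:", round(float(totals.std()), 2))
print("skew:", round(float((z**3).mean()), 2))    # close to zero for a roughly normal shape
```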

Likert Response Tendencies (cont.)

  • Kusmaryono et al. (2022)
    • 5+ response levels may generate more uniform patterns
    • Odd-numbered scales:
      • May generate more reliable & valid data
      • At the expense of respondents tending towards middle responses

Likert Response Tendencies (cont.)

  • Weijters et al. (2010)
    • Studied marketing instruments administered online to the general (male) public
    • Investigated response formats on:
      • Net acquiescence response (NAR)
      • Extreme response (ER)
      • Misresponse to reversed items (MR)

Likert Response Tendencies (cont.)

Effect | Adding Neutral Midpoint | Labeling All Response Levels (1) | Adding More Response Levels
NAR | (2) | | n.d.
ER | | | reduced
MR | | | n.d. (3)
(n.d. = no difference)
  1. Labeling all response levels did not sig. interact with adding a midpoint
  2. Negative responses moved to neutral
  3. Also having only endpoints increases MR, but may improve criterion validity
    • Having a midpoint interacts with number of response levels to also increase MR

Likert Response Tendencies (end)

  • Adding more response levels:
    • Doesn’t change NARs
    • Reduces ER
    • Doesn’t change MR, but:
      • Also having only endpoints increases MR
        • But may improve criterion validity
      • Having a midpoint interacts with number of response levels to also increase MR

The End