
Table 1 Definitions and Criteria for Psychometric Measurement

From: Psychometric evaluation of the respiratory syncytial virus infection, intensity and impact questionnaire (RSV-iiiQ) in adults

Each property below is listed with its definition, followed by the tests and criteria used to evaluate it.

Distribution of scores
Definition: Standard descriptive statistics to characterize average scores and variability and to identify unanticipated response anomalies
Tests and criteria:
- Means (and medians, modes) and standard deviations (and score minimums and maximums) should be within acceptable ranges; patterns of scores should be as expected
- Frequencies of answers to each question should not be extremely skewed, i.e., should not show many “best” or “worst” scores
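
A minimal sketch (not from the article) of these descriptive checks, assuming item-level responses are held in a pandas DataFrame with one column per item; the function name and data layout are illustrative assumptions.

```python
import pandas as pd

def describe_items(items: pd.DataFrame) -> None:
    # Means, medians, SDs, and observed minimums/maximums for each item
    print(items.agg(["mean", "median", "std", "min", "max"]).round(2))

    # Response frequencies per item, to flag piling up at the "best" or "worst" score
    for col in items.columns:
        freqs = items[col].value_counts(normalize=True).sort_index()
        print(f"\n{col} response proportions:\n{freqs.round(2)}")
```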

Structure
Definition: The relationships among questions and the extent to which they belong together for scoring purposes
Tests and criteria:
- Inter-item correlations should be positive, ranging from approximately 0.30 to 0.80
- Item-total correlations should be positive and ≥ 0.30
- Internal consistency (Cronbach’s alpha) between 0.70 and 0.95 [33]
- Factor analysis model fit:
  - Factor loadings ≥ 0.30
  - Comparative fit index (CFI) > 0.95 [34, 35]
  - Standardized root mean square residual (SRMR) < 0.06
  - Root mean square error of approximation (RMSEA) < 0.05 [28, 36]
  - Tucker-Lewis Index (TLI) > 0.95 [35, 37]
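
A short sketch of the scoring-structure checks above, again assuming item-level data in a pandas DataFrame (an assumption, not the article's code). The Cronbach's alpha formula is the standard one; the CFA fit indices (CFI, TLI, RMSEA, SRMR) would come from dedicated structural equation modeling software and are not reproduced here.

```python
import pandas as pd

def item_analysis(items: pd.DataFrame) -> None:
    # Inter-item correlations: values should be positive, roughly 0.30 to 0.80
    print(items.corr().round(2))

    # Corrected item-total correlations: each item vs. the sum of the remaining items
    total = items.sum(axis=1)
    for col in items.columns:
        r = items[col].corr(total - items[col])
        print(f"{col}: corrected item-total r = {r:.2f}")

    # Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    k = items.shape[1]
    alpha = (k / (k - 1)) * (1 - items.var(ddof=1).sum() / total.var(ddof=1))
    print(f"Cronbach's alpha = {alpha:.2f}  (criterion: 0.70 to 0.95)")
```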

Test-retest reliability
Definition: Stability of scores over time when no change is expected in the concept of interest
Tests and criteria:
- For categorical scores, kappa coefficients ≥ 0.21 indicate fair agreement [38]
- For continuous scores, intraclass correlation coefficients (ICC) > 0.70 [25, 39]
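
A hedged sketch of these test-retest statistics, assuming two aligned score vectors and the scikit-learn and pingouin packages; the use of these libraries is an illustrative choice, not the article's stated toolchain.

```python
import pandas as pd
import pingouin as pg
from sklearn.metrics import cohen_kappa_score

def categorical_stability(day1, day2) -> float:
    # Kappa >= 0.21 indicates at least fair agreement for categorical scores
    return cohen_kappa_score(day1, day2)

def continuous_stability(day1: pd.Series, day2: pd.Series) -> pd.DataFrame:
    # Assumes day1 and day2 share the same subject index (aligned order)
    long = pd.DataFrame({
        "subject": list(day1.index) + list(day2.index),
        "time": ["day1"] * len(day1) + ["day2"] * len(day2),
        "score": list(day1) + list(day2),
    })
    # ICC > 0.70 is the criterion for continuous scores
    return pg.intraclass_corr(data=long, targets="subject", raters="time",
                              ratings="score")[["Type", "ICC"]]
```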

Known groups validity
Definition: The degree to which scores can distinguish among known groups hypothesized a priori to be different
Tests and criteria:
- Scores should be able to distinguish among groups hypothesized to be different [21]; for example, scores should be statistically significantly better in groups of patients with less severe disease
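
An illustrative known-groups comparison, assuming a DataFrame with hypothetical "score" and "severity" columns; a one-way ANOVA is used here as one reasonable test, not necessarily the article's.

```python
import pandas as pd
from scipy import stats

def known_groups_test(df: pd.DataFrame) -> None:
    # Mean score per severity group; means should order as hypothesized a priori
    print(df.groupby("severity")["score"].mean().round(2))

    # One-way ANOVA across the groups; a significant result supports known-groups validity
    groups = [g["score"].to_numpy() for _, g in df.groupby("severity")]
    f_stat, p_value = stats.f_oneway(*groups)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```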

Construct validity
Definition: Evidence that relationships among scores conform to a priori hypotheses regarding logical relationships that should exist with other measures or characteristics of patients
Tests and criteria:
- The extent to which observed correlations among measures match hypothesized correlations in terms of sign and magnitude; criteria for acceptability depend on the degree of conceptual similarity between the scores of interest and other instruments
- A moderate (r = 0.30 to 0.49) or strong (r ≥ 0.50) correlation [40] is considered evidence of convergent construct validity; small (r = 0.10 to 0.29) or trivial (r < 0.10) correlations do not generally provide evidence of construct validity
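
A small sketch of a convergent-validity check against a comparator instrument, applying the magnitude thresholds listed above; the variable names are placeholders.

```python
from scipy import stats

def convergent_validity(target_scores, comparator_scores) -> str:
    # Pearson correlation between the questionnaire score and a comparator measure
    r, p = stats.pearsonr(target_scores, comparator_scores)
    # Magnitude labels follow the thresholds in the criteria above
    if abs(r) >= 0.50:
        label = "strong"
    elif abs(r) >= 0.30:
        label = "moderate"
    elif abs(r) >= 0.10:
        label = "small"
    else:
        label = "trivial"
    return f"r = {r:.2f} ({label}), p = {p:.4f}"
```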

Responsiveness
Definition: Evidence that scores are capable of detecting change
Tests and criteria:
- Effect size (ES) estimates (calculated as the change from day 1 to day 2 divided by the day 1 SD) and standardized response means (SRMs) should show change over time
- Magnitude benchmarks: large (ES or SRM ≥ 0.80), moderate (ES or SRM ≈ 0.50), small (ES or SRM ≤ 0.20) [40]
- Observed score changes should be statistically significantly different from 0, tested with paired t-tests
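
A sketch of the responsiveness statistics, assuming paired day 1 and day 2 score arrays. The ES follows the formula in the table (mean change divided by the day 1 SD); the SRM is computed here as mean change divided by the SD of the change scores, the usual convention, which is an assumption the table does not spell out.

```python
import numpy as np
from scipy import stats

def responsiveness(day1: np.ndarray, day2: np.ndarray) -> None:
    change = day2 - day1
    es = change.mean() / day1.std(ddof=1)          # effect size: mean change / day 1 SD
    srm = change.mean() / change.std(ddof=1)       # SRM: mean change / SD of change (assumed convention)
    t_stat, p_value = stats.ttest_rel(day2, day1)  # change should differ significantly from 0
    print(f"ES = {es:.2f}, SRM = {srm:.2f}, paired t = {t_stat:.2f}, p = {p_value:.4f}")
```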