Property and Definition | Test and Criteria |
---|---|
Distribution of scores: Standard descriptive statistics to characterize average scores and variability and identify unanticipated response anomalies | Means (and medians, modes) and standard deviations (and score minimums and maximums) should be within acceptable ranges; patterns of scores should be as expected. Frequencies of answers to each question should not be extremely skewed (i.e., with many “best” or “worst” scores) |
Structure: The relationships among questions and the extent to which they belong together for scoring purposes | Inter-item correlations should be positive, ranging from approximately 0.30 to 0.80. Item-total correlations should be positive and ≥ 0.30. Internal consistency: Cronbach’s alphas between 0.70 and 0.95 [33]. Factor analysis model fit: factor loadings ≥ 0.30; comparative fit index (CFI) > 0.95 [34, 35]; standardized root mean square residual (SRMR) < 0.06; root mean square error of approximation (RMSEA) < 0.05 [28, 36] |
Test-retest reliability: Stability of scores over time when no change is expected in the concept of interest | For categorical scores, kappa coefficients ≥ 0.21 indicate fair agreement [38]. For continuous scores, intraclass correlation coefficients should be > 0.70 [25, 39] |
Known groups validity: The degree to which scores can distinguish among known groups hypothesized a priori to be different | Scores should be able to distinguish among groups hypothesized to be different [21]; for example, scores should be statistically significantly better among groups of patients with less severe disease |
Construct validity: Evidence that relationships among scores conform to a priori hypotheses regarding logical relationships that should exist with other measures or characteristics of patients | The extent to which observed correlations among measures match hypothesized correlations in terms of sign and magnitude. Criteria for acceptability depend on the degree of conceptual similarity between the scores of interest and other instruments. A moderate (r = 0.30 to 0.49) or strong (r ≥ 0.50) correlation [40] is considered evidence of convergent construct validity; small (r = 0.10 to 0.29) or trivial (r < 0.10) correlations do not generally provide evidence of construct validity |
Responsiveness: Evidence that scores are capable of detecting change | Effect size (ES) estimates (calculated as [change from day 1 to day 2] ÷ [day 1 SD]) and standardized response means (SRMs) should show change over time. Magnitudes are interpreted as large (ES or SRM ≥ 0.80), moderate (ES or SRM = approximately 0.50), or small (ES or SRM ≤ 0.20) [40]. Observed score changes should be statistically different from 0, tested with paired t tests |
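
To make two of the criteria above concrete, the following is a minimal sketch of how Cronbach's alpha, the effect size (ES), and the standardized response mean (SRM) might be computed. It is an illustration, not the analysis code used for this table. The function names and data layout are assumptions made for the example. The ES follows the definition given in the table (mean change divided by the day 1 SD). The SRM is assumed to take its conventional form (mean change divided by the SD of the change scores), which the table does not spell out.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item scale.

    `items` is a list of per-item score lists (hypothetical layout):
    items[i][j] is respondent j's score on item i.
    """
    k = len(items)
    # Total score per respondent, summed across items.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def effect_size(day1, day2):
    """ES as defined in the table: mean change divided by the day 1 SD."""
    change = [after - before for before, after in zip(day1, day2)]
    return mean(change) / stdev(day1)


def standardized_response_mean(day1, day2):
    """SRM (conventional definition, assumed): mean change / SD of change."""
    change = [after - before for before, after in zip(day1, day2)]
    return mean(change) / stdev(change)


if __name__ == "__main__":
    # Made-up scores for five respondents, for illustration only.
    items = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [1, 3, 3, 4, 4]]
    day1 = [10, 12, 14, 16, 18]
    day2 = [12, 13, 17, 18, 20]
    print("alpha:", round(cronbach_alpha(items), 3))
    print("ES:", round(effect_size(day1, day2), 3))
    print("SRM:", round(standardized_response_mean(day1, day2), 3))
```

The resulting values would then be compared against the thresholds in the table (alpha between 0.70 and 0.95; ES or SRM of roughly 0.20, 0.50, and 0.80 for small, moderate, and large change).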