Psychometric evaluation of the respiratory syncytial virus infection, intensity and impact questionnaire (RSV-iiiQ) in adults

Williams, Valerie; DeMuro Romano, Carla; Finelli, Lyn; Qin, Shanshan; Saretsky, Todd L.; Ma, Jia; Lewis, Sandy; Phillips, Matthew; Osborne, Richard H.; Norquist, Josephine M.

doi:10.1186/s12955-023-02174-2

Table 1 Definitions and Criteria for Psychometric Measurement

Property and Definition	Test and Criteria
Distribution of scores: Standard descriptive statistics to characterize average scores and variability and identify unanticipated response anomalies	Means (and medians, modes) and standard deviations (and score minimums and maximums) should be within acceptable ranges; patterns of scores should be as expected. Frequencies of answers to each question should not be extremely skewed, i.e., many “best” or “worst” scores
Structure: The relationships among questions and the extent to which they belong together for scoring purposes	Inter-item correlations should be positive, ranging from approximately 0.30 to 0.80
	Item-total correlations should be positive and ≥ 0.30
	Internal consistency/Cronbach’s alphas between 0.70 and 0.95 [33]
	Factor analysis model fit: factor loadings ≥ 0.30; comparative fit index (CFI) > 0.95 [34, 35]; standardized root mean square residual (SRMR) < 0.06; root mean square error of approximation (RMSEA) < 0.05 [28, 36]; Tucker-Lewis Index (TLI) > 0.95 [35, 37]
Test-retest reliability: Stability of scores over time when no change is expected in the concept of interest	For categorical scores, kappa coefficients ≥ 0.21 indicate fair agreement [38]; for continuous scores, intraclass correlation coefficients > 0.70 [25, 39]
Known groups validity: The degree to which scores can distinguish among known groups hypothesized a priori to be different	Scores should be able to distinguish among groups hypothesized to be different [21]; for example, scores should be statistically better among groups of patients with less severe disease
Construct validity: Evidence that relationships among scores conform to a priori hypotheses regarding logical relationships that should exist with other measures or characteristics of patients	The extent to which observed correlations among measures match hypothesized correlations in terms of sign and magnitude. Criteria for acceptability depend on the degree of conceptual similarity between the scores of interest and other instruments. A moderate (r = 0.30 to 0.49) or strong (r ≥ 0.50) correlation [40] is considered evidence of convergent construct validity; small (r = 0.10 to 0.29) or trivial (r < 0.10) correlations do not generally provide evidence of construct validity
Responsiveness: Evidence that scores are capable of detecting change	Effect size (ES) estimates (calculated as: [change from day 1 to day 2] ÷ [day 1 SD]) and standardized response means (SRMs) show change over time: large (ES or SRM ≥ 0.80), moderate (ES or SRM = approximately 0.50), small (ES or SRM ≤ 0.20) [40]. Observed score changes should be statistically different from 0, tested with paired t tests
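
The responsiveness criteria in Table 1 can be illustrated with a minimal Python sketch. The ES formula follows the table (mean change from day 1 to day 2 divided by the day 1 SD). The table does not spell out the SRM formula, so the sketch assumes the conventional definition (mean change divided by the SD of the change scores). The function names and the sample scores are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def change_scores(day1, day2):
    """Per-respondent change from day 1 to day 2 (day 2 minus day 1)."""
    if len(day1) != len(day2):
        raise ValueError("day1 and day2 must contain one score per respondent")
    return [after - before for before, after in zip(day1, day2)]


def effect_size(day1, day2):
    """ES as given in Table 1: mean change divided by the day 1 SD."""
    return mean(change_scores(day1, day2)) / stdev(day1)


def standardized_response_mean(day1, day2):
    """SRM, assuming the conventional definition: mean change / SD of change."""
    changes = change_scores(day1, day2)
    return mean(changes) / stdev(changes)


# Hypothetical symptom scores for five respondents (illustration only).
day1_scores = [10, 12, 14, 16, 18]
day2_scores = [8, 9, 13, 12, 15]

es = effect_size(day1_scores, day2_scores)
srm = standardized_response_mean(day1_scores, day2_scores)
print(f"ES = {es:.2f}, SRM = {srm:.2f}")
```

The sign of both statistics reflects the direction of change, so the absolute value is what would be compared against the magnitude benchmarks in the table.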

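The internal consistency criterion listed under Structure in Table 1 can be sketched in the same way. The snippet below uses the standard formula for Cronbach's alpha, k / (k − 1) × (1 − sum of item variances / variance of total scores), and checks the result against the 0.70 to 0.95 range from the table. The function names and the item responses are hypothetical; this is an illustration of the criterion, not the authors' analysis code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of scores per item, each with one entry per
    respondent, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_in_table_range(alpha, low=0.70, high=0.95):
    """True when alpha falls within the range given in Table 1."""
    return low <= alpha <= high


# Hypothetical responses: three items, five respondents (illustration only).
responses = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}, within range: {alpha_in_table_range(alpha)}")
```
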
ISSN: 1477-7525

Health and Quality of Life Outcomes