Table 4 Area under the receiver operating characteristic curve (AUC) for depression measures detecting moderate improvement

From: Responsiveness of PROMIS and Patient Health Questionnaire (PHQ) Depression Scales in three clinical trials

Trial-specific columns give accuracy for detecting moderate improvement as AUC (95% CI). Average columns give the average AUC across trials.

| Depression measure* | Average, retrospective GRC | Average, prospective GRC | Retrospective GRC: CAMEO | Retrospective GRC: SPACE | Retrospective GRC: SSM | Prospective GRC: CAMEO | Prospective GRC: SPACE | Prospective GRC: SSM |
|---|---|---|---|---|---|---|---|---|
| PROMIS 4-item | .603 | .751 | .625 (.522–.728) | .640 (.553–.727) | .545 (.473–.617) | .773 (.650–.895) | .819 (.700–.934) | .663 (.488–.838) |
| PROMIS 6-item | .619 | .745 | .653 (.553–.753) | .645 (.556–.734) | .560 (.487–.634) | .767 (.642–.892) | .811 (.697–.926) | .657 (.491–.823) |
| PROMIS 8-item | .610 | .751 | .632 (.530–.734) | .642 (.555–.728) | .557 (.483–.631) | .751 (.619–.883) | .816 (.702–.929) | .687 (.539–.844) |
| PROMIS Short-form | .625 | .757 | .680 (.583–.777) | .638 (.551–.725) | .558 (.484–.631) | .760 (.638–.881) | .836 (.729–.942) | .676 (.514–.838) |
| PHQ-9 | .636 | .682 | .625 (.526–.724) | .669 (.592–.747) | .614 (.542–.686) | .705 (.568–.841) | .660 (.532–.787) | .681 (.515–.846) |
| PHQ-2 | .588 | .631 | .587 (.482–.692) | .616 (.537–.695) | .562 (.492–.632) | .609 (.455–.764) | .679 (.553–.806) | .605 (.424–.785) |
| SF-36 Mental Health | | | .580 (.473–.687) | | | .810 (.675–.944) | | |

GRC: Global Rating of Change.

  1. *AUC is the probability of correctly discriminating between patients who have improved and those who have not. Any improvement: ≥ “a little better”; moderate improvement: ≥ “moderately better”.
  2. Follow-up was at 6 months for CAMEO and at 3 months for SPACE and SSM. The proportion of patients reporting moderate improvement by retrospective GRC was 25%, 24%, and 42% in CAMEO, SPACE, and SSM, respectively. The proportion reporting moderate improvement by prospective GRC was 13%, 11%, and 7% in CAMEO, SPACE, and SSM, respectively.
  3. There were no significant differences at P < .01 (using Bonferroni’s correction for multiple comparisons) between any of the retrospective AUCs. In the SPACE trial, the prospective AUCs were significantly lower for the PHQ-9 (P = .008) and PHQ-2 (P = .004) than for the PROMIS Short-form (P values were in the .01 to .02 range for comparisons with the other PROMIS scales). In the CAMEO trial, the prospective AUC was significantly lower for the PHQ-2 than for the SF-36 (P = .007).
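
The reading of AUC in the first footnote, as the probability of correctly discriminating between an improved and a not-improved patient, can be illustrated with a minimal sketch. It compares every improved patient with every not-improved patient and counts how often the improved patient shows the larger score change, with ties counted as one half. The function name `auc_concordance`, the convention that a larger score reduction means more improvement, and the sample numbers are assumptions made for illustration only; they are not trial data and not necessarily the authors' computation.

```python
from itertools import product


def auc_concordance(change_improved, change_not_improved):
    """Estimate AUC as the share of (improved, not improved) patient pairs
    in which the improved patient has the larger score change.
    Tied pairs count as half a correct discrimination."""
    pairs = list(product(change_improved, change_not_improved))
    if not pairs:
        raise ValueError("both groups must be non-empty")
    wins = sum(1.0 for a, b in pairs if a > b)
    ties = sum(0.5 for a, b in pairs if a == b)
    return (wins + ties) / len(pairs)


# Hypothetical score reductions (baseline minus follow-up); NOT trial data.
improved = [6, 4, 9, 3]          # patients rated at least "moderately better"
not_improved = [1, 4, 0, 2, 5]   # all other patients

print(auc_concordance(improved, not_improved))  # 0.825
```

An AUC of .5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, which is the scale on which the values in the table are read.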