Published October 15, 2024 | Version v1
Journal article Open

Early Warning Scores With and Without Artificial Intelligence

  • 1. University of Chicago
  • 2. University of Wisconsin
  • 3. Yale University
  • 4. Yale New Haven Health

Description

Importance: Early warning decision support tools to identify clinical deterioration in the hospital are widely used, but there is little information on their comparative performance.

Objective: To compare 3 proprietary artificial intelligence (AI) early warning scores and 3 publicly available simple aggregated weighted scores.

Design, Setting, and Participants: This retrospective cohort study was performed at 7 hospitals in the Yale New Haven Health System. All consecutive adult medical-surgical ward hospital encounters between March 9, 2019, and November 9, 2023, were included.

Exposures: Simultaneous Epic Deterioration Index (EDI), Rothman Index (RI), eCARTv5 (eCART), Modified Early Warning Score (MEWS), National Early Warning Score (NEWS), and NEWS2 scores.

Main Outcomes and Measures: Clinical deterioration, defined as a transfer from ward to intensive care unit or death within 24 hours of an observation.
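
As an illustration of this outcome definition, the following is a minimal sketch of how a single observation might be labeled. The function name `is_positive`, the inclusive 24-hour boundary, and the use of one event timestamp per encounter are assumptions made for illustration; the abstract does not specify these details.

```python
from datetime import datetime, timedelta
from typing import Optional

# Outcome window stated in the abstract: 24 hours after an observation.
HORIZON = timedelta(hours=24)


def is_positive(obs_time: datetime, event_time: Optional[datetime]) -> bool:
    """Return True if a deterioration event (ICU transfer or death)
    falls within 24 hours after the observation.

    Assumption: both ends of the window are inclusive.
    """
    if event_time is None:  # encounter with no deterioration event
        return False
    delta = event_time - obs_time
    return timedelta(0) <= delta <= HORIZON


# Hypothetical example: observation at 08:00, ICU transfer at 14:00 the same day.
obs = datetime(2023, 1, 1, 8, 0)
print(is_positive(obs, datetime(2023, 1, 1, 14, 0)))
```

Under this labeling, an observation recorded more than 24 hours before the event, or after it, counts as negative.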

Results: Of the 362 926 patient encounters (median patient age, 64 [IQR, 47-77] years; 200 642 [55.3%] female), 16 693 (4.6%) experienced a clinical deterioration event. eCART had the highest area under the receiver operating characteristic curve at 0.895 (95% CI, 0.891-0.900), followed by NEWS2 at 0.831 (95% CI, 0.826-0.836), NEWS at 0.829 (95% CI, 0.824-0.835), RI at 0.828 (95% CI, 0.823-0.834), EDI at 0.808 (95% CI, 0.802-0.812), and MEWS at 0.757 (95% CI, 0.750-0.764). After matching scores at the moderate-risk sensitivity level for a NEWS score of 5, overall positive predictive values (PPVs) ranged from a low of 6.3% (95% CI, 6.1%-6.4%) for an EDI score of 41 to a high of 17.3% (95% CI, 16.9%-17.8%) for an eCART score of 94. Matching scores at the high-risk specificity of a NEWS score of 7 yielded overall PPVs ranging from a low of 14.5% (95% CI, 14.0%-15.2%) for an EDI score of 54 to a high of 23.3% (95% CI, 22.7%-24.2%) for an eCART score of 97. The moderate-risk thresholds provided a median of at least 20 hours of lead time for all the scores. Median lead time at the high-risk threshold was 11 (IQR, 0-69) hours for eCART, 8 (IQR, 0-63) hours for NEWS, 6 (IQR, 0-62) hours for NEWS2, 5 (IQR, 0-56) hours for MEWS, 1 (IQR, 0-39) hour for EDI, and 0 (IQR, 0-42) hours for RI.
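
The threshold matching described above can be sketched roughly as follows: take the sensitivity of a reference score at a fixed cutoff, find the cutoff on another score whose sensitivity is closest to it, and compare PPVs at those cutoffs. This is only an illustrative sketch, not the study's method. The function names, the toy data, and the tie-breaking rule are hypothetical, and it assumes that higher scores mean higher risk (a score where lower values mean higher risk would need to be negated first).

```python
def sensitivity(scores, labels, threshold):
    """Fraction of positive observations flagged at score >= threshold."""
    positives = sum(1 for y in labels if y)
    true_pos = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    return true_pos / positives if positives else 0.0


def ppv(scores, labels, threshold):
    """Fraction of flagged observations (score >= threshold) that are positive."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0


def match_threshold(scores, labels, target_sensitivity):
    """Return the observed score value whose sensitivity is closest to the
    target. Assumption: ties are broken in favor of the higher threshold."""
    candidates = sorted(set(scores))
    return max(
        candidates,
        key=lambda t: (-abs(sensitivity(scores, labels, t) - target_sensitivity), t),
    )


# Toy data (hypothetical): 1 = deterioration within 24 hours, 0 = none.
labels = [0, 0, 1, 1, 0, 0, 1, 0]
reference_scores = [1, 3, 5, 7, 5, 2, 8, 0]      # stand-in for a reference score
other_scores = [10, 70, 60, 90, 30, 15, 40, 5]   # stand-in for a second score

target = sensitivity(reference_scores, labels, 7)
matched = match_threshold(other_scores, labels, target)
print(matched, ppv(other_scores, labels, matched))
```

Matching on sensitivity before comparing PPVs keeps the comparison fair: each score is evaluated at a cutoff that catches about the same share of deteriorating patients, so differences in PPV reflect differences in false alarms.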

Conclusions and Relevance: In this cohort study of inpatient encounters, eCART outperformed the other AI and non-AI scores, identifying more deteriorating patients with fewer false alarms and sufficient time to intervene. NEWS, a non-AI, publicly available early warning score, significantly outperformed EDI. Given the wide variation in accuracy, additional transparency and oversight of early warning tools may be warranted.

Data availability

See Supplement 2.

Files

edelson_2024_oi_241126_1727980018.60702.pdf

Files (1.2 MB)

Article
md5:f3a6059859ed170dc61250e9a8f92a64
933.6 kB
md5:73d8ced0445d121d1514ba804534f933
314.9 kB

Additional details

Identifiers

DOI
10.1001/jamanetworkopen.2024.38986
Other
oai:uchicago.tind.io:13731

Funding

National Institutes of Health
R01HL157262

UChicago Information

Division(s)
Biological Sciences Division, Pritzker School of Medicine
Department(s)
Medicine, Public Health Sciences