Published October 14, 2021 | Version v1
Journal article Open

Universal risk phenotype of US counties for flu-like transmission to improve county-specific COVID-19 incidence forecasts

  • 1. University of Chicago

Description

The spread of a communicable disease is a complex spatio-temporal process shaped by the specific transmission mechanism and by diverse factors, including the behavior and the socio-economic and demographic properties of the host population. While the key factors shaping transmission of influenza and COVID-19 are beginning to be broadly understood, making precise forecasts of case counts and mortality is still difficult. In this study we introduce the concept of a universal geospatial risk phenotype of individual US counties facilitating flu-like transmission mechanisms. We call this the Universal Influenza-like Transmission (UnIT) score, which is computed as an information-theoretic divergence of the local incidence time series from a high-risk process of epidemic initiation, inferred from almost a decade of flu season incidence data gleaned from the diagnostic history of nearly a third of the US population. Despite being computed from past seasonal flu incidence records, the UnIT score emerges as the dominant factor explaining incidence trends for the COVID-19 pandemic over putative demographic and socio-economic factors. The predictive ability of the UnIT score is further demonstrated via county-specific weekly case count forecasts, which consistently outperform state-of-the-art models throughout the timeline of the COVID-19 pandemic. This study demonstrates that knowledge of past epidemics may be used to chart the course of future ones if transmission mechanisms are broadly similar, despite distinct disease processes and causative pathogens.

Data availability

With the exception of Truven MarketScan, the sources are in the public domain. Data on confirmed cases of COVID-19 were compiled and released at the COVID-19 Data Repository by the Center for Systems Science and Engineering at Johns Hopkins University (https://github.com/CSSEGISandData/COVID-19). The Johns Hopkins COVID-19 data represent data collated by the US Centers for Disease Control & Prevention (CDC) from individual states and local health agencies. Using the Johns Hopkins COVID-19 data resource, we obtained county-level confirmed new weekly case counts for all weeks up to the time of writing (2021-05-30) for 3094 US counties. We calculated COVID-19 cases per capita using the 2019 population estimate provided by the US Census Bureau, derived from the 2010 US decennial census (https://www.census.gov/data/datasets/time-series/demo/popest/2010s-counties-detail.html). We include five demographic independent variables: 1) total population, 2) percent of the total population aged 65+, 3) percent of Hispanics in the total population, 4) percent of Black/African-Americans in the total population, and 5) percent of minority groups in the total population. For socioeconomic factors, we consider: 1) percent of the total population in poverty and 2) median household income, both also obtained from the US Census Bureau and based on the 2010 US decennial census. These data are publicly available. Generated models are publicly available at https://github.com/zeroknowledgediscovery/unitcov, which includes the complete forecast software (DOI: 10.5281/zenodo.5361628). The Truven dataset is a third-party dataset, which the authors are not authorized to distribute publicly. The dataset can be procured by interested researchers, under license, from https://www.ibm.com/watson-health/about/truven-health-analytics.
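As a minimal sketch of the case-count preparation described above, the following Python snippet converts a cumulative daily confirmed-case series for a single county into new weekly counts and a per-capita rate. It assumes a cumulative daily series as input. The function names, the 7-day window, the clipping of negative differences to zero, and the per-100,000 scaling are illustrative assumptions and are not taken from the authors' software.

```python
def weekly_new_cases(cumulative, week_length=7):
    """Convert a cumulative daily count series into new cases per week.

    Only complete weeks are returned; a trailing partial week is dropped.
    """
    weekly = []
    previous = 0
    for end in range(week_length - 1, len(cumulative), week_length):
        total = cumulative[end]
        # Assumption: downward revisions of the cumulative count are
        # clipped to zero rather than reported as negative new cases.
        weekly.append(max(total - previous, 0))
        previous = total
    return weekly


def per_capita(counts, population, per=100_000):
    """Scale counts by county population (hypothetical per-100,000 rate)."""
    return [c * per / population for c in counts]


if __name__ == "__main__":
    # Made-up cumulative counts for two weeks in a county of 50,000 people.
    cumulative = [0, 1, 1, 2, 3, 5, 8, 8, 9, 12, 12, 15, 20, 21]
    new_cases = weekly_new_cases(cumulative)
    print(new_cases)
    print(per_capita(new_cases, population=50_000))
```

In practice the population argument would come from the 2019 Census Bureau county estimate mentioned above, matched to the case series by county identifier.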
The Truven MarketScan database is a US national database collating data contributed by over 150 insurance carriers and large, self-insuring companies. It contains over 4.6 billion inpatient and outpatient service claims, with over six billion diagnostic codes. We processed the Truven database to obtain the reported weekly number of influenza cases over a period of 471 weeks spanning from January 2003 to December 2011, at the spatial resolution of US counties. Standard ICD9 diagnostic codes corresponding to influenza infection are used to determine the county-specific incidence time series: 1) 487 Influenza, 2) 487.0 Influenza with pneumonia, 3) 487.1 Influenza with other respiratory manifestations, and 4) 487.8 Influenza with other manifestations.
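The code-based selection described above can be illustrated with a short Python sketch that filters claim records by the four listed ICD9 codes and tallies them per county and week. The record layout (county identifier, service date, diagnostic code), the function name, and the use of ISO calendar weeks are assumptions made for illustration; the actual MarketScan schema and the authors' week definition are not specified here.

```python
from collections import Counter
from datetime import date

# The four ICD9 influenza codes listed in the text.
FLU_CODES = {"487", "487.0", "487.1", "487.8"}


def weekly_flu_incidence(claims):
    """Count influenza-coded claims per (county, ISO year, ISO week).

    `claims` is an iterable of (county_id, service_date, icd9_code)
    tuples; this layout is hypothetical.
    """
    counts = Counter()
    for county, service_date, code in claims:
        if code.strip() in FLU_CODES:
            iso_year, iso_week, _ = service_date.isocalendar()
            counts[(county, iso_year, iso_week)] += 1
    return counts


if __name__ == "__main__":
    # Made-up records; "17031" is used only as an example county identifier.
    claims = [
        ("17031", date(2010, 1, 4), "487.1"),
        ("17031", date(2010, 1, 6), "487"),
        ("17031", date(2010, 1, 12), "487.0"),
        ("17031", date(2010, 1, 12), "486"),  # not an influenza code, ignored
    ]
    for key, n in sorted(weekly_flu_incidence(claims).items()):
        print(key, n)
```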

Files

journal.pcbi.1009363.pdf

Files (6.7 MB)

Name Size
Article
md5:d919764694420283fe3ead0bfe03c64b
3.7 MB
md5:be8e570787926425122ee22a10b8fdee
2.8 MB
Supporting information
md5:21caabee39a99eeb7dfa0276a06f8089
115.2 kB

Additional details

Identifiers

DOI
10.1371/journal.pcbi.1009363
Other
oai:uchicago.tind.io:5910

Funding

United States Defense Advanced Research Projects Agency
HR00111890043/P00004

UChicago Information

Division(s)
Biological Sciences Division
Department(s)
Medicine
Center(s) or Institute(s)
Center for Health Statistics