Winner's Curse Correction and Variable Thresholding Improve Performance of Polygenic Risk Modeling Based on Genome-Wide Association Study Summary-Level Data
Recent heritability analyses have indicated that genome-wide association studies (GWAS) have the potential to improve genetic risk prediction for complex diseases based on the polygenic risk score (PRS), a simple modelling technique that can be implemented using summary-level data from the discovery samples. We herein propose modifications to improve the performance of PRS. We introduce threshold-dependent winner’s-curse adjustments for the marginal association coefficients that are used to weight the single-nucleotide polymorphisms (SNPs) in PRS. Further, as a way to incorporate external functional/annotation knowledge that could identify subsets of SNPs highly enriched for associations, we propose variable thresholds for SNP selection. We applied our methods to GWAS summary-level data of 14 complex diseases. Across all diseases, a simple winner’s curse correction uniformly enhanced the performance of the models, whereas incorporation of functional SNPs was beneficial only for selected diseases. Compared to the standard PRS algorithm, the proposed methods in combination led to a notable gain in efficiency (25–50% increase in the prediction R²) for 5 of 14 diseases. As an example, for GWAS of type 2 diabetes, winner’s curse correction improved prediction R² from 2.29% based on the standard PRS to 3.10% (P = 0.0017), and incorporating functional annotation data further improved R² to 3.53% (P = 2×10⁻⁵). Our simulation studies illustrate why differential treatment of certain categories of functional SNPs, even when shown to be highly enriched for GWAS heritability, does not lead to proportionate improvement in genetic risk prediction because of non-uniform linkage disequilibrium structure.
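To make the two ideas in the abstract concrete, the following minimal sketch builds PRS weights from summary-level data. It is an illustration under stated assumptions, not the authors' implementation: the `Snp` record, the `functional` flag, and all function names are hypothetical; the shrinkage shown is a generic lasso-type soft threshold tied to the selection cut-off, which may differ from the estimators used in the paper; and the SNPs are assumed to have already been pruned for linkage disequilibrium.

```python
from dataclasses import dataclass
from math import copysign, erfc, sqrt
from statistics import NormalDist


@dataclass
class Snp:
    beta: float       # marginal effect estimate (e.g. log odds ratio) from the discovery GWAS
    se: float         # standard error of beta
    functional: bool  # hypothetical annotation flag (assumption, not from the source)


def p_value(snp):
    """Two-sided p-value of the marginal z statistic beta / se."""
    return erfc(abs(snp.beta / snp.se) / sqrt(2.0))


def shrunk_weight(snp, p_cut):
    """Soft-threshold |beta| by the effect size that just reaches p_cut.

    Illustrative lasso-type shrinkage: estimates that barely pass the
    selection threshold are pulled strongly toward zero.
    """
    z_cut = NormalDist().inv_cdf(1.0 - p_cut / 2.0)
    return copysign(max(abs(snp.beta) - z_cut * snp.se, 0.0), snp.beta)


def prs_weights(snps, p_cut_functional, p_cut_other, correct=True):
    """Select SNPs with a category-specific threshold and return their weights."""
    weights = []
    for snp in snps:
        p_cut = p_cut_functional if snp.functional else p_cut_other
        if p_value(snp) >= p_cut:
            weights.append(0.0)                      # SNP not selected
        elif correct:
            weights.append(shrunk_weight(snp, p_cut))
        else:
            weights.append(snp.beta)                 # standard PRS weight
    return weights


def prs(genotypes, weights):
    """Score for one individual: weighted sum of risk-allele counts (0, 1 or 2)."""
    return sum(g * w for g, w in zip(genotypes, weights))


# Made-up example values, for illustration only.
snps = [Snp(0.10, 0.02, False), Snp(0.05, 0.02, True), Snp(0.01, 0.02, False)]
standard = prs_weights(snps, 1e-4, 1e-4, correct=False)  # one threshold, raw betas
proposed = prs_weights(snps, 0.05, 1e-4, correct=True)   # variable thresholds + shrinkage
```

In this toy input the second SNP is excluded under the single threshold but enters the score under the more liberal threshold for the functional category, and every selected weight is smaller in magnitude after shrinkage. In practice the thresholds would be tuned, for example by maximizing prediction R² in a validation sample.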
Details
Title
Winner's Curse Correction and Variable Thresholding Improve Performance of Polygenic Risk Modeling Based on Genome-Wide Association Study Summary-Level Data
Author
Shi, Jianxin : National Cancer Institute
Park, Ju-Hyun : Dongguk University : (http://orcid.org/0000-0001-9675-6475)
Duan, Jubao : University of Chicago
Berndt, Sonja T. : National Cancer Institute
Moy, Winton : Northern Illinois University
Yu, Kai : National Cancer Institute
Song, Lei : National Cancer Institute
Wheeler, William : Information Management Services, Inc.
Hua, Xing : National Cancer Institute
Silverman, Debra : National Cancer Institute
Garcia-Closas, Montserrat : National Cancer Institute
Hsiung, Chao Agnes : National Health Research Institutes
Figueroa, Jonine D. : National Cancer Institute
Cortessis, Victoria K. : University of Southern California
Malats, Núria : Spanish National Cancer Research Centre
Karagas, Margaret R. : Dartmouth College
Vineis, Paolo : Imperial College London
Chang, I-Shou : National Institute of Cancer Research
Sanders, Alan R. : University of Chicago
Gejman, Pablo : University of Chicago
The GWAS genotype data are not publicly available, in order to protect patient privacy. Summary-level data or genotype data can be requested from dbGaP or from the specific GWAS consortium. Access to WTCCC data is available by application to the Wellcome Trust Case Control Consortium Data Access Committee at https://www.sanger.ac.uk/legal/DAA/MasterController. Access to the GWAS of pancreatic cancer can be requested through the PanC4 consortium (Email: eduell@iconcologia.net; Website: www.panc4.org). Access to the colorectal cancer GWAS data can be requested through the GECCO Consortium (Genetics and Epidemiology of Colorectal Cancer Consortium) (Dr. Ulrike Peters, Member, Fred Hutchinson Cancer Research Center. Email: upeters@fhcrc.org). Summary-level data for European lung cancer can be requested from the TRICL consortium (Transdisciplinary Research in Cancer of the Lung) (Dr. Christopher I Amos, Norris Cotton Cancer Center, Dartmouth College. Email: Christopher.I.Amos@dartmouth.edu). Summary-level data for prostate cancer GWAS can be requested from the PRACTICAL consortium (Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome. Website: http://practical.ccge.medschl.cam.ac.uk/) and the GAME-ON/ELLIPSE consortium (Elucidating Loci Involved in Prostate Cancer Susceptibility. Website: http://epi.grants.cancer.gov/gameon/index.html). Access to the following GWAS individual-level data can be requested through the dbGaP website (https://www.ncbi.nlm.nih.gov/gap): Female Lung Cancer Consortium in Asia (FLCCA), phs000716.v1.p1; bladder cancer, phs000346.v1.p1; Molecular Genetics of Schizophrenia, phs000167.v1.p1; Genetic Epidemiology Research on Adult Health and Aging (GERA), phs000674.v1.p1; Lung cancer GWAS in EAGLE (Environment and Genetics in Lung Cancer Etiology Study), phs000093.v2.p2.
Funding Information
National Institutes of Health, Intramural Research program
National Institutes of Health, U19 CA148127
National Cancer Institute, U01 CA137088
National Cancer Institute, R01 CA059045
Regional Council of Pays de la Loire
Groupement des Entreprises Françaises dans la Lutte contre le Cancer
Association Anne de Bretagne Génétique and the Ligue Régionale Contre le Cancer
National Institutes of Health, R01 CA60987
German Research Council, BR 1704/6-1
German Research Council, BR 1704/6-3
German Research Council, BR 1704/6-4
German Research Council, CH 117/1-1
German Federal Ministry of Education and Research, 01KH0404
German Federal Ministry of Education and Research, 01ER0814
National Institutes of Health, R01 CA48998
National Institutes of Health, P01 CA 055075
National Institutes of Health, UM1 CA167552
National Institutes of Health, R01 137178
National Institutes of Health, R01 CA151993
National Institutes of Health, P50 CA127003
National Institutes of Health, UM1 CA186107
National Institutes of Health, R01 CA137178
National Institutes of Health, P01 CA87969
National Institutes of Health, R01 CA151993
National Institutes of Health, P50 CA127003
National Institutes of Health, R01 CA042182
National Institutes of Health, R37 CA54281
National Institutes of Health, P01 CA033619
National Institutes of Health, R01 CA63464
National Institutes of Health, U01 CA074783
Ontario Research Fund
Canadian Institutes of Health Research
Ontario Institute for Cancer Research
Ontario Ministry of Research and Innovation
National Institutes of Health, R01 CA076366
National Institutes of Health, K05 CA154337
National Heart, Lung, and Blood Institute, HHSN268201100046C
National Heart, Lung, and Blood Institute, HHSN268201100001C
National Heart, Lung, and Blood Institute, HHSN268201100002C
National Heart, Lung, and Blood Institute, HHSN268201100003C
National Heart, Lung, and Blood Institute, HHSN268201100004C
National Heart, Lung, and Blood Institute, HHSN271201100004C
Publication Date
2016-12-30
Language
English
Copyright Statement
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.