Published March 3, 2015 | Version v1
Journal article Open

PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population

Description

Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy for Identity-By-Descent (IBD) segments based on clique graphs. Using 98 whole genome sequences, we were able to impute the genomes of 1,317 South Dakota Hutterites who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs). Using a combination of pedigree-based and linkage disequilibrium (LD)-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques, we were also able to infer the parental origin of 83% of alleles, as well as the genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contributions of rare and common variants to human phenotypes, as well as parent-of-origin effects of disease risk alleles, in >1,000 individuals at minimal cost.

Data availability

Because of the unique composition of the Hutterite pedigree, genome-wide SNP data would allow reconstruction of the pedigree and thereby reveal the identity of many families (and therefore individuals). Releasing such data would breach confidentiality and violate the protocol approved by the research ethics board. The algorithm/method described in this paper produces genome-wide data of this sort. There is a plan to release the variants from the sequence data in the future.

Files

journal.pcbi.1004139.pdf

Files (2.6 MB)

Article
md5:016e8677e169c66be167bdd5bb1adad6 (675.7 kB)
md5:6dba15557f22fc0efb93bd6f1176ea6e (1.9 MB)

Additional details

Identifiers

DOI
10.1371/journal.pcbi.1004139
Other
oai:uchicago.tind.io:10288

Funding

National Institutes of Health
R01 HL085197
National Institutes of Health
R01 HD21244
National Institutes of Health
R01 HG002899
University of Chicago and Argonne National Laboratory
Computation Institute and the Biological Sciences Division

UChicago Information

Division(s)
Biological Sciences Division
Department(s)
Human Genetics, Medicine, Statistics