Published September 16, 2010 | Version v1
Journal article Open

Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis

  • 1. University of Chicago

Description

We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization. Specifically, they can both be interpreted as approximating an observed genotype matrix by a product of two lower-rank matrices, but with different constraints or prior distributions on these lower-rank matrices. This opens the door to a large range of possible approaches to analyzing population structure, by considering other constraints or priors. In this paper, we introduce one such novel approach, based on sparse factor analysis (SFA). We investigate the effects of the different types of constraint in several real and simulated data sets. We find that SFA produces similar results to admixture-based models when the samples are descended from a few well-differentiated ancestral populations and can recapitulate the results of PCA when the population structure is more "continuous," as in isolation-by-distance models.
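The matrix-factorization view in the description can be illustrated with a minimal sketch, which approximates a centered genotype matrix by the product of a vector of per-individual loadings and a single per-SNP factor. It is fitted by unconstrained alternating least squares, which amounts to power iteration for the leading principal component. This is not the paper's SFA method or an admixture model. The function names, sample sizes, and allele frequencies are assumptions made for the example, and the data are simulated.

```python
import random

# All sizes, frequencies, and names below are hypothetical, chosen for illustration.
N_PER_POP = 10
N_SNPS = 100


def simulate_genotypes(n_per_pop, n_snps, seed=0):
    """Simulate 0/1/2 genotypes for two well-differentiated toy populations."""
    rng = random.Random(seed)
    # Assumed allele frequencies: populations A and B differ strongly at every SNP.
    freqs_a = [0.9 if j % 2 == 0 else 0.1 for j in range(n_snps)]
    freqs_b = [1.0 - p for p in freqs_a]
    rows = []
    for freqs in (freqs_a, freqs_b):
        for _ in range(n_per_pop):
            # A genotype counts copies of one allele across two chromosomes.
            rows.append([(rng.random() < p) + (rng.random() < p) for p in freqs])
    return rows


def center_columns(matrix):
    """Subtract each SNP's mean genotype, as is usual before PCA."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def rank_one_factorization(matrix, n_iter=50, seed=1):
    """Fit matrix ~ outer(loadings, factor) by alternating least squares.

    With a unit-norm factor this is power iteration, so it converges to the
    leading principal component of the centered matrix.
    """
    rng = random.Random(seed)
    n, p = len(matrix), len(matrix[0])
    loadings = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random starting point
    factor = [0.0] * p
    for _ in range(n_iter):
        # Least-squares update of the factor given the loadings (up to scale),
        # followed by rescaling to unit length.
        factor = [sum(matrix[i][j] * loadings[i] for i in range(n)) for j in range(p)]
        norm = sum(f * f for f in factor) ** 0.5
        factor = [f / norm for f in factor]
        # Least-squares update of the loadings given a unit-norm factor.
        loadings = [sum(row[j] * factor[j] for j in range(p)) for row in matrix]
    return loadings, factor


def squared_error(matrix, loadings, factor):
    """Sum of squared differences between the matrix and its rank-one fit."""
    return sum(
        (x - l * f) ** 2
        for row, l in zip(matrix, loadings)
        for x, f in zip(row, factor)
    )


genotypes = center_columns(simulate_genotypes(N_PER_POP, N_SNPS))
loadings, factor = rank_one_factorization(genotypes)
total = sum(x * x for row in genotypes for x in row)
residual = squared_error(genotypes, loadings, factor)

print("share of sum of squares explained:", round(1 - residual / total, 3))
print("loadings, population A:", [round(l, 2) for l in loadings[:N_PER_POP]])
print("loadings, population B:", [round(l, 2) for l in loadings[N_PER_POP:]])
```

In this toy setting the sign of each loading separates the two simulated populations. Placing other constraints or priors on the loadings and factors would give other members of the family that the description refers to.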

Files

journal.pgen.1001117.pdf

Files (1.3 MB)

Name Size
Article
md5:03b9055d5cbd75f54095128a432f76dc
879.4 kB
md5:a71274e5711589dd66e6da5ee7c77f65
396.6 kB

Additional details

Identifiers

DOI
10.1371/journal.pgen.1001117
Other
oai:uchicago.tind.io:10687

Funding

Kathryn and George Gould
Bioinformatics Research Development Fund
National Institutes of Health
HG002585

UChicago Information

Division(s)
Physical Sciences Division
Department(s)
Computer Science, Human Genetics, Statistics