

The microbiome is the full collection of microorganisms in a community. Recent advances in sequencing technologies have allowed scientists to quantify microbiome composition at unprecedented resolution. However, high-throughput sequencing data have distinctive characteristics, such as high dimensionality, sparseness, and a compositional nature. Moreover, the phylogenetic tree that quantifies the evolutionary similarity among taxa poses unique modelling challenges. This thesis presents novel statistical models that analyze microbiome data by leveraging these characteristics. The first problem we consider is testing the equality of microbial compositions among groups of populations. We apply the Dirichlet-tree multinomial (DTM) distribution, a generalization of the traditional Dirichlet-multinomial (DM) distribution, and design a scan statistic that takes advantage of the phylogenetic relations among taxa. We provide an upper bound on the p-value for this scan statistic and show that the method has improved power on an empirical dataset and in simulations. The second problem continues the investigation of DTM versus DM by introducing a penalization method that selects the best model along the DM-DTM spectrum. The last problem we address is estimating heritability when the input is a matrix of pairwise dissimilarities calculated from beta diversities. Beta diversity is an ecologically meaningful measure of pairwise compositional differences that takes variation in all taxa into account. We extend the traditional ACE variance component model to the matrix case using the Wishart distribution. We also present a new beta diversity measure, named root-Unifrac, that satisfies the positive definiteness requirement of the Wishart distribution. This Wishart ACE model allows us to directly measure community heritability, which quantifies the contribution of additive genetics to the overall variation in beta diversity.
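To illustrate how the DTM distribution generalizes the DM distribution, the following is a minimal sketch of drawing taxon counts from a Dirichlet-tree multinomial on a toy phylogeny. It is not the thesis's implementation: the function names, the three-taxon tree, and the Dirichlet parameters are all hypothetical and chosen only for illustration. At each internal node, branch probabilities are drawn from a Dirichlet distribution and the reads arriving at that node are split multinomially among its children; a tree whose root connects directly to every leaf recovers the ordinary DM.

```python
import random

def sample_dirichlet(alphas, rng):
    # Normalised independent Gamma(alpha, 1) draws form a Dirichlet sample.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample_multinomial(n, probs, rng):
    # Assign n reads to categories one by one according to probs.
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[idx] += 1
    return counts

def sample_dtm(tree, alphas, node, n, rng, out):
    # Recursively split n reads down the tree; leaves accumulate counts in out.
    children = tree.get(node)
    if not children:
        out[node] = out.get(node, 0) + n
        return
    probs = sample_dirichlet(alphas[node], rng)
    split = sample_multinomial(n, probs, rng)
    for child, child_n in zip(children, split):
        sample_dtm(tree, alphas, child, child_n, rng, out)

rng = random.Random(0)

# Hypothetical phylogeny: taxa B and C share a clade, taxon A is an outgroup.
tree = {"root": ["A", "clade"], "clade": ["B", "C"]}
alphas = {"root": [2.0, 3.0], "clade": [1.0, 1.0]}  # assumed parameters
counts = {}
sample_dtm(tree, alphas, "root", 1000, rng, counts)

# DM as a special case: a star tree with every taxon attached to the root.
star_tree = {"root": ["A", "B", "C"]}
star_alphas = {"root": [2.0, 1.5, 1.5]}  # assumed parameters
dm_counts = {}
sample_dtm(star_tree, star_alphas, "root", 1000, rng, dm_counts)
```

Here `counts` and `dm_counts` each map the taxa A, B, and C to read counts that sum to the sequencing depth of 1000. The nested tree gives the clade containing B and C its own Dirichlet parameters, which is the extra flexibility the DTM offers over the DM.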

