Nearly all high-throughput 'omic' data are influenced by technical and biological factors unknown to the researcher, which, if unaccounted for, can severely obfuscate estimation of and inference on the effects of interest. While the importance of this problem has precipitated the development of many methods that attempt to correct for these latent factors, most are designed for gene expression data and are not amenable to modern, complex experimental designs. In this thesis, we develop novel and provably accurate methodology to estimate and perform inference on the coefficients of interest in a multivariate linear model in the presence of latent covariates. Chapter 2 discusses this problem in the context of DNA methylation data, in which latent cell type typically confounds the covariate of interest. In Chapters 3 and 4, we then provide the first methods amenable to experimental designs with complex sample correlation structures. Lastly, in Chapter 5, motivated by untargeted LC-MS metabolomic data, we present the first method to account for both unobserved covariates and non-random missing data.



