Abstract

High-throughput sequencing (HTS) techniques such as RNA-seq, ChIP-seq and ATAC-seq have enabled researchers to investigate complex biological processes in unprecedented detail. One common feature of HTS data is that they often consist of counts. For example, in RNA-seq, the counts typically represent the number of times an RNA molecule has been sequenced and are a proxy for the expression level. Recently, the advent of single-cell sequencing techniques such as scRNA-seq and scATAC-seq has unveiled the transcriptome at cell-level resolution. However, single-cell count data are sparse and carry high levels of technical noise. With the emergence of large, sparse and noisy sequencing data, there is a need for rigorous statistical methods that can accurately model these counts. At the same time, because sequencing data exhibit complex structure, the statistical methods developed for them should be flexible enough to incorporate different assumptions and structural information. For instance, matrix factorization has been extensively employed to uncover the latent structure of gene expression across a variety of cell types. Incorporating sparsity assumptions into these latent structures has been shown to yield a more parsimonious representation and to enhance the interpretability of results. Consequently, it would be beneficial to integrate sparsity assumptions when modeling the structure of sequencing data. In this thesis, we focus on developing flexible empirical Bayes (EB) methods for statistical modeling and inference in genomics. We first explore EB Poisson mean models, both as a fundamental component for developing more sophisticated models and as a simple problem for evaluating different approaches. We then study EB smoothing methods that can account for extra variation, or over-dispersion, in sequencing data, and apply these methods to visualize gene expression patterns along the genome. We further introduce a general variational inference method for non-Gaussian data and develop an EB Poisson matrix factorization method, with applications to single-cell RNA sequencing data. Finally, we extend Poisson non-negative matrix factorization methods to accommodate spatially structured or sparse factors and loadings.
