Variation in DNA sequence alters one or more molecular intermediates in a functional pathway, ultimately changing an organismal-level trait. This creates a causal chain of events, as governed by the Central Dogma of molecular biology, in which deleterious DNA variants dysregulate gene expression and/or protein levels, leading to a disease state at the organismal level. Determining which DNA variants are causal for a disease phenotype, and how they act, is a major challenge in genetics. It is of great interest because it can reveal new knowledge about regulatory biology and point to new genetic therapies for disease. Single-nucleotide variants (SNVs), or simply variants, fall into two classes: single-nucleotide polymorphisms (SNPs), which occur at some frequency in the human population, and somatic point mutations, which arise throughout the lifespan of the organism. The vast majority of disease-associated variants lie in the non-coding part of the genome, leading to complex and variable interactions with genes. Perhaps the best understood of these non-coding variants are regulatory variants, which reside in DNA regulatory elements such as promoters, enhancers and repressors. The activity of regulatory elements has been shown to be specific to cell type and cell state, which motivates the use of single-cell technologies to further dissect disease-related variants and the putative genes they target. In this dissertation, I develop a framework for using single-cell omics data to interpret the germline SNPs and somatic point mutations associated with disease states. In Chapter 1, I explore methods for detecting somatic mutations in individual cancer cells and, using single-cell RNA-sequencing data, nominate genes whose expression is altered in cells carrying somatic mutations. However, obtaining single-cell RNA-sequencing data from bulk tissues such as solid tumors presents its own challenges.
Because of the complexity of the extracellular matrix of adult bulk tissues, such as solid tumors, obtaining single-cell suspensions is not always possible. In Chapter 2, I perform a systematic comparison of single-cell and single-nucleus RNA-sequencing data in a model system of induced pluripotent stem cells differentiating into cardiomyocytes. Finally, in Chapter 3, I develop a framework for using single-nucleus ATAC-seq and single-nucleus RNA-seq to interpret the germline SNPs identified in genome-wide association studies (GWAS) of atrial fibrillation (AF), the most common cardiac arrhythmia. AF risk variants are more than 10-fold enriched in cardiomyocytes (CMs) but not in other cell types. Taking advantage of this enrichment pattern, I use a Bayesian statistical framework to fine-map causal variants of AF, favoring variants in CM open chromatin regions. I also develop a novel computational procedure that aggregates all putative causal variants and combines multiple sources of information linking SNPs to genes. Through this procedure, I nominate genes that are not found by GWAS alone.



