Abstract
Genome-wide association studies (GWAS) have identified thousands of genetic associations with complex human traits and diseases, yet most of these associations map to non-coding regions of the genome, making it difficult to identify causal variants, understand their regulatory effects, and link them to the genes through which they act. This thesis addresses these challenges through computational approaches aimed at improving post-GWAS interpretation at both the variant and gene levels. First, it leverages DNA sequence-based deep learning models to prioritize functional non-coding variants and characterize their regulatory effects in disease-relevant cellular contexts. By evaluating predicted effects in neuropsychiatric cell types, this work demonstrates that predicted regulatory effects capture biologically meaningful signals and can be used to prioritize likely causal variants within GWAS loci. Second, this thesis develops a statistical genetics framework that integrates multi-omics and multi-tissue QTL data with GWAS summary statistics to identify causal genes underlying complex traits. The proposed method jointly models multiple molecular traits while accounting for correlations among them, enabling gene-level fine-mapping. In simulations and real data analyses, this framework substantially reduces false-positive discoveries compared to existing approaches and increases power to identify candidate genes. Together, these approaches advance the functional interpretation of GWAS by bridging the gap between non-coding genetic variation and complex traits. They provide scalable computational tools for prioritizing regulatory variants and risk genes, elucidating the underlying molecular mechanisms, and revealing the genetic architecture of complex human traits.