Advances in next-generation sequencing (NGS) have propelled genomics into a data-intensive science. Although sufficient hardware resources are necessary for large-scale NGS analyses, robust and scalable software is frequently the more formidable barrier. To address this need, I have been the principal contributor to the development of SwiftSeq, a modular and system-agnostic workflow for end-to-end analysis of NGS data. SwiftSeq offers significant benefits to both small- and large-scale analyses. Parallelization, synchronization, and execution site selection are managed automatically. Tasks are robust to transient software failures and localized hardware failures, keeping user intervention to a minimum. Analysis jobs consistently scale to hundreds of nodes and thousands of cores. On a Cray XE6, SwiftSeq can produce annotated germline and somatic genotypes for standard-depth whole exomes and whole genomes in approximately 36 minutes and 11 hours, respectively. SwiftSeq is freely available, and harmonized variant calls for nearly 10,000 exomes from The Cancer Genome Atlas (TCGA) have been made available to the genomics community through the Bionimbus Protected Data Cloud.

The value of this exome dataset lies in the many unique biological insights it enables. I was interested in using germline cancer genetics to better understand epidemiological phenotypes, particularly age at diagnosis. Between 5% and 10% of cancer cases can be attributed to highly penetrant inherited alleles, which often lead to an earlier age at diagnosis. However, the polygenic nature of cancer risk loci and its relationship to age at diagnosis are less well understood. Using 8,111 individuals from TCGA representing over 30 cancer types (more than 99% solid tumors), I have shown that an increased burden of ClinVar and deleterious alleles within ClinVar cancer risk genes is associated with an earlier age at diagnosis. These findings were replicated using a second set of autosomal dominant cancer predisposition genes.
Strikingly, high allele burden in breast cancer was an independent predictor of age at diagnosis, with an effect comparable to that of mutations in BRCA1/2. Overall, greater levels of baseline genetic deficiency likely render individuals more sensitive to somatic events, leading to earlier tumorigenesis. Investigating an individual's harmful alleles in aggregate could assist in clinical cancer risk assessment.

Combining this germline variation with known mutational mechanisms, I was also able to identify putative cancer genes. The two-hit hypothesis asserts that many cancer risk genes require two hits (i.e., biallelic loss) to promote cancerous phenotypes in cells. In the classical model, the first hit is an inherited deleterious allele, whereas the second is generated through loss of heterozygosity (LOH). By jointly analyzing LOH and deleterious germline variants across 5,146 individuals, I found that the classic tumor suppressors BRCA1, BRCA2, and ATM showed the highest pan-cancer enrichment for two-hit scenarios. Two other genes, PHLPP2 and KDELC2, also had a preponderance of two hits. Using siRNA knockdowns in multiple cell lines, Mike Bolt showed that reducing PHLPP2 and KDELC2 expression promotes two cancer-like phenotypes, proliferation and migration. Furthermore, malignancy-specific investigations provided strong computational and experimental evidence that ROBO1 is a novel two-hit gene in breast cancer. Overall, these analyses show that integrating germline and somatic genetics can reveal novel cancer genes.

Lastly, I examined how genetic background can affect the somatic mutational landscape. In breast cancer, women of African ancestry are diagnosed younger, have more clinically aggressive disease stage for stage, and have higher mortality rates than age-matched women of European or Asian ancestry.
Using a combination of exome, genome, and RNA sequencing, Markus Riester and I examined the molecular features of breast cancers across 194 patients from Nigeria and 1,037 patients from the US in TCGA (171 Black, 753 White, 113 other). The mutational landscape and immune signature patterns differed across racial/ethnic populations. Triple-negative (43%) and HER2-positive (25%) subtypes were enriched in Nigerians, whose tumors were characterized by a higher TP53 mutation rate, increased structural variation, and a greater prevalence of the homologous recombination deficiency signature. GATA3 mutations were most frequent in Nigerian hormone receptor-positive tumors (25.9%). Higher proportions of APOBEC-mediated substitutions were strongly associated with PIK3CA and CDH1 mutations, which were more prevalent in Whites. Additionally, I identified PLK2, KDM6A, GPS2, and B2M as novel significantly mutated genes in breast cancer. These data underscore the importance of genomic research in diverse populations for accelerating progress in precision oncology and reducing global disparities in outcomes.