Published November 17, 2006 | Version v1
Journal article Open

An Integrative Genomic Approach to Uncover Molecular Mechanisms of Prokaryotic Traits

  • 1. University of Chicago
  • 2. Yale University

Description

With the mounting availability of genomic and phenotypic databases, data integration and mining become increasingly challenging. While efforts have been made to analyze prokaryotic phenotypes, current computational technologies either lack the high-throughput capacity needed for genome-scale analysis or are limited in their ability to integrate and mine data across different scales of biology. Consequently, simultaneous analysis of associations among genomes, phenotypes, and gene functions has been precluded. Here, we developed a high-throughput computational approach and demonstrated, for the first time, the feasibility of integrating large quantities of prokaryotic phenotypes with genomic datasets for mining across multiple scales of biology (protein domains, pathways, molecular functions, and cellular processes). Applying this method to 59 fully sequenced prokaryotic species, we identified the genetic basis and molecular mechanisms underlying the phenotypes in bacteria. We identified 3,711 significant correlations between 1,499 distinct Pfam entries and 63 phenotypes, comprising 2,650 positive correlations and 1,061 anti-correlations. Manual evaluation of a random sample of these significant correlations showed a minimal precision of 30% (95% confidence interval: 20%–42%; n = 50). We stratified the most significant 478 predictions and subjected 100 to manual evaluation, of which 60 were corroborated in the literature. We furthermore unveiled 10 significant correlations between phenotypes and KEGG pathways, eight of which were corroborated in the evaluation, and 309 significant correlations between phenotypes and 166 GO concepts, which we evaluated using a random sample (minimal precision = 72%; 95% confidence interval: 60%–80%; n = 50). Additionally, we conducted a novel large-scale phenomic visualization analysis to provide insight into the modular nature of common molecular mechanisms that span multiple biological scales and are reused by related phenotypes (metaphenotypes).
We propose that this method elucidates which classes of molecular mechanisms are associated with phenotypes or metaphenotypes and holds promise in facilitating a computable systems biology approach to genomic and biomedical research.

Files

journal.pcbi.0020159.pdf

Files (955.9 kB)

Name Size
Article
md5:c1c783fcb95754241ee9dcc9c3bf3dfd
834.6 kB
md5:ba4f23f4eec0e411fbe126d8f283be7b
121.3 kB

Additional details

Identifiers

DOI
10.1371/journal.pcbi.0020159
Other
oai:uchicago.tind.io:10214

Funding

National Library of Medicine
Semantic Approaches to Phenotypic Database Analysis
National Cancer Institute
National Center for the Multiscale Analysis of Genomic and Cellular Networks (MAGNet)
National Institute of Allergy and Infectious Diseases
5U54 AI057158-02
National Institutes of Health
Ruth L. Kirschstein Postdoctoral Fellowship

UChicago Information

Division(s)
Biological Sciences Division
Department(s)
Human Genetics, Medicine
Center(s) or Institute(s)
Center for Biomedical Informatics