Supervised extraction of near-complete genomes from metagenomic samples: A new service in PATRIC
- 1. University of Chicago
- 2. Fellowship for Interpretation of Genomes
Description
Large amounts of metagenomically-derived data are submitted to PATRIC for analysis. In the future, we expect even more jobs submitted to PATRIC will use metagenomic data. One in-demand use case is the extraction of near-complete draft genomes from assembled contigs of metagenomic origin. The PATRIC metagenome binning service utilizes the PATRIC database to furnish a large, diverse set of reference genomes. We provide a new service for supervised extraction and annotation of high-quality, near-complete genomes from metagenomically-derived contigs. Reference genomes are assigned to putative draft genome bins based on the presence of single-copy universal marker roles in the sample, and contigs are sorted into these bins by their similarity to reference genomes in PATRIC. Each set of binned contigs represents a draft genome that will be annotated by RASTtk in PATRIC. A structured-language binning report is provided containing quality measurements and taxonomic information about the contig bins. The PATRIC metagenome binning service emphasizes extraction of high-quality genomes for downstream analysis using other PATRIC tools and services. Due to its supervised nature, the binning service is not appropriate for mining novel or extremely low-coverage genomes from metagenomic samples.
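The supervised sorting step described above, where contigs are placed into bins by their similarity to reference genomes, can be illustrated with a small toy sketch. This is not the service's implementation. The abstract does not specify the similarity measure, so shared nucleotide k-mers are used here as an assumed stand-in. The function names (`kmers`, `bin_contigs`), the parameters (`k`, `min_shared`), and the example sequences are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k=8):
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_contigs(references, contigs, k=8, min_shared=3):
    """Assign each contig to the reference sharing the most k-mers with it.

    Assumption: shared k-mer count stands in for the similarity measure.
    Contigs sharing fewer than min_shared k-mers with every reference
    are placed in an "unbinned" group.
    """
    ref_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    bins = defaultdict(list)
    for contig_id, seq in contigs.items():
        contig_kmers = kmers(seq, k)
        best_name, best_shared = None, 0
        for name, ref_set in ref_kmers.items():
            shared = len(contig_kmers & ref_set)
            if shared > best_shared:
                best_name, best_shared = name, shared
        if best_name is not None and best_shared >= min_shared:
            bins[best_name].append(contig_id)
        else:
            bins["unbinned"].append(contig_id)
    return dict(bins)


# Hypothetical toy data: two reference genomes and three contigs.
references = {
    "ref_A": "ATGGCGTACGTTAGCCTAGGCTAACGT",
    "ref_B": "TTGACCGGTATCCGATTTGCAAGGCTC",
}
contigs = {
    "contig_1": "GCGTACGTTAGCCTAGG",  # drawn from ref_A
    "contig_2": "CGGTATCCGATTTGC",    # drawn from ref_B
    "contig_3": "AAAAAAAAAAAAAAAA",   # matches neither reference
}
bins = bin_contigs(references, contigs)
```

In this sketch a contig with no sufficiently similar reference stays unbinned, which mirrors the limitation stated above: a supervised approach cannot recover genomes that lack a close reference.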
Data availability
The PATRIC binning service is available via web frontend for PATRIC users at https://patricbrc.org/app/MetagenomicBinning. The source code for the binning service can be found at https://github.com/SEEDtk/p3_code/tree/master/scripts, and the PATRIC genome data which is used for computing reference genomes is publicly available at https://patricbrc.org/ or ftp://ftp.patricbrc.org/. All data for benchmark studies is identified by SRA accession number. We used the following experiments: ERR1136887, ERR1398081, ERR260232, ERR321564, ERR525795, ERR526044, ERR527062, ERR528311, ERR911992, ERR912091, ERR912124, SRR060006, SRR1950750, SRR1950766, SRR341647, SRR341697, SRR413750, SRR4305113, SRR4408221, SRR5091568, SRR5127609, SRR5279233.
Files
journal.pone.0250092.pdf
Additional details
Identifiers
- DOI
- 10.1371/journal.pone.0250092
- Other
- oai:uchicago.tind.io:5953
Related works
- Cites
- https://doi.org/10.1101/2019.12.13.874651 (URL)
Funding
- National Institute of Allergy and Infectious Diseases
- HHSN272201400027C