Published April 14, 2021 | Version v1
Journal article Open

Supervised extraction of near-complete genomes from metagenomic samples: A new service in PATRIC

  • 1. University of Chicago
  • 2. Fellowship for Interpretation of Genomes

Description

Large amounts of metagenomically-derived data are submitted to PATRIC for analysis. In the future, we expect even more jobs submitted to PATRIC will use metagenomic data. One in-demand use case is the extraction of near-complete draft genomes from assembled contigs of metagenomic origin. The PATRIC metagenome binning service utilizes the PATRIC database to furnish a large, diverse set of reference genomes. We provide a new service for supervised extraction and annotation of high-quality, near-complete genomes from metagenomically-derived contigs. Reference genomes are assigned to putative draft genome bins based on the presence of single-copy universal marker roles in the sample, and contigs are sorted into these bins by their similarity to reference genomes in PATRIC. Each set of binned contigs represents a draft genome that will be annotated by RASTtk in PATRIC. A structured-language binning report is provided containing quality measurements and taxonomic information about the contig bins. The PATRIC metagenome binning service emphasizes extraction of high-quality genomes for downstream analysis using other PATRIC tools and services. Due to its supervised nature, the binning service is not appropriate for mining novel or extremely low-coverage genomes from metagenomic samples.
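
The contig-sorting step described above can be illustrated with a minimal sketch. It is not the PATRIC implementation: it assumes nucleotide k-mer overlap as a stand-in for the similarity measure, and the function names, the value of k, and the `min_shared` cutoff are all hypothetical. It also skips the earlier step in which reference genomes are chosen from single-copy universal marker roles, and simply takes the references as given.

```python
from collections import defaultdict


def kmers(seq, k=8):
    """Return the set of all length-k substrings of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_contigs(contigs, references, k=8, min_shared=2):
    """Assign each contig to the reference that shares the most k-mers with it.

    contigs, references: dicts mapping an ID to a DNA string.
    Shared k-mer count is an assumed similarity measure, not the service's own.
    Contigs sharing fewer than min_shared k-mers with every reference
    are collected under the key "unbinned".
    """
    ref_kmers = {rid: kmers(seq, k) for rid, seq in references.items()}
    bins = defaultdict(list)
    for cid, seq in contigs.items():
        contig_kmers = kmers(seq, k)
        best_id, best_score = None, 0
        for rid, rk in ref_kmers.items():
            score = len(contig_kmers & rk)
            if score > best_score:
                best_id, best_score = rid, score
        if best_id is not None and best_score >= min_shared:
            bins[best_id].append(cid)
        else:
            bins["unbinned"].append(cid)
    return dict(bins)


# Toy data, invented for illustration only.
references = {
    "refA": "ACGTACGTTTGACCAGT",
    "refB": "GGGCCCAAATTTGGGCCC",
}
contigs = {
    "c1": "ACGTACGTTTGA",   # drawn from refA
    "c2": "CCCAAATTTGGG",   # drawn from refB
    "c3": "TATATATATATA",   # matches neither reference
}
result = bin_contigs(contigs, references)
```

Because a contig is binned only when it resembles a known reference, a contig from a genome with no close relative in the reference set ends up unbinned. This is the limitation the description notes for novel genomes.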

Data availability

The PATRIC binning service is available via web frontend for PATRIC users at https://patricbrc.org/app/MetagenomicBinning. The source code for the binning service can be found at https://github.com/SEEDtk/p3_code/tree/master/scripts, and the PATRIC genome data which is used for computing reference genomes is publicly available at https://patricbrc.org/ or ftp://ftp.patricbrc.org/. All data for benchmark studies is identified by SRA accession number. We used the following experiments: ERR1136887, ERR1398081, ERR260232, ERR321564, ERR525795, ERR526044, ERR527062, ERR528311, ERR911992, ERR912091, ERR912124, SRR060006, SRR1950750, SRR1950766, SRR341647, SRR341697, SRR413750, SRR4305113, SRR4408221, SRR5091568, SRR5127609, SRR5279233.

Files

journal.pone.0250092.pdf

Files (4.0 MB)

Name Size
Article
md5:a723014b347fabe68d7439fac24f87b8 2.4 MB
md5:04833f9cf9a8108bea965ac3d610dee8 298.2 kB
Figures
md5:7ee6e146b75f482c4df2ad14cd8f854c 1.3 MB

Additional details

Identifiers

DOI
10.1371/journal.pone.0250092
Other
oai:uchicago.tind.io:5953

Funding

National Institute of Allergy and Infectious Diseases
HHSN272201400027C

UChicago Information

Division(s)
Physical Sciences Division
Department(s)
Computer Science