Published June 7, 2012 | Version v1
Journal article (Open Access)

A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE

  • 1. University of Chicago
  • 2. Argonne National Laboratory

Description

We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as "noise" or "error") within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or on per-base quality scores (e.g., Phred). Here, DRISEE is applied to (non-amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
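The abstract describes the ADR-based estimate only at a high level. The sketch below illustrates the general idea under simplifying assumptions and is not the published DRISEE implementation: reads sharing an identical prefix are treated as one set of ADRs, a majority-vote consensus is taken at each position after the prefix, and bases that disagree with it are counted as errors. The function names, the prefix length, the minimum bin size, and the substitution-only counting are all hypothetical choices for illustration. Because reads are not aligned, insertions and deletions are not modeled.

```python
from collections import Counter, defaultdict

# Hypothetical parameters for illustration only; not the published DRISEE settings.
PREFIX_LEN = 20    # length of the shared prefix used to group reads into ADR bins
MIN_BIN_SIZE = 3   # minimum number of reads for a bin to count as an ADR set


def bin_by_prefix(reads, prefix_len=PREFIX_LEN, min_bin_size=MIN_BIN_SIZE):
    """Group reads by identical prefix and keep only bins large enough to be ADR sets."""
    bins = defaultdict(list)
    for read in reads:
        if len(read) > prefix_len:  # reads no longer than the prefix carry no signal
            bins[read[:prefix_len]].append(read)
    return [b for b in bins.values() if len(b) >= min_bin_size]


def positional_error(reads, prefix_len=PREFIX_LEN, min_bin_size=MIN_BIN_SIZE):
    """Return (per-position error rates after the prefix, global error rate).

    Substitution-only simplification: within each ADR bin, the most common base
    at each column is taken as the consensus, and every other base is an error.
    """
    errors = Counter()
    totals = Counter()
    for bin_reads in bin_by_prefix(reads, prefix_len, min_bin_size):
        max_len = max(len(r) for r in bin_reads)
        for pos in range(prefix_len, max_len):
            column = [r[pos] for r in bin_reads if len(r) > pos]
            consensus, _ = Counter(column).most_common(1)[0]
            offset = pos - prefix_len  # position relative to the end of the prefix
            totals[offset] += len(column)
            errors[offset] += sum(1 for base in column if base != consensus)
    rates = [errors[i] / totals[i] for i in sorted(totals)]
    total_bases = sum(totals.values())
    global_rate = sum(errors.values()) / total_bases if total_bases else 0.0
    return rates, global_rate


if __name__ == "__main__":
    reads = ["ACGTAAAA", "ACGTAAAA", "ACGTAATA", "TTTTCCCC"]
    rates, global_rate = positional_error(reads, prefix_len=4, min_bin_size=3)
    print(rates, global_rate)
```

In this toy run, only the three reads sharing the prefix `ACGT` form an ADR bin; the single `T` at the third position after the prefix disagrees with the consensus `A`, so that position has a rate of 1/3 and the global rate is 1/12. The per-position list corresponds to the positional estimate that could inform trimming, and the single global value to the whole-sample estimate used to compare samples.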

Files (547.0 kB)

journal.pcbi.1002541.pdf

  • Article: 508.3 kB, md5:5b474b48c02a924eae222b0309dbc4a2
  • 38.7 kB, md5:06207cff54f7a65a2dde49edc16e5657

Additional details

Identifiers

DOI: 10.1371/journal.pcbi.1002541
Other: oai:uchicago.tind.io:10231

Funding

  • U.S. Department of Energy, Office of Biological and Environmental Research: DOE Systems Biology Knowledgebase
  • U.S. Department of Energy, Office of Biological and Environmental Research: DE-AC02-06CH11357

UChicago Information

Division(s): Institutes & Centers
Center(s) or Institute(s): Institute for Genomics and Systems Biology