Published January 13, 2011 | Version v1
Journal article Open

Benchmarking Ontologies: Bigger or Better?

  • 1. University of Chicago

Description

A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.
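As a rough illustration of the ideas in the description, the sketch below computes two simple quantities: the share of a corpus's word occurrences that an ontology's vocabulary covers, and the overlap between two ontologies' term sets. These are stand-ins chosen for illustration, not the metrics defined in the article. The function names `coverage` and `jaccard`, the toy corpus, and the two toy term sets are all assumptions.

```python
from collections import Counter


def coverage(ontology_terms, corpus_tokens):
    """Fraction of corpus token occurrences whose term is in the ontology.

    Illustrative stand-in only; not the metric family from the article.
    """
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for term, n in counts.items() if term in ontology_terms)
    return covered / total


def jaccard(terms_a, terms_b):
    """Size of the intersection of two term sets divided by the size of their union."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


# Hypothetical toy data, invented for this example.
corpus = ["fever", "cough", "fever", "rash", "nausea"]
onto_a = {"fever", "cough", "fracture"}
onto_b = {"fever", "rash", "migraine"}

print(coverage(onto_a, corpus))  # share of corpus occurrences covered by onto_a
print(coverage(onto_b, corpus))  # share of corpus occurrences covered by onto_b
print(jaccard(onto_a, onto_b))   # overlap between the two term sets
```

In this toy case each vocabulary covers part of the corpus while the two share only one term, which loosely mirrors the observation that ontologies for the same domain can have a small intersection.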

Files

journal.pcbi.1001055.pdf

Files (7.8 MB)

Name Size
Article
md5:e306e5f8ff1e9d0adf55772e7c1712dc 1.4 MB
md5:eeed455fa528be37314f581fb759b84e 6.4 MB

Additional details

Identifiers

DOI
10.1371/journal.pcbi.1001055
Other
oai:uchicago.tind.io:10226

Funding

National Institutes of Health
R01GM061372
National Institutes of Health
R01LM010132
National Institutes of Health
U54 CA121852-01A1

UChicago Information

Division(s)
Biological Sciences Division, Social Sciences Division
Department(s)
Human Genetics, Medicine, Sociology
Center(s) or Institute(s)
Institute for Genomics and Systems Biology