Quantifying the Impact and Extent of Undocumented Biomedical Synonymy

Blair, David R.; Wang, Kanix; Nestorov, Svetlozar; Evans, James A.; Rzhetsky, Andrey

2014

Abstract

Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships. In this study, we examine the impact and extent of undocumented synonymy within a very large compendium of biomedical thesauri. First, we demonstrate that missing synonymy has a significant negative impact on named entity normalization, an important problem within the field of biomedical text mining. To estimate the amount of synonymy currently missing from thesauri, we develop a probabilistic model for the construction of synonym terminologies that is capable of handling a wide range of potential biases, and we evaluate its performance using the broader domain of near-synonymy among general English words. Our model predicts that over 90% of these relationships are currently undocumented, a result that we support experimentally through "crowd-sourcing." Finally, we apply our model to biomedical terminologies and predict that they are missing the vast majority (>90%) of the synonymous relationships they intend to document. Overall, our results expose the dramatic incompleteness of current biomedical thesauri and suggest the need for "next-generation," high-coverage lexical terminologies.
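For intuition about how overlap between independently compiled resources can reveal relationships that none of them document, here is a minimal sketch using the classical Lincoln-Petersen capture-recapture estimator. This is not the authors' probabilistic model: it assumes (hypothetically) that two thesauri sample synonym pairs independently and that every pair is equally likely to be recorded, which ignores the biases the abstract says the authors' model can handle. The function names and toy thesauri are illustrative only, not data from the study.

```python
from typing import FrozenSet, Set

# A synonym pair is stored unordered, so ("a", "b") equals ("b", "a").
Pair = FrozenSet[str]


def lincoln_petersen(sample_a: Set[Pair], sample_b: Set[Pair]) -> float:
    """Estimate the total number of synonym pairs from two samples.

    Assumption (not from the study): the two thesauri are independent,
    unbiased samples of the same underlying set of synonym pairs.
    """
    overlap = len(sample_a & sample_b)
    if overlap == 0:
        raise ValueError("samples share no pairs; the estimate is undefined")
    return len(sample_a) * len(sample_b) / overlap


def undocumented_fraction(sample_a: Set[Pair], sample_b: Set[Pair]) -> float:
    """Fraction of the estimated total that appears in neither thesaurus."""
    estimated_total = lincoln_petersen(sample_a, sample_b)
    observed = len(sample_a | sample_b)
    return 1.0 - observed / estimated_total


def pairs(*raw):
    # Convert ordered tuples into unordered frozenset pairs.
    return {frozenset(p) for p in raw}


# Hypothetical toy thesauri: four pairs each, two pairs in common.
thesaurus_a = pairs(("heart attack", "myocardial infarction"), ("tumor", "neoplasm"),
                    ("kidney", "renal"), ("liver", "hepatic"))
thesaurus_b = pairs(("heart attack", "myocardial infarction"), ("kidney", "renal"),
                    ("fever", "pyrexia"), ("itching", "pruritus"))

print(lincoln_petersen(thesaurus_a, thesaurus_b))       # 8.0
print(undocumented_fraction(thesaurus_a, thesaurus_b))  # 0.25
```

With two toy thesauri of four pairs each sharing two, the estimate is eight pairs in total, so six are observed and a quarter are documented by neither. Real thesauri are unlikely to be independent, unbiased samples, which is presumably why the authors built a model that accounts for such biases rather than relying on a simple estimator like this one.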

Details

Author

Blair, David R. : University of Chicago
Wang, Kanix : University of Chicago
Nestorov, Svetlozar : University of Chicago
Evans, James A. : University of Chicago
Rzhetsky, Andrey : University of Chicago

Content Type

Article

Published in

PLOS Computational Biology

Identifier(s)

DOI: https://doi.org/10.1371/journal.pcbi.1003799

Funding Information

National Institutes of Health, 1P50MH094267
National Institutes of Health, U01HL108634-01
National Institutes of Health, GM007281

Publication Date

2014-09-25

Language

English

Copyright Statement

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Licensing

CC BY

Record Appears in

Biological Sciences Division > Genetics, Genomics, and Systems Biology
Centers and Institutes > Institute for Genomics and Systems Biology
Social Sciences Division > Sociology

Record Created

2024-01-03