How to Get the Most out of Your Curation Effort
- 1. University of Chicago
- 2. Queen's University
- 3. National Institutes of Health
Description
Large-scale annotation efforts typically involve several experts who may disagree with each other. We propose an approach for modeling disagreements among experts that assigns each annotation a confidence value (i.e., the posterior probability that it is correct). The approach computes a certainty level for each individual annotation, given annotator-specific parameters estimated from data. We developed two probabilistic models for this analysis, compared them using computer simulation, and tested each model's actual performance on a large data set generated by human annotators specifically for this study. We show that even in the worst-case scenario, when all annotators disagree, our approach significantly increases the probability of choosing the correct annotation. Along with this publication, we make publicly available a corpus of 10,000 sentences annotated according to several cardinal dimensions that we introduced in earlier work. All 10,000 sentences were 3-fold annotated by a group of eight experts, and a 1,000-sentence subset was further 5-fold annotated by five new experts. Although the presented data represent a specialized curation task, our modeling approach is general; most data annotation studies could benefit from our methodology.
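As an illustration of what a per-annotation posterior looks like, the following is a minimal sketch and not either of the two models developed in the article. It assumes that each annotator has a single known accuracy, that errors are spread evenly over the remaining labels, and that annotators err independently. The function name `label_posterior`, the annotator names, and the accuracy values are all hypothetical.

```python
def label_posterior(votes, accuracy, labels, prior=None):
    """Posterior probability of each candidate label given annotator votes.

    votes    -- dict mapping annotator name to the label that annotator chose
    accuracy -- dict mapping annotator name to P(annotator picks the true label)
    labels   -- list of all candidate labels (at least two)
    prior    -- optional dict mapping label to prior probability (uniform if None)

    Simplifying assumptions (not taken from the article): annotators are
    independent, and a wrong vote is equally likely to land on any other label.
    """
    k = len(labels)
    if prior is None:
        prior = {c: 1.0 / k for c in labels}

    unnormalized = {}
    for candidate in labels:
        likelihood = 1.0
        for annotator, vote in votes.items():
            p = accuracy[annotator]
            # Correct vote has probability p; each wrong label gets (1 - p) / (k - 1).
            likelihood *= p if vote == candidate else (1.0 - p) / (k - 1)
        unnormalized[candidate] = prior[candidate] * likelihood

    total = sum(unnormalized.values())
    return {c: u / total for c, u in unnormalized.items()}


# Worst case: three annotators, three different answers.
votes = {"ann1": "A", "ann2": "B", "ann3": "C"}
accuracy = {"ann1": 0.9, "ann2": 0.6, "ann3": 0.5}  # hypothetical values
posterior = label_posterior(votes, accuracy, labels=["A", "B", "C"])
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 2))
```

With these made-up accuracies, the label chosen by the most reliable annotator receives a posterior of about 0.78, compared with 1/3 for a uniformly random pick among the three votes. This is the kind of gain the abstract describes for the all-disagree case, under far simpler assumptions than the article's models.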
Files
journal.pcbi.1000391.pdf
Total size: 5.4 MB

| Name | Checksum | Size |
|---|---|---|
| Article | md5:348657695066bf5117f1c19195360d94 | 1.1 MB |
| | md5:2704e37dc6bcff4b218006a96e27e6e2 | 4.3 MB |
Additional details
Identifiers
- DOI
- 10.1371/journal.pcbi.1000391
- Other
- oai:uchicago.tind.io:10217
Funding
- National Institutes of Health
- GM61372
- National Science Foundation
- 0438291
- National Science Foundation
- 0121687
- Cure Autism Now Foundation
- National Institutes of Health
- Intramural Research Program
- National Library of Medicine