Compensating for annotation errors in training a relation extractor

Bonan Min, Ralph Grishman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Well-studied supervised Relation Extraction algorithms require training data that is accurate and has good coverage. To obtain such a gold standard, the common practice is independent double annotation followed by adjudication, which takes significantly more human effort than annotation by a single annotator. We conduct a detailed analysis of a snapshot of the ACE 2005 annotation files to understand the differences between single-pass annotation and the more expensive, nearly three-pass process, and then propose an algorithm that learns from the much cheaper single-pass annotation and achieves performance on par with an extractor trained on multi-pass annotated data. Furthermore, we show that, given the same amount of human labor, the better way to do relation annotation is not to annotate with high-cost quality assurance, but to annotate more.
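The paper's own algorithm is not reproduced here, but the labor tradeoff the abstract describes can be illustrated with a toy simulation: spend a fixed annotation budget either on triple-annotating fewer items (majority vote standing in for double annotation plus adjudication) or on single-annotating three times as many items. The budget and per-pass error rate below are hypothetical numbers chosen for illustration, not figures from the paper.

```python
import random

def annotate(n_items, error_rate, passes, rng):
    """Simulate labeling n_items instances; each pass is correct with
    probability (1 - error_rate). With multiple passes, keep the
    majority vote. Returns the fraction of items labeled correctly."""
    correct = 0
    for _ in range(n_items):
        votes = sum(1 for _ in range(passes) if rng.random() > error_rate)
        if 2 * votes > passes:
            correct += 1
    return correct / n_items

rng = random.Random(0)
budget = 9000  # total annotation passes the budget buys (hypothetical)
err = 0.15     # per-pass annotator error rate (hypothetical)

# Option A: triple-annotate one third as many items, adjudicate by majority.
acc_multi = annotate(budget // 3, err, passes=3, rng=rng)
# Option B: single-annotate three times as many items, accept the noise.
acc_single = annotate(budget, err, passes=1, rng=rng)

print(f"multi-pass : {budget // 3} items, ~{acc_multi:.1%} correct labels")
print(f"single-pass: {budget} items, ~{acc_single:.1%} correct labels")
```

The simulation captures only label accuracy: multi-pass annotation yields cleaner labels, while single-pass yields three times the data at a higher noise rate. The paper's contribution is showing that, for relation extraction, the larger noisy set is the better buy when paired with an algorithm that compensates for the annotation errors.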

Original language: English (US)
Title of host publication: EACL 2012 - 13th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 194-203
Number of pages: 10
ISBN (Electronic): 9781937284190
State: Published - Jan 1 2012
Event: 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2012 - Avignon, France
Duration: Apr 23 2012 - Apr 27 2012

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software

Cite this

Min, B., & Grishman, R. (2012). Compensating for annotation errors in training a relation extractor. In EACL 2012 - 13th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (pp. 194-203). Association for Computational Linguistics (ACL).

@inproceedings{a98aa7497e834c33b86afa4d4f43dd0d,
  title = "Compensating for annotation errors in training a relation extractor",
  author = "Bonan Min and Ralph Grishman",
  year = "2012",
  language = "English (US)",
  pages = "194--203",
  booktitle = "EACL 2012 - 13th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings",
  publisher = "Association for Computational Linguistics (ACL)",
}
