Can document selection help semi-supervised learning? A case study on event extraction

Shasha Liao, Ralph Grishman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular characteristics of this corpus, global inference is applied to provide more confident and informative data selection. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of 1.7% in trigger labeling and 2.3% in role labeling through IR and an additional 1.1% in trigger labeling and 1.3% in role labeling by applying global inference.
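
The abstract only sketches the bootstrapping procedure at a high level. Below is a minimal, illustrative reconstruction of that loop in Python, under several simplifying assumptions: keyword-overlap retrieval stands in for the IR engine, a trigger-word dictionary stands in for the event tagger, and a cross-document consistency boost stands in for global inference. Every name, threshold, and scoring rule here is hypothetical and is not taken from the paper's implementation; the point is the data flow (retrieve a related cluster, tag it, apply cluster-level inference, add only confident instances back to training).

```python
from collections import Counter


def retrieve_related_documents(seed_docs, unlabeled_pool, top_k=5):
    # Toy IR step: rank unlabeled documents by word overlap with the seed
    # (labeled) corpus, standing in for "collect a cluster of related documents".
    seed_vocab = Counter(w for doc in seed_docs for w in doc.lower().split())

    def overlap(doc):
        return sum(seed_vocab[w] for w in set(doc.lower().split()))

    return sorted(unlabeled_pool, key=overlap, reverse=True)[:top_k]


def tag_events(model, doc):
    # Stand-in for the event tagger: the "model" here is just a dict mapping
    # trigger words to (event_type, confidence) pairs.
    return [(w, *model[w]) for w in doc.lower().split() if w in model]


def global_inference(predictions, boost=0.1):
    # Toy global inference: a trigger word that fires consistently across the
    # retrieved cluster gets a confidence boost, mimicking the use of
    # cluster-wide evidence to pick more confident, informative instances.
    counts = Counter(word for word, _, _ in predictions)
    return [
        (w, label, min(1.0, conf + boost) if counts[w] > 1 else conf)
        for w, label, conf in predictions
    ]


def self_train(model, seed_docs, unlabeled_pool, rounds=2, threshold=0.8):
    for _ in range(rounds):
        cluster = retrieve_related_documents(seed_docs, unlabeled_pool)
        predictions = [p for doc in cluster for p in tag_events(model, doc)]
        for word, label, conf in global_inference(predictions):
            if conf >= threshold:  # keep only confident self-labeled instances
                model[word] = (label, conf)
    return model


if __name__ == "__main__":
    seed = ["Troops attacked the village at dawn.", "The company hired new staff."]
    pool = [
        "Rebels attacked a convoy near the border.",
        "Insurgents attacked the checkpoint overnight.",
        "The firm hired dozens of engineers.",
        "Stock prices rose sharply on Friday.",
    ]
    seed_model = {
        "attacked": ("Conflict.Attack", 0.75),
        "hired": ("Personnel.Start-Position", 0.75),
    }
    print(self_train(seed_model, seed, pool))
```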

Original language: English (US)
Title of host publication: ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
Pages: 260-265
Number of pages: 6
Volume: 2
State: Published - 2011
Event: 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011 - Portland, OR, United States
Duration: Jun 19, 2011 - Jun 24, 2011

Other

Other: 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011
Country: United States
City: Portland, OR
Period: 6/19/11 - 6/24/11

Fingerprint

  • Information retrieval
  • Event
  • Learning
  • Labor
  • Resources
  • Labeling
  • Inference
  • Bootstrapping
  • Trigger

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Liao, S., & Grishman, R. (2011). Can document selection help semi-supervised learning? A case study on event extraction. In ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (Vol. 2, pp. 260-265).

@inproceedings{60919d0df5aa47c6af48cd6f4c0cd60a,
title = "Can document selection help semi-supervised learning? A case study on event extraction",
abstract = "Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular characteristics of this corpus, global inference is applied to provide more confident and informative data selection. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of 1.7{\%} in trigger labeling and 2.3{\%} in role labeling through IR and an additional 1.1{\%} in trigger labeling and 1.3{\%} in role labeling by applying global inference.",
author = "Shasha Liao and Ralph Grishman",
year = "2011",
language = "English (US)",
isbn = "9781932432886",
volume = "2",
pages = "260--265",
booktitle = "ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies",

}

TY - GEN

T1 - Can document selection help semi-supervised learning? A case study on event extraction

AU - Liao, Shasha

AU - Grishman, Ralph

PY - 2011

Y1 - 2011

N2 - Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular characteristics of this corpus, global inference is applied to provide more confident and informative data selection. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of 1.7% in trigger labeling and 2.3% in role labeling through IR and an additional 1.1% in trigger labeling and 1.3% in role labeling by applying global inference.

AB - Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular characteristics of this corpus, global inference is applied to provide more confident and informative data selection. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of 1.7% in trigger labeling and 2.3% in role labeling through IR and an additional 1.1% in trigger labeling and 1.3% in role labeling by applying global inference.

UR - http://www.scopus.com/inward/record.url?scp=84859062203&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84859062203&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84859062203

SN - 9781932432886

VL - 2

SP - 260

EP - 265

BT - ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

ER -