Acquiring topic features to improve event extraction: In pre-selected and balanced collections

Shasha Liao, Ralph Grishman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Event extraction is a particularly challenging type of information extraction (IE) that may require inferences from the whole article. However, most current event extraction systems rely on local information at the phrase or sentence level, and do not consider the article as a whole, thus limiting extraction performance. Moreover, most annotated corpora are artificially enriched to include enough positive samples of the events of interest; event identification on a more balanced collection, such as unfiltered newswire, may perform much worse. In this paper, we investigate the use of unsupervised topic models to extract topic features to improve event extraction both on test data similar to training data, and on more balanced collections. We compare this unsupervised approach to a supervised multi-label text classifier, and show that unsupervised topic modeling can get better results for both collections, and especially for a more balanced collection. We show that the unsupervised topic model can improve trigger, argument and role labeling by 3.5%, 6.9% and 6% respectively on a pre-selected corpus, and by 16.8%, 12.5% and 12.7% on a balanced corpus.
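
For illustration only (the record above does not specify the paper's exact models or feature sets), the sketch below shows one way document-level topic distributions from an unsupervised topic model could be concatenated with local phrase- or sentence-level features before training a classifier such as a trigger labeler. The library choice (scikit-learn), topic count, and classifier are assumptions, not the authors' configuration.

    # Hypothetical sketch (not the authors' implementation): fit an unsupervised
    # topic model (LDA) to obtain one topic distribution per article, then append
    # that distribution to each instance's local feature vector so the classifier
    # also sees document-level context. Parameter values are illustrative.
    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    def topic_features(documents, n_topics=30):
        """Fit a topic model on raw article texts; return an (n_docs, n_topics) array of topic mixtures."""
        vectorizer = CountVectorizer(stop_words="english", max_features=5000)
        doc_term = vectorizer.fit_transform(documents)
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
        return lda.fit_transform(doc_term)

    def augment(local_features, doc_ids, doc_topics):
        """Concatenate each instance's dense local features with its article's topic mixture."""
        return np.hstack([local_features, doc_topics[doc_ids]])

    # Usage (all inputs hypothetical): documents is a list of article texts,
    # local_features is an (n_instances, d) array of phrase/sentence-level features,
    # doc_ids gives the integer index of each instance's article, and labels are
    # per-instance trigger labels.
    # doc_topics = topic_features(documents)
    # clf = LogisticRegression(max_iter=1000).fit(augment(local_features, doc_ids, doc_topics), labels)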

Original language: English (US)
Title of host publication: International Conference Recent Advances in Natural Language Processing, RANLP
Pages: 9-16
Number of pages: 8
State: Published - 2011
Event: 8th International Conference on Recent Advances in Natural Language Processing, RANLP 2011 - Hissar, Bulgaria
Duration: Sep 12 2011 - Sep 14 2011

Other

Other: 8th International Conference on Recent Advances in Natural Language Processing, RANLP 2011
Country: Bulgaria
City: Hissar
Period: 9/12/11 - 9/14/11

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Software
  • Electrical and Electronic Engineering

Cite this

Liao, S., & Grishman, R. (2011). Acquiring topic features to improve event extraction: In pre-selected and balanced collections. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 9-16).

@inproceedings{683604de29da4197a0ec1e478f145e19,
title = "Acquiring topic features to improve event extraction: In pre-selected and balanced collections",
abstract = "Event extraction is a particularly challenging type of information extraction (IE) that may require inferences from the whole article. However, most current event extraction systems rely on local information at the phrase or sentence level, and do not consider the article as a whole, thus limiting extraction performance. Moreover, most annotated corpora are artificially enriched to include enough positive samples of the events of interest; event identification on a more balanced collection, such as unfiltered newswire, may perform much worse. In this paper, we investigate the use of unsupervised topic models to extract topic features to improve event extraction both on test data similar to training data, and on more balanced collections. We compare this unsupervised approach to a supervised multi-label text classifier, and show that unsupervised topic modeling can get better results for both collections, and especially for a more balanced collection. We show that the unsupervised topic model can improve trigger, argument and role labeling by 3.5{\%}, 6.9{\%} and 6{\%} respectively on a pre-selected corpus, and by 16.8{\%}, 12.5{\%} and 12.7{\%} on a balanced corpus.",
author = "Shasha Liao and Ralph Grishman",
year = "2011",
language = "English (US)",
pages = "9--16",
booktitle = "International Conference Recent Advances in Natural Language Processing, RANLP",

}

TY - GEN

T1 - Acquiring topic features to improve event extraction

T2 - In pre-selected and balanced collections

AU - Liao, Shasha

AU - Grishman, Ralph

PY - 2011

Y1 - 2011

N2 - Event extraction is a particularly challenging type of information extraction (IE) that may require inferences from the whole article. However, most current event extraction systems rely on local information at the phrase or sentence level, and do not consider the article as a whole, thus limiting extraction performance. Moreover, most annotated corpora are artificially enriched to include enough positive samples of the events of interest; event identification on a more balanced collection, such as unfiltered newswire, may perform much worse. In this paper, we investigate the use of unsupervised topic models to extract topic features to improve event extraction both on test data similar to training data, and on more balanced collections. We compare this unsupervised approach to a supervised multi-label text classifier, and show that unsupervised topic modeling can get better results for both collections, and especially for a more balanced collection. We show that the unsupervised topic model can improve trigger, argument and role labeling by 3.5%, 6.9% and 6% respectively on a pre-selected corpus, and by 16.8%, 12.5% and 12.7% on a balanced corpus.

AB - Event extraction is a particularly challenging type of information extraction (IE) that may require inferences from the whole article. However, most current event extraction systems rely on local information at the phrase or sentence level, and do not consider the article as a whole, thus limiting extraction performance. Moreover, most annotated corpora are artificially enriched to include enough positive samples of the events of interest; event identification on a more balanced collection, such as unfiltered newswire, may perform much worse. In this paper, we investigate the use of unsupervised topic models to extract topic features to improve event extraction both on test data similar to training data, and on more balanced collections. We compare this unsupervised approach to a supervised multi-label text classifier, and show that unsupervised topic modeling can get better results for both collections, and especially for a more balanced collection. We show that the unsupervised topic model can improve trigger, argument and role labeling by 3.5%, 6.9% and 6% respectively on a pre-selected corpus, and by 16.8%, 12.5% and 12.7% on a balanced corpus.

UR - http://www.scopus.com/inward/record.url?scp=84866876887&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866876887&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84866876887

SP - 9

EP - 16

BT - International Conference Recent Advances in Natural Language Processing, RANLP

ER -