Challenges in the knowledge base population slot filling task

Bonan Min, Ralph Grishman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Knowledge Base Population (KBP) evaluation track of the Text Analysis Conferences (TAC) has been held for the past 3 years. One of the two tasks of KBP is slot filling: finding within a large corpus the values of a set of attributes of given people and organizations. This task has proven very challenging, with top systems rarely exceeding 30% F-measure. In this paper, we present an error analysis and classification for those answers that could be found by a manual corpus search but were not found by any of the systems participating in the 2010 evaluation. The most common sources of failure were limitations on inference, errors in coreference (particularly with nominal anaphors), and errors in named entity recognition. We relate the types of errors to the characteristics of the task and show the wide diversity of problems that must be addressed to improve overall performance.

Original language: English (US)
Title of host publication: Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012
Publisher: European Language Resources Association (ELRA)
Pages: 1137-1142
Number of pages: 6
ISBN (Electronic): 9782951740877
State: Published - Jan 1 2012
Event: 8th International Conference on Language Resources and Evaluation, LREC 2012 - Istanbul, Turkey
Duration: May 21 2012 → May 27 2012

Keywords

  • Analysis
  • Evaluation
  • Information Extraction

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Education
  • Library and Information Sciences

Cite this

Min, B., & Grishman, R. (2012). Challenges in the knowledge base population slot filling task. In Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012 (pp. 1137-1142). European Language Resources Association (ELRA).

@inproceedings{fa9cc539ff6c4e26865cfeec587b04f4,
title = "Challenges in the knowledge base population slot filling task",
abstract = "The Knowledge Base Population (KBP) evaluation track of the Text Analysis Conferences (TAC) has been held for the past 3 years. One of the two tasks of KBP is slot filling: finding within a large corpus the values of a set of attributes of given people and organizations. This task has proven very challenging, with top systems rarely exceeding 30{\%} F-measure. In this paper, we present an error analysis and classification for those answers that could be found by a manual corpus search but were not found by any of the systems participating in the 2010 evaluation. The most common sources of failure were limitations on inference, errors in coreference (particularly with nominal anaphors), and errors in named entity recognition. We relate the types of errors to the characteristics of the task and show the wide diversity of problems that must be addressed to improve overall performance.",
keywords = "Analysis, Evaluation, Information Extraction",
author = "Bonan Min and Ralph Grishman",
year = "2012",
month = "1",
day = "1",
language = "English (US)",
pages = "1137--1142",
booktitle = "Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012",
publisher = "European Language Resources Association (ELRA)",

}

TY - GEN
T1 - Challenges in the knowledge base population slot filling task
AU - Min, Bonan
AU - Grishman, Ralph
PY - 2012/1/1
Y1 - 2012/1/1
N2 - The Knowledge Base Population (KBP) evaluation track of the Text Analysis Conferences (TAC) has been held for the past 3 years. One of the two tasks of KBP is slot filling: finding within a large corpus the values of a set of attributes of given people and organizations. This task has proven very challenging, with top systems rarely exceeding 30% F-measure. In this paper, we present an error analysis and classification for those answers that could be found by a manual corpus search but were not found by any of the systems participating in the 2010 evaluation. The most common sources of failure were limitations on inference, errors in coreference (particularly with nominal anaphors), and errors in named entity recognition. We relate the types of errors to the characteristics of the task and show the wide diversity of problems that must be addressed to improve overall performance.
KW - Analysis
KW - Evaluation
KW - Information Extraction
UR - http://www.scopus.com/inward/record.url?scp=84974791397&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84974791397&partnerID=8YFLogxK
M3 - Conference contribution
SP - 1137
EP - 1142
BT - Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012
PB - European Language Resources Association (ELRA)
ER -