Distant supervision for relation extraction with an incomplete knowledge base

Bonan Min, Ralph Grishman, Li Wan, Chang Wang, David Gondek

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Distant supervision, heuristically labeling a corpus using a knowledge base, has emerged as a popular choice for training relation extractors. In this paper, we show that a significant number of "negative" examples generated by the labeling process are false negatives because the knowledge base is incomplete. Therefore the heuristic for generating negative examples has a serious flaw. Building on a state-of-the-art distantly-supervised extraction algorithm, we proposed an algorithm that learns from only positive and unlabeled labels at the pair-of-entity level. Experimental results demonstrate its advantage over existing algorithms.
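The labeling heuristic described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the toy knowledge base, the entity pairs, the `gold` set, and all function names are hypothetical and chosen only to show how an incomplete knowledge base turns a true relation into a false negative, and how treating unmatched pairs as unlabeled avoids asserting that negative label.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def label_standard(pairs: List[Pair], kb: Dict[Pair, str]) -> Dict[Pair, str]:
    """Standard heuristic: a pair missing from the KB is labeled NEGATIVE."""
    return {p: kb.get(p, "NEGATIVE") for p in pairs}


def label_positive_unlabeled(pairs: List[Pair], kb: Dict[Pair, str]) -> Dict[Pair, str]:
    """Positive/unlabeled view: a pair missing from the KB stays UNLABELED."""
    return {p: kb.get(p, "UNLABELED") for p in pairs}


def false_negatives(labels: Dict[Pair, str], gold: Dict[Pair, str]) -> List[Pair]:
    """Pairs labeled NEGATIVE although the (hypothetical) gold facts contain them."""
    return [p for p, lab in labels.items() if lab == "NEGATIVE" and p in gold]


# Hypothetical, deliberately incomplete knowledge base.
kb = {("Paris", "France"): "capital_of"}
# Hypothetical complete set of true facts, used only to expose the flaw.
gold = {
    ("Paris", "France"): "capital_of",
    ("Berlin", "Germany"): "capital_of",
}
# Entity pairs assumed to co-occur in sentences of a corpus.
pairs = [("Paris", "France"), ("Berlin", "Germany"), ("Berlin", "France")]

standard = label_standard(pairs, kb)
pu = label_positive_unlabeled(pairs, kb)

print(false_negatives(standard, gold))  # [('Berlin', 'Germany')]
print(pu[("Berlin", "Germany")])        # UNLABELED
```

In this toy setting the standard heuristic mislabels one true relation as negative, while the positive/unlabeled labeling leaves that pair open for a learner to resolve.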

Original language: English (US)
Title of host publication: NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 777-782
Number of pages: 6
ISBN (Print): 9781937284473
State: Published - 2013
Event: 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013 - Atlanta, United States
Duration: Jun 9 2013 → Jun 14 2013

Other

Other: 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013
Country: United States
City: Atlanta
Period: 6/9/13 → 6/14/13

Fingerprint

supervision
Labeling
knowledge
Labels
heuristics
Defects
Incomplete
Entity

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Linguistics and Language

Cite this

Min, B., Grishman, R., Wan, L., Wang, C., & Gondek, D. (2013). Distant supervision for relation extraction with an incomplete knowledge base. In NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference (pp. 777-782). Association for Computational Linguistics (ACL).

Distant supervision for relation extraction with an incomplete knowledge base. / Min, Bonan; Grishman, Ralph; Wan, Li; Wang, Chang; Gondek, David.

NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference. Association for Computational Linguistics (ACL), 2013. p. 777-782.


Min, B, Grishman, R, Wan, L, Wang, C & Gondek, D 2013, Distant supervision for relation extraction with an incomplete knowledge base. in NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference. Association for Computational Linguistics (ACL), pp. 777-782, 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013, Atlanta, United States, 6/9/13.
Min B, Grishman R, Wan L, Wang C, Gondek D. Distant supervision for relation extraction with an incomplete knowledge base. In NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference. Association for Computational Linguistics (ACL). 2013. p. 777-782
Min, Bonan ; Grishman, Ralph ; Wan, Li ; Wang, Chang ; Gondek, David. / Distant supervision for relation extraction with an incomplete knowledge base. NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference. Association for Computational Linguistics (ACL), 2013. pp. 777-782
@inproceedings{afa19f3e18b5405db9576612e111f94e,
title = "Distant supervision for relation extraction with an incomplete knowledge base",
abstract = "Distant supervision, heuristically labeling a corpus using a knowledge base, has emerged as a popular choice for training relation extractors. In this paper, we show that a significant number of {"}negative{"} examples generated by the labeling process are false negatives because the knowledge base is incomplete. Therefore the heuristic for generating negative examples has a serious flaw. Building on a state-of-the-art distantly-supervised extraction algorithm, we proposed an algorithm that learns from only positive and unlabeled labels at the pair-of-entity level. Experimental results demonstrate its advantage over existing algorithms.",
author = "Bonan Min and Ralph Grishman and Li Wan and Chang Wang and David Gondek",
year = "2013",
language = "English (US)",
isbn = "9781937284473",
pages = "777--782",
booktitle = "NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference",
publisher = "Association for Computational Linguistics (ACL)",

}

TY - GEN

T1 - Distant supervision for relation extraction with an incomplete knowledge base

AU - Min, Bonan

AU - Grishman, Ralph

AU - Wan, Li

AU - Wang, Chang

AU - Gondek, David

PY - 2013

Y1 - 2013

N2 - Distant supervision, heuristically labeling a corpus using a knowledge base, has emerged as a popular choice for training relation extractors. In this paper, we show that a significant number of "negative" examples generated by the labeling process are false negatives because the knowledge base is incomplete. Therefore the heuristic for generating negative examples has a serious flaw. Building on a state-of-the-art distantly-supervised extraction algorithm, we proposed an algorithm that learns from only positive and unlabeled labels at the pair-of-entity level. Experimental results demonstrate its advantage over existing algorithms.

AB - Distant supervision, heuristically labeling a corpus using a knowledge base, has emerged as a popular choice for training relation extractors. In this paper, we show that a significant number of "negative" examples generated by the labeling process are false negatives because the knowledge base is incomplete. Therefore the heuristic for generating negative examples has a serious flaw. Building on a state-of-the-art distantly-supervised extraction algorithm, we proposed an algorithm that learns from only positive and unlabeled labels at the pair-of-entity level. Experimental results demonstrate its advantage over existing algorithms.

UR - http://www.scopus.com/inward/record.url?scp=84926224568&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84926224568&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84926224568

SN - 9781937284473

SP - 777

EP - 782

BT - NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference

PB - Association for Computational Linguistics (ACL)

ER -