Ensemble semantics for large-scale unsupervised relation extraction

Bonan Min, Shuming Shi, Ralph Grishman, Chin Yew Lin

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Discovering significant types of relations from the web is challenging because of its open nature. Unsupervised algorithms are developed to extract relations from a corpus without knowing the relations in advance, but most of them rely on tagging arguments of predefined types. Recently, a new algorithm was proposed to jointly extract relations and their argument semantic classes, taking a set of relation instances extracted by an open IE algorithm as input. However, it cannot handle polysemy of relation phrases and fails to group many similar ("synonymous") relation instances because of the sparseness of features. In this paper, we present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. The algorithm incorporates various knowledge sources which we will show to be very effective for unsupervised extraction. Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. While maintaining approximately the same precision, the algorithm achieves significant improvement on recall compared to the previous method. It is also very efficient. Experiments on a real-world dataset show that it can handle 14.7 million relation instances and extract a very large set of relations from the web.

Original languageEnglish (US)
Title of host publicationEMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference
Pages1027-1037
Number of pages11
StatePublished - 2012
Event2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012 - Jeju Island, Korea, Republic of
Duration: Jul 12 2012Jul 14 2012

Other

Other2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012
CountryKorea, Republic of
CityJeju Island
Period7/12/127/14/12

    Fingerprint

ASJC Scopus subject areas

  • Software

Cite this

Min, B., Shi, S., Grishman, R., & Lin, C. Y. (2012). Ensemble semantics for large-scale unsupervised relation extraction. In EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference (pp. 1027-1037)