Towards large-scale unsupervised relation extraction from the Web

Bonan Min, Shuming Shi, Ralph Grishman, Chin Yew Lin

Research output: Contribution to journal › Article

Abstract

The Web brings an open-ended set of semantic relations. Discovering the significant types is very challenging. Unsupervised algorithms have been developed to extract relations from a corpus without knowing the relation types in advance, but most rely on tagging arguments of predefined types. One recently reported system is able to jointly extract relations and their argument semantic classes, taking a set of relation instances extracted by an open IE (Information Extraction) algorithm as input. However, it cannot handle polysemy of relation phrases and fails to group many similar ("synonymous") relation instances because of the sparseness of features. In this paper, the authors present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. The algorithm incorporates various knowledge sources which they will show to be very effective for unsupervised relation extraction. Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. While maintaining approximately the same precision, the algorithm achieves significant improvement on recall compared to the previous method. It is also very efficient. Experiments on a real-world dataset show that it can handle 14.7 million relation instances and extract a very large set of relations from the Web.

Original language: English (US)
Pages (from-to): 1-23
Number of pages: 23
Journal: International Journal on Semantic Web and Information Systems
Volume: 8
Issue number: 3
DOI: 10.4018/jswis.2012070101
State: Published - Jul 2012

Keywords

  • Information extraction
  • Large-scale
  • Relation extraction
  • Semantics
  • Unsupervised learning
  • Web

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems

Cite this

Towards large-scale unsupervised relation extraction from the Web. / Min, Bonan; Shi, Shuming; Grishman, Ralph; Lin, Chin Yew.

In: International Journal on Semantic Web and Information Systems, Vol. 8, No. 3, 07.2012, p. 1-23.

@article{fb412d47e74f436297a83a2f9e4c7d02,
title = "Towards large-scale unsupervised relation extraction from the Web",
abstract = "The Web brings an open-ended set of semantic relations. Discovering the significant types is very challenging. Unsupervised algorithms have been developed to extract relations from a corpus without knowing the relation types in advance, but most rely on tagging arguments of predefined types. One recently reported system is able to jointly extract relations and their argument semantic classes, taking a set of relation instances extracted by an open IE (Information Extraction) algorithm as input. However, it cannot handle polysemy of relation phrases and fails to group many similar ({"}synonymous{"}) relation instances because of the sparseness of features. In this paper, the authors present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. The algorithm incorporates various knowledge sources which they will show to be very effective for unsupervised relation extraction. Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. While maintaining approximately the same precision, the algorithm achieves significant improvement on recall compared to the previous method. It is also very efficient. Experiments on a real-world dataset show that it can handle 14.7 million relation instances and extract a very large set of relations from the Web.",
keywords = "Information extraction, Large-scale, Relation extraction, Semantics, Unsupervised learning, Web",
author = "Bonan Min and Shuming Shi and Ralph Grishman and Lin, {Chin Yew}",
year = "2012",
month = "7",
doi = "10.4018/jswis.2012070101",
language = "English (US)",
volume = "8",
pages = "1--23",
journal = "International Journal on Semantic Web and Information Systems",
issn = "1552-6283",
publisher = "IGI Publishing",
number = "3",

}

TY - JOUR

T1 - Towards large-scale unsupervised relation extraction from the Web

AU - Min, Bonan

AU - Shi, Shuming

AU - Grishman, Ralph

AU - Lin, Chin Yew

PY - 2012/7

Y1 - 2012/7

AB - The Web brings an open-ended set of semantic relations. Discovering the significant types is very challenging. Unsupervised algorithms have been developed to extract relations from a corpus without knowing the relation types in advance, but most rely on tagging arguments of predefined types. One recently reported system is able to jointly extract relations and their argument semantic classes, taking a set of relation instances extracted by an open IE (Information Extraction) algorithm as input. However, it cannot handle polysemy of relation phrases and fails to group many similar ("synonymous") relation instances because of the sparseness of features. In this paper, the authors present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. The algorithm incorporates various knowledge sources which they will show to be very effective for unsupervised relation extraction. Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. While maintaining approximately the same precision, the algorithm achieves significant improvement on recall compared to the previous method. It is also very efficient. Experiments on a real-world dataset show that it can handle 14.7 million relation instances and extract a very large set of relations from the Web.

KW - Information extraction

KW - Large-scale

KW - Relation extraction

KW - Semantics

KW - Unsupervised learning

KW - Web

UR - http://www.scopus.com/inward/record.url?scp=84877857596&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84877857596&partnerID=8YFLogxK

U2 - 10.4018/jswis.2012070101

DO - 10.4018/jswis.2012070101

M3 - Article

VL - 8

SP - 1

EP - 23

JO - International Journal on Semantic Web and Information Systems

JF - International Journal on Semantic Web and Information Systems

SN - 1552-6283

IS - 3

ER -