Probabilistic Belief Embedding for Large-Scale Knowledge Population

Miao Fan, Qiang Zhou, Andrew Abel, Thomas Fang Zheng, Ralph Grishman

Research output: Contribution to journal › Article

Abstract

Background: Two branches of research on populating knowledge repositories, such as WordNet, Freebase, and NELL, have grown separately for decades. On one hand, corpus-based methods that leverage unstructured free text have been explored for years; on the other, recently emerged embedding-based approaches use structured knowledge graphs to learn distributed representations of entities and relations. Few models, however, can integrate these large-scale heterogeneous resources to serve multiple subtasks of knowledge population, including entity inference, relation prediction, and triplet classification.

Methods: This paper contributes a novel embedding model that estimates the probability of each candidate belief in a large-scale knowledge repository by simultaneously learning distributed representations for entities (h and t), relations (r), and the words in relation mentions (m). The model facilitates knowledge population through simple vector operations that discover new beliefs: given an incomplete belief, we can infer missing entities, predict unknown relations, and assess the belief's plausibility using only the learned embeddings of the remaining evidence.

Results: To demonstrate the scalability and effectiveness of the model, we conducted experiments on several large-scale repositories containing millions of beliefs from WordNet, Freebase, and NELL, comparing against other cutting-edge approaches on the tasks of entity inference, relation prediction, and triplet classification, each with its respective metrics. Extensive experimental results show that the proposed model outperforms the state of the art with significant improvements.

Conclusions: The improvements stem from the model's capability to encode not only structured knowledge-graph information but also unstructured relation mentions into continuous vector spaces. This bridges the gap left by one-hot representations and allows the model to discover relevance among entities, relations, and even the words in relation mentions.
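The abstract does not spell out the scoring function, so the following minimal sketch only illustrates how the three subtasks reduce to ranking with learned embeddings once such a model is trained. It assumes a TransE-style translation score with a logistic link as a stand-in for the paper's probability estimate; the joint modeling of relation-mention words (m) is omitted, and all entity and relation names below are hypothetical.

```python
import numpy as np

# NOTE: a toy sketch, not the paper's actual model. The abstract only says
# that beliefs (h, r, t) are scored by "simple vector operations" over
# jointly learned embeddings; we assume a TransE-style translation score
# with a logistic link, and invent the entity/relation names.

rng = np.random.default_rng(0)
dim = 50

# Toy embedding tables (the real model learns these jointly, together
# with word embeddings for the relation mentions m).
entities = {e: rng.normal(size=dim) for e in ("beijing", "china", "paris", "france")}
relations = {"capital_of": rng.normal(size=dim)}

def score(h, r, t):
    """Plausibility of the belief (h, r, t): higher when h + r is close to t."""
    return -np.linalg.norm(entities[h] + relations[r] - entities[t])

def probability(h, r, t):
    """Squash the score into (0, 1); triplet classification thresholds this."""
    return 1.0 / (1.0 + np.exp(-score(h, r, t)))

def infer_entity(h, r):
    """Entity inference: rank candidate tails for the incomplete belief (h, r, ?)."""
    return sorted(entities, key=lambda t: score(h, r, t), reverse=True)

def predict_relation(h, t):
    """Relation prediction: rank candidate relations for the pair (h, ?, t)."""
    return sorted(relations, key=lambda r: score(h, r, t), reverse=True)

print(infer_entity("beijing", "capital_of"))    # ranked tail candidates
print(predict_relation("beijing", "china"))     # ranked relation candidates
print(round(probability("beijing", "capital_of", "china"), 3))
```

With random, untrained vectors the printed rankings are of course arbitrary; the point is the shape of the computation: each subtask is a nearest-neighbor or threshold query against the same shared embedding tables.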

Original language: English (US)
Pages (from-to): 1-16
Number of pages: 16
Journal: Cognitive Computation
ISSN: 1866-9956
DOI: 10.1007/s12559-016-9425-5
State: Accepted/In press - Aug 8 2016

Keywords

  • Belief embedding
  • Entity inference
  • Knowledge population
  • Relation prediction
  • Triplet classification

ASJC Scopus subject areas

  • Cognitive Neuroscience
  • Computer Science Applications
  • Computer Vision and Pattern Recognition

Cite this

Fan, M., Zhou, Q., Abel, A., Zheng, T. F., & Grishman, R. (2016). Probabilistic Belief Embedding for Large-Scale Knowledge Population. Cognitive Computation, 1-16. https://doi.org/10.1007/s12559-016-9425-5

