The representational geometry of word meanings acquired by neural machine translation models

Felix Hill, Kyunghyun Cho, Sébastien Jean, Yoshua Bengio

Research output: Contribution to journal › Article

Abstract

This work is the first comprehensive analysis of the properties of word embeddings learned by neural machine translation (NMT) models trained on bilingual texts. We show that the word representations of NMT models outperform those learned from monolingual text by established algorithms such as Skipgram and CBOW on tasks that require knowledge of semantic similarity and/or lexical–syntactic role. These effects hold when translating from English to French and from English to German, and we argue that the desirable properties of NMT word embeddings should emerge largely independently of the source and target languages. Further, we apply a recently proposed heuristic method for training NMT models with very large vocabularies, and show that this vocabulary expansion method results in minimal degradation of embedding quality. This allows us to make a large vocabulary of NMT embeddings available for future research and applications. Overall, our analyses indicate that NMT embeddings should be used in applications that require word concepts to be organised according to similarity and/or lexical function, while monolingual embeddings are better suited to modelling (nonspecific) inter-word relatedness.
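
The similarity-versus-relatedness comparison in the abstract rests on a standard intrinsic-evaluation protocol: score word pairs by the cosine similarity of their embeddings and measure the Spearman correlation of those scores with human ratings from a benchmark such as SimLex-999 (which targets genuine similarity) or WordSim-353 (which largely measures relatedness). The sketch below illustrates that protocol; it is not the authors' code, and the file paths and the simple three-column benchmark format are assumptions for illustration.

# Minimal sketch of intrinsic embedding evaluation: Spearman correlation
# between cosine similarities and human ratings. Not the authors' code;
# all file names below are hypothetical placeholders.
import numpy as np
from scipy.stats import spearmanr

def load_embeddings(path):
    """Read 'word v1 v2 ...' lines into a {word: vector} dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) < 3:  # skip count/dim headers or blank lines
                continue
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate(vectors, pairs):
    """Spearman rho between model scores and human ratings.
    Pairs with out-of-vocabulary words are skipped, as is common practice."""
    model, human = [], []
    for w1, w2, rating in pairs:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(rating)
    rho, _ = spearmanr(model, human)
    return rho, len(model)

if __name__ == "__main__":
    # Hypothetical inputs: a source-side NMT embedding table and a
    # benchmark assumed to be in plain 'word1 word2 rating' format
    # (the real SimLex-999 release has a header row and extra columns).
    vectors = load_embeddings("nmt_en_fr_embeddings.txt")
    pairs = []
    with open("similarity_benchmark.txt", encoding="utf-8") as f:
        for line in f:
            w1, w2, r = line.split()[:3]
            pairs.append((w1, w2, float(r)))
    rho, n = evaluate(vectors, pairs)
    print(f"Spearman rho = {rho:.3f} over {n} in-vocabulary pairs")

Under this protocol, scoring higher on a similarity benchmark while a monolingual model scores higher on a relatedness benchmark is exactly the pattern the abstract attributes to NMT embeddings relative to Skipgram and CBOW.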

Original language: English (US)
Pages (from-to): 1-16
Number of pages: 16
Journal: Machine Translation
DOI: 10.1007/s10590-017-9194-2
State: Accepted/In press - Apr 29 2017

Keywords

  • Machine translation
  • Representation
  • Word embeddings

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Linguistics and Language
  • Artificial Intelligence

Cite this

The representational geometry of word meanings acquired by neural machine translation models. / Hill, Felix; Cho, Kyunghyun; Jean, Sébastien; Bengio, Yoshua.

In: Machine Translation, 29.04.2017, p. 1-16.

Research output: Contribution to journal › Article

@article{b218dd714aa244ecb15a6da9df8159f3,
  title = "The representational geometry of word meanings acquired by neural machine translation models",
  abstract = "This work is the first comprehensive analysis of the properties of word embeddings learned by neural machine translation (NMT) models trained on bilingual texts. We show that the word representations of NMT models outperform those learned from monolingual text by established algorithms such as Skipgram and CBOW on tasks that require knowledge of semantic similarity and/or lexical–syntactic role. These effects hold when translating from English to French and from English to German, and we argue that the desirable properties of NMT word embeddings should emerge largely independently of the source and target languages. Further, we apply a recently proposed heuristic method for training NMT models with very large vocabularies, and show that this vocabulary expansion method results in minimal degradation of embedding quality. This allows us to make a large vocabulary of NMT embeddings available for future research and applications. Overall, our analyses indicate that NMT embeddings should be used in applications that require word concepts to be organised according to similarity and/or lexical function, while monolingual embeddings are better suited to modelling (nonspecific) inter-word relatedness.",
  keywords = "Machine translation, Representation, Word embeddings",
  author = "Felix Hill and Kyunghyun Cho and S{\'e}bastien Jean and Yoshua Bengio",
  year = "2017",
  month = apr,
  day = "29",
  doi = "10.1007/s10590-017-9194-2",
  language = "English (US)",
  pages = "1--16",
  journal = "Machine Translation",
  issn = "0922-6567",
  publisher = "Springer Netherlands",
}
