Context-dependent word representation for neural machine translation

Heeyoul Choi, Kyunghyun Cho, Yoshua Bengio

Research output: Contribution to journal › Article

Abstract

We first observe a potential weakness of continuous vector representations of symbols in neural machine translation. That is, the continuous vector representation, or a word embedding vector, of a symbol encodes multiple dimensions of similarity, equivalent to encoding more than one meaning of the word. As a consequence, the encoder and decoder recurrent networks in neural machine translation need to spend a substantial amount of their capacity on disambiguating source and target words based on the context defined by the source sentence. Based on this observation, in this paper we propose to contextualize the word embedding vectors using a nonlinear bag-of-words representation of the source sentence. Additionally, we propose to represent special tokens (such as numbers, proper nouns and acronyms) with typed symbols to facilitate translating those words that are not well suited to being translated via continuous vectors. The experiments on En-Fr and En-De reveal that the proposed approaches of contextualization and symbolization significantly improve the translation quality of neural machine translation systems.
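The two ideas described in the abstract, contextualization and symbolization, can be sketched in a few lines of NumPy. Everything below (the toy vocabulary, the additive combination of embedding and context, the `<NUM>` typed token) is a hypothetical illustration of the general idea, not the paper's actual parameterization.

```python
import numpy as np

# Toy vocabulary and randomly initialized parameters; all names and
# dimensions here are illustrative only.
rng = np.random.default_rng(0)
vocab = {"the": 0, "river": 1, "money": 2, "bank": 3, "<NUM>": 4}
d = 8                                   # embedding dimension
E = rng.normal(size=(len(vocab), d))    # word embedding table
W = rng.normal(size=(d, d))             # context projection matrix

def symbolize(tokens):
    """Replace special tokens with typed symbols (here, only numbers)."""
    return ["<NUM>" if t.isdigit() else t for t in tokens]

def contextualize(tokens):
    """Shift each word embedding by a nonlinear bag-of-words summary of
    the whole source sentence, so the same word receives a different
    representation in different sentences."""
    ids = [vocab[t] for t in symbolize(tokens)]
    bow = E[ids].mean(axis=0)           # bag-of-words average of the sentence
    context = np.tanh(W @ bow)          # nonlinear sentence-level context
    return E[ids] + context             # context-dependent embeddings

river_bank = contextualize(["the", "river", "bank"])
money_bank = contextualize(["the", "money", "bank"])
```

Because the sentence-level context differs, the ambiguous word "bank" ends up with a different vector in each sentence, which is the disambiguation burden the abstract argues should be lifted from the recurrent networks.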

Original language: English (US)
Journal: Computer Speech and Language
DOIs: 10.1016/j.csl.2017.01.007
State: Accepted/In press - Apr 18 2016


Keywords

  • Contextualization
  • Neural machine translation
  • Symbolization

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Software
  • Human-Computer Interaction

Cite this

Context-dependent word representation for neural machine translation. / Choi, Heeyoul; Cho, Kyunghyun; Bengio, Yoshua.

In: Computer Speech and Language, 18.04.2016.

Research output: Contribution to journal › Article

@article{0418746acc20432e9ae12f31fe170a0a,
title = "Context-dependent word representation for neural machine translation",
abstract = "We first observe a potential weakness of continuous vector representations of symbols in neural machine translation. That is, the continuous vector representation, or a word embedding vector, of a symbol encodes multiple dimensions of similarity, equivalent to encoding more than one meaning of the word. As a consequence, the encoder and decoder recurrent networks in neural machine translation need to spend a substantial amount of their capacity on disambiguating source and target words based on the context defined by the source sentence. Based on this observation, in this paper we propose to contextualize the word embedding vectors using a nonlinear bag-of-words representation of the source sentence. Additionally, we propose to represent special tokens (such as numbers, proper nouns and acronyms) with typed symbols to facilitate translating those words that are not well suited to being translated via continuous vectors. The experiments on En-Fr and En-De reveal that the proposed approaches of contextualization and symbolization significantly improve the translation quality of neural machine translation systems.",
keywords = "Contextualization, Neural machine translation, Symbolization",
author = "Heeyoul Choi and Kyunghyun Cho and Yoshua Bengio",
year = "2016",
month = "4",
day = "18",
doi = "10.1016/j.csl.2017.01.007",
language = "English (US)",
journal = "Computer Speech and Language",
issn = "0885-2308",
publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - Context-dependent word representation for neural machine translation

AU - Choi, Heeyoul

AU - Cho, Kyunghyun

AU - Bengio, Yoshua

PY - 2016/4/18

Y1 - 2016/4/18

N2 - We first observe a potential weakness of continuous vector representations of symbols in neural machine translation. That is, the continuous vector representation, or a word embedding vector, of a symbol encodes multiple dimensions of similarity, equivalent to encoding more than one meaning of the word. As a consequence, the encoder and decoder recurrent networks in neural machine translation need to spend a substantial amount of their capacity on disambiguating source and target words based on the context defined by the source sentence. Based on this observation, in this paper we propose to contextualize the word embedding vectors using a nonlinear bag-of-words representation of the source sentence. Additionally, we propose to represent special tokens (such as numbers, proper nouns and acronyms) with typed symbols to facilitate translating those words that are not well suited to being translated via continuous vectors. The experiments on En-Fr and En-De reveal that the proposed approaches of contextualization and symbolization significantly improve the translation quality of neural machine translation systems.

AB - We first observe a potential weakness of continuous vector representations of symbols in neural machine translation. That is, the continuous vector representation, or a word embedding vector, of a symbol encodes multiple dimensions of similarity, equivalent to encoding more than one meaning of the word. As a consequence, the encoder and decoder recurrent networks in neural machine translation need to spend a substantial amount of their capacity on disambiguating source and target words based on the context defined by the source sentence. Based on this observation, in this paper we propose to contextualize the word embedding vectors using a nonlinear bag-of-words representation of the source sentence. Additionally, we propose to represent special tokens (such as numbers, proper nouns and acronyms) with typed symbols to facilitate translating those words that are not well suited to being translated via continuous vectors. The experiments on En-Fr and En-De reveal that the proposed approaches of contextualization and symbolization significantly improve the translation quality of neural machine translation systems.

KW - Contextualization

KW - Neural machine translation

KW - Symbolization

UR - http://www.scopus.com/inward/record.url?scp=85014469233&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85014469233&partnerID=8YFLogxK

U2 - 10.1016/j.csl.2017.01.007

DO - 10.1016/j.csl.2017.01.007

M3 - Article

JO - Computer Speech and Language

JF - Computer Speech and Language

SN - 0885-2308

ER -