Neural machine translation by jointly learning to align and translate

Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

Research output: Contribution to conference › Paper

Abstract

Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder–decoders and encode a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder–decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
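The (soft-)search described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the additive scoring form, the function names (`soft_align`, `matvec`), the parameter names (`W`, `U`, `v`), and the toy values are all assumptions chosen for illustration. It shows only the general idea: score every source position against the current decoder state, normalize the scores into soft alignment weights, and build a context vector as the weighted sum of source annotations, rather than relying on one fixed-length vector.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def soft_align(decoder_state, annotations, W, U, v):
    """Return (weights, context) for one target position.

    Assumed scoring form (illustrative): v . tanh(W s + U h_j).
    """
    projected_state = matvec(W, decoder_state)
    scores = []
    for h in annotations:
        projected_h = matvec(U, h)
        hidden = [math.tanh(a + b) for a, b in zip(projected_state, projected_h)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)  # soft alignment over source positions
    dim = len(annotations[0])
    # Context vector: weighted sum of the source annotations.
    context = [sum(w * h[k] for w, h in zip(weights, annotations)) for k in range(dim)]
    return weights, context


# Toy values (hypothetical): three source positions, two-dimensional vectors.
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.5, -0.5]
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]

weights, context = soft_align(decoder_state, annotations, W, U, v)
print(weights, context)
```

Because the weights are a distribution over all source positions, no hard segment is ever selected; each target word draws on every source position to a different degree.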

Original language: English (US)
State: Published - Jan 1 2015
Event: 3rd International Conference on Learning Representations, ICLR 2015 - San Diego, United States
Duration: May 7 2015 to May 9 2015

Conference

Conference: 3rd International Conference on Learning Representations, ICLR 2015
Country: United States
City: San Diego
Period: 5/7/15 to 5/9/15

Fingerprint

learning
Neural networks
performance
Machine Translation
intuition
Length
French Translation
Qualitative Analysis
Statistical Machine Translation
Alignment

ASJC Scopus subject areas

  • Education
  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Cite this

Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. Paper presented at 3rd International Conference on Learning Representations, ICLR 2015, San Diego, United States.

@conference{38ed090f8de94fb3b0b46b86f9133623,
title = "Neural machine translation by jointly learning to align and translate",
author = "Dzmitry Bahdanau and Kyunghyun Cho and Yoshua Bengio",
year = "2015",
month = "1",
day = "1",
language = "English (US)",
note = "3rd International Conference on Learning Representations, ICLR 2015 ; Conference date: 07-05-2015 Through 09-05-2015",

}

TY - CONF

T1 - Neural machine translation by jointly learning to align and translate

AU - Bahdanau, Dzmitry

AU - Cho, Kyunghyun

AU - Bengio, Yoshua

PY - 2015/1/1

Y1 - 2015/1/1

UR - http://www.scopus.com/inward/record.url?scp=85062889504&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062889504&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85062889504

ER -