On integrating a language model into neural machine translation

Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Yoshua Bengio

Research output: Contribution to journal › Article

Abstract

Recent advances in end-to-end neural machine translation models have achieved promising results on high-resource language pairs such as En→Fr and En→De. One of the major factors behind these successes is the availability of high-quality parallel corpora. We explore two strategies for leveraging abundant amounts of monolingual data in neural machine translation. We observe improvements both from combining the scores of a neural language model trained only on target monolingual data with those of a neural machine translation model, and from fusing the hidden states of the two models. We obtain up to a 2 BLEU improvement over hierarchical and phrase-based baselines on a low-resource language pair, Turkish→English. Our method was initially motivated by tasks with limited parallel data, but we also show that it extends to high-resource language pairs such as the Cs→En and De→En translation tasks, where we obtain improvements of 0.39 and 0.47 BLEU over the neural machine translation baselines, respectively.
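
The two strategies are the paper's shallow fusion, which log-linearly combines the output scores of the language model and the translation model at decoding time, and deep fusion, which gates the language model's hidden state and concatenates it with the decoder state before the output layer. Below is a minimal NumPy sketch of both mechanisms at a single decoding step; the function names, dimensionalities, and weights are illustrative stand-ins, not the authors' implementation.

import numpy as np

def shallow_fusion_scores(nmt_log_probs, lm_log_probs, beta=0.2):
    # Shallow fusion: add the LM's log-probabilities to the NMT model's,
    # weighted by a hyperparameter beta, and pick tokens from the sum.
    return nmt_log_probs + beta * lm_log_probs

def deep_fusion_state(h_nmt, h_lm, v_g, b_g):
    # Deep fusion: a learned scalar gate g = sigmoid(v_g . h_lm + b_g)
    # scales the LM hidden state before it is concatenated with the NMT
    # decoder state; the concatenation feeds the output layer.
    g = 1.0 / (1.0 + np.exp(-(v_g @ h_lm + b_g)))
    return np.concatenate([h_nmt, g * h_lm])

# Toy decoding step over a vocabulary of 6 tokens (random stand-in scores).
rng = np.random.default_rng(0)
nmt_lp = np.log(rng.dirichlet(np.ones(6)))  # NMT log p(y_t | y_<t, x)
lm_lp = np.log(rng.dirichlet(np.ones(6)))   # LM  log p(y_t | y_<t)
next_token = int(np.argmax(shallow_fusion_scores(nmt_lp, lm_lp)))

# Toy deep-fusion state with hypothetical dimensionalities.
h_nmt, h_lm = rng.standard_normal(4), rng.standard_normal(3)
v_g, b_g = rng.standard_normal(3), 0.0      # learned in practice
fused = deep_fusion_state(h_nmt, h_lm, v_g, b_g)  # shape (7,)

In practice the shallow-fusion weight beta is tuned on held-out data, and the deep-fusion gate parameters are learned; the random values above are placeholders.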

Original language: English (US)
Journal: Computer Speech and Language
DOI: 10.1016/j.csl.2017.01.014
State: Accepted/In press - May 1, 2016

Keywords

  • Deep learning
  • Language models
  • Low resource machine translation
  • Monolingual data
  • Neural machine translation
  • Neural network

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Software
  • Human-Computer Interaction

Cite this

Gulcehre, Caglar; Firat, Orhan; Xu, Kelvin; Cho, Kyunghyun; Bengio, Yoshua. On integrating a language model into neural machine translation. In: Computer Speech and Language, May 1, 2016. DOI: 10.1016/j.csl.2017.01.014.
@article{9a7e5bdf2b9b4df8b46c996f56c5b884,
title = "On integrating a language model into neural machine translation",
abstract = "Recent advances in end-to-end neural machine translation models have achieved promising results on high-resource language pairs such as En→Fr and En→De. One of the major factors behind these successes is the availability of high-quality parallel corpora. We explore two strategies for leveraging abundant amounts of monolingual data in neural machine translation. We observe improvements both from combining the scores of a neural language model trained only on target monolingual data with those of a neural machine translation model, and from fusing the hidden states of the two models. We obtain up to a 2 BLEU improvement over hierarchical and phrase-based baselines on a low-resource language pair, Turkish→English. Our method was initially motivated by tasks with limited parallel data, but we also show that it extends to high-resource language pairs such as the Cs→En and De→En translation tasks, where we obtain improvements of 0.39 and 0.47 BLEU over the neural machine translation baselines, respectively.",
keywords = "Deep learning, Language models, Low resource machine translation, Monolingual data, Neural machine translation, Neural network",
author = "Caglar Gulcehre and Orhan Firat and Kelvin Xu and Kyunghyun Cho and Yoshua Bengio",
year = "2016",
month = "5",
day = "1",
doi = "10.1016/j.csl.2017.01.014",
language = "English (US)",
journal = "Computer Speech and Language",
issn = "0885-2308",
publisher = "Academic Press Inc.",

}
