Fine-grained attention mechanism for neural machine translation

Heeyoul Choi, Kyunghyun Cho, Yoshua Bengio

Research output: Contribution to journal › Article

Abstract

Neural machine translation (NMT) has been a new paradigm in machine translation, and the attention mechanism has become the dominant approach with the state-of-the-art records in many language pairs. While there are variants of the attention mechanism, all of them use only temporal attention where one scalar value is assigned to one context vector corresponding to a source word. In this paper, we propose a fine-grained (or 2D) attention mechanism where each dimension of a context vector will receive a separate attention score. In experiments with the task of En-De and En-Fi translation, the fine-grained attention method improves the translation quality in terms of BLEU score. In addition, our alignment analysis reveals how the fine-grained attention mechanism exploits the internal structure of context vectors.
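
The distinction drawn in the abstract can be illustrated with a minimal NumPy sketch. It contrasts temporal attention (one scalar weight per source word) with fine-grained attention (one weight per dimension of each context vector). This record does not describe the scoring network, so the scores here are random placeholders, and the function names, shapes, and the softmax normalization over source positions are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(H, scores):
    """H: (T, D) context vectors; scores: (T,), one scalar per source word."""
    alpha = softmax(scores, axis=0)   # (T,), sums to 1 over source positions
    return alpha @ H                  # (D,) weighted sum of context vectors

def fine_grained_attention(H, scores):
    """H: (T, D) context vectors; scores: (T, D), one score per dimension."""
    alpha = softmax(scores, axis=0)   # (T, D), each column sums to 1
    return (alpha * H).sum(axis=0)    # (D,) each dimension weighted separately

# Placeholder inputs (assumed sizes): T source words, D-dimensional vectors.
rng = np.random.default_rng(0)
T, D = 5, 4
H = rng.normal(size=(T, D))
c_temporal = temporal_attention(H, rng.normal(size=T))
c_fine = fine_grained_attention(H, rng.normal(size=(T, D)))
```

Under these assumptions, fine-grained attention reduces to temporal attention when every dimension of a context vector is given the same score, which is one way to see it as a generalization.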

Original language: English (US)
Pages (from-to): 171-176
Number of pages: 6
Journal: Neurocomputing
Volume: 284
DOI: 10.1016/j.neucom.2018.01.007
State: Published - Apr 5 2018

Keywords

  • Attention mechanism
  • Fine-grained attention
  • Neural machine translation

ASJC Scopus subject areas

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence

Cite this

Fine-grained attention mechanism for neural machine translation. / Choi, Heeyoul; Cho, Kyunghyun; Bengio, Yoshua.

In: Neurocomputing, Vol. 284, 05.04.2018, p. 171-176.

@article{ed6028377ad6470aad3addff48a1bfbd,
title = "Fine-grained attention mechanism for neural machine translation",
abstract = "Neural machine translation (NMT) has been a new paradigm in machine translation, and the attention mechanism has become the dominant approach with the state-of-the-art records in many language pairs. While there are variants of the attention mechanism, all of them use only temporal attention where one scalar value is assigned to one context vector corresponding to a source word. In this paper, we propose a fine-grained (or 2D) attention mechanism where each dimension of a context vector will receive a separate attention score. In experiments with the task of En-De and En-Fi translation, the fine-grained attention method improves the translation quality in terms of BLEU score. In addition, our alignment analysis reveals how the fine-grained attention mechanism exploits the internal structure of context vectors.",
keywords = "Attention mechanism, Fine-grained attention, Neural machine translation",
author = "Heeyoul Choi and Kyunghyun Cho and Yoshua Bengio",
year = "2018",
month = "4",
day = "5",
doi = "10.1016/j.neucom.2018.01.007",
language = "English (US)",
volume = "284",
pages = "171--176",
journal = "Neurocomputing",
issn = "0925-2312",
publisher = "Elsevier",
}

TY - JOUR

T1 - Fine-grained attention mechanism for neural machine translation

AU - Choi, Heeyoul

AU - Cho, Kyunghyun

AU - Bengio, Yoshua

PY - 2018/4/5

Y1 - 2018/4/5

N2 - Neural machine translation (NMT) has been a new paradigm in machine translation, and the attention mechanism has become the dominant approach with the state-of-the-art records in many language pairs. While there are variants of the attention mechanism, all of them use only temporal attention where one scalar value is assigned to one context vector corresponding to a source word. In this paper, we propose a fine-grained (or 2D) attention mechanism where each dimension of a context vector will receive a separate attention score. In experiments with the task of En-De and En-Fi translation, the fine-grained attention method improves the translation quality in terms of BLEU score. In addition, our alignment analysis reveals how the fine-grained attention mechanism exploits the internal structure of context vectors.

KW - Attention mechanism

KW - Fine-grained attention

KW - Neural machine translation

UR - http://www.scopus.com/inward/record.url?scp=85044868649&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85044868649&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2018.01.007

DO - 10.1016/j.neucom.2018.01.007

M3 - Article

VL - 284

SP - 171

EP - 176

JO - Neurocomputing

JF - Neurocomputing

SN - 0925-2312

ER -