On using very large target vocabulary for neural machine translation

Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation is limited in handling a large vocabulary, as both training and decoding complexity grow proportionally to the number of target words. In this paper, we propose a method based on importance sampling that allows us to use a very large target vocabulary without increasing training complexity. We show that decoding can be done efficiently, even with a model having a very large target vocabulary, by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to match, and in some cases outperform, baseline models with a small vocabulary as well as LSTM-based neural machine translation models. Furthermore, when we use an ensemble of a few models with very large target vocabularies, we achieve performance comparable to the state of the art (measured by BLEU) on both the English→German and English→French translation tasks of WMT'14.
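
To make the abstract's approach concrete, below is a minimal NumPy sketch, not the authors' implementation: the vocabulary size, dimensions, uniform proposal distribution, and all names are illustrative assumptions. Training normalizes the softmax over a small subset V' of the target vocabulary (the current batch's target words plus sampled negatives) rather than over all target words, and decoding scores only a short candidate list.

    import numpy as np

    rng = np.random.default_rng(0)

    V = 50_000        # full target vocabulary size (toy value)
    d = 128           # decoder hidden-state size (toy value)
    n_sampled = 500   # sampled negative words per update

    W = rng.normal(0.0, 0.01, size=(V, d))   # output word embeddings
    b = np.zeros(V)                          # output biases

    def sampled_softmax_nll(h, target, batch_targets):
        """Negative log-likelihood of `target`, normalized over a small
        subset V' (the batch's target words plus uniformly sampled
        negatives) so each update costs O(|V'|) rather than O(|V|)."""
        negatives = rng.choice(V, size=n_sampled, replace=False)
        subset = np.union1d(batch_targets, negatives)   # V', sorted, unique
        logits = W[subset] @ h + b[subset]
        # With a uniform proposal Q over the vocabulary, the correction
        # term log Q(w) is constant and cancels inside the softmax, so
        # the estimator reduces to a softmax restricted to V'.
        logits -= logits.max()                          # numerical stability
        log_Z = np.log(np.exp(logits).sum())
        t = np.searchsorted(subset, target)             # index of target in V'
        return -(logits[t] - log_Z)

    def candidate_logits(h, candidates):
        """Decoding-time scoring restricted to a candidate list (e.g. the
        K most frequent target words plus likely translations of each
        source word) instead of the full vocabulary."""
        return W[candidates] @ h + b[candidates]

    # Toy usage: one decoder hidden state and one gold target word.
    h = rng.normal(size=d)
    batch_targets = np.array([3, 17, 42, 1000])
    print(sampled_softmax_nll(h, target=42, batch_targets=batch_targets))
    print(candidate_logits(h, np.arange(100)).shape)    # (100,)

With a non-uniform proposal Q, each sampled word's logit would instead be shifted by -log Q(w) before normalizing, which is the importance-sampling correction the paper builds on.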

Original language: English (US)
Title of host publication: ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 1-10
Number of pages: 10
Volume: 1
ISBN (Print): 9781941643723
State: Published - 2015
Event: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL-IJCNLP 2015 - Beijing, China
Duration: Jul 26, 2015 - Jul 31, 2015

Other

Other: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL-IJCNLP 2015
Country: China
City: Beijing
Period: 7/26/15 - 7/31/15

Fingerprint

  • Decoding
  • Importance sampling
  • Neural networks

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software

Cite this

Jean, S., Cho, K., Memisevic, R., & Bengio, Y. (2015). On using very large target vocabulary for neural machine translation. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 1, pp. 1-10). Association for Computational Linguistics (ACL).
