A parallel corpus for evaluating machine translation between Arabic and European languages

Nizar Habash, Nasser Zalmout, Dima Taji, Hoang Hieu, Maverick Alzate

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

We present Arab-Acquis, a large publicly available dataset for evaluating machine translation between 22 European languages and Arabic. Arab-Acquis consists of over 12,000 sentences from the JRC-Acquis (Acquis Communautaire) corpus translated twice by professional translators, once from English and once from French, and totaling over 600,000 words. The corpus follows previous data splits in the literature for tuning, development, and testing. We describe the corpus and how it was created. We also present the first benchmarking results on translating to and from Arabic for 22 European languages.

Original language: English (US)
Title of host publication: Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 235-241
Number of pages: 7
Volume: 2
ISBN (Electronic): 9781510838604
State: Published - Jan 1 2017
Event: 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Valencia, Spain
Duration: Apr 3 2017 to Apr 7 2017

Other

Other: 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017
Country: Spain
City: Valencia
Period: 4/3/17 to 4/7/17

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics

Cite this

Habash, N., Zalmout, N., Taji, D., Hieu, H., & Alzate, M. (2017). A parallel corpus for evaluating machine translation between Arabic and European languages. In Short Papers (Vol. 2, pp. 235-241). Association for Computational Linguistics (ACL).

A parallel corpus for evaluating machine translation between Arabic and European languages. / Habash, Nizar; Zalmout, Nasser; Taji, Dima; Hieu, Hoang; Alzate, Maverick.

Short Papers. Vol. 2 Association for Computational Linguistics (ACL), 2017. p. 235-241.

Habash, N, Zalmout, N, Taji, D, Hieu, H & Alzate, M 2017, A parallel corpus for evaluating machine translation between Arabic and European languages. in Short Papers. vol. 2, Association for Computational Linguistics (ACL), pp. 235-241, 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, 4/3/17.
Habash N, Zalmout N, Taji D, Hieu H, Alzate M. A parallel corpus for evaluating machine translation between Arabic and European languages. In Short Papers. Vol. 2. Association for Computational Linguistics (ACL). 2017. p. 235-241
Habash, Nizar ; Zalmout, Nasser ; Taji, Dima ; Hieu, Hoang ; Alzate, Maverick. / A parallel corpus for evaluating machine translation between Arabic and European languages. Short Papers. Vol. 2 Association for Computational Linguistics (ACL), 2017. pp. 235-241
@inproceedings{1a32bcf9c3c64c72a9da424549ac3ebe,
title = "A parallel corpus for evaluating machine translation between Arabic and European languages",
abstract = "We present Arab-Acquis, a large publicly available dataset for evaluating machine translation between 22 European languages and Arabic. Arab-Acquis consists of over 12,000 sentences from the JRC-Acquis (Acquis Communautaire) corpus translated twice by professional translators, once from English and once from French, and totaling over 600,000 words. The corpus follows previous data splits in the literature for tuning, development, and testing. We describe the corpus and how it was created. We also present the first benchmarking results on translating to and from Arabic for 22 European languages.",
author = "Nizar Habash and Nasser Zalmout and Dima Taji and Hoang Hieu and Maverick Alzate",
year = "2017",
month = "1",
day = "1",
language = "English (US)",
volume = "2",
pages = "235--241",
booktitle = "Short Papers",
publisher = "Association for Computational Linguistics (ACL)",
}

TY  - GEN
T1  - A parallel corpus for evaluating machine translation between Arabic and European languages
AU  - Habash, Nizar
AU  - Zalmout, Nasser
AU  - Taji, Dima
AU  - Hieu, Hoang
AU  - Alzate, Maverick
PY  - 2017/1/1
Y1  - 2017/1/1
N2  - We present Arab-Acquis, a large publicly available dataset for evaluating machine translation between 22 European languages and Arabic. Arab-Acquis consists of over 12,000 sentences from the JRC-Acquis (Acquis Communautaire) corpus translated twice by professional translators, once from English and once from French, and totaling over 600,000 words. The corpus follows previous data splits in the literature for tuning, development, and testing. We describe the corpus and how it was created. We also present the first benchmarking results on translating to and from Arabic for 22 European languages.
AB  - We present Arab-Acquis, a large publicly available dataset for evaluating machine translation between 22 European languages and Arabic. Arab-Acquis consists of over 12,000 sentences from the JRC-Acquis (Acquis Communautaire) corpus translated twice by professional translators, once from English and once from French, and totaling over 600,000 words. The corpus follows previous data splits in the literature for tuning, development, and testing. We describe the corpus and how it was created. We also present the first benchmarking results on translating to and from Arabic for 22 European languages.
UR  - http://www.scopus.com/inward/record.url?scp=85021665456&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85021665456&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:85021665456
VL  - 2
SP  - 235
EP  - 241
BT  - Short Papers
PB  - Association for Computational Linguistics (ACL)
ER  -