Building an Arabic machine translation post-edited corpus: Guidelines and annotation

Wajdi Zaghouani, Nizar Habash, Ossama Obeid, Behrang Mohit, Houda Bouamor, Kemal Oflazer

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Abstract

We present our guidelines and annotation procedure for creating a human-corrected, post-edited corpus of machine translations into Modern Standard Arabic. Our overarching goal is to use the annotated corpus to develop automatic machine translation post-editing systems for Arabic that can help accelerate the human revision of translated texts. Creating any manually annotated corpus usually presents many challenges. To address them, we created comprehensive yet simplified annotation guidelines, which were used by a team of five annotators and one lead annotator. To ensure high agreement between the annotators, we held multiple training sessions and regularly measured inter-annotator agreement to check annotation quality. The resulting corpus of manually post-edited English-to-Arabic translations of articles is the largest to date for this language pair.

Original language: English (US)
Title of host publication: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
Publisher: European Language Resources Association (ELRA)
Pages: 1869-1876
Number of pages: 8
ISBN (Electronic): 9782951740891
State: Published - Jan 1 2016
Event: 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: May 23 2016 to May 28 2016

Keywords

  • Annotation
  • Guidelines
  • Post-editing

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Language and Linguistics
  • Education

Cite this

Zaghouani, W., Habash, N., Obeid, O., Mohit, B., Bouamor, H., & Oflazer, K. (2016). Building an Arabic machine translation post-edited corpus: Guidelines and annotation. In Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 (pp. 1869-1876). European Language Resources Association (ELRA).

Building an Arabic machine translation post-edited corpus: Guidelines and annotation. / Zaghouani, Wajdi; Habash, Nizar; Obeid, Ossama; Mohit, Behrang; Bouamor, Houda; Oflazer, Kemal.

Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016. European Language Resources Association (ELRA), 2016. p. 1869-1876.

Zaghouani, W, Habash, N, Obeid, O, Mohit, B, Bouamor, H & Oflazer, K 2016, Building an Arabic machine translation post-edited corpus: Guidelines and annotation. in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016. European Language Resources Association (ELRA), pp. 1869-1876, 10th International Conference on Language Resources and Evaluation, LREC 2016, Portoroz, Slovenia, 5/23/16.

Zaghouani W, Habash N, Obeid O, Mohit B, Bouamor H, Oflazer K. Building an Arabic machine translation post-edited corpus: Guidelines and annotation. In Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016. European Language Resources Association (ELRA). 2016. p. 1869-1876

Zaghouani, Wajdi ; Habash, Nizar ; Obeid, Ossama ; Mohit, Behrang ; Bouamor, Houda ; Oflazer, Kemal. / Building an Arabic machine translation post-edited corpus: Guidelines and annotation. Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016. European Language Resources Association (ELRA), 2016. pp. 1869-1876

@inproceedings{f395e42b909e4ee596ce549bd174c060,
title = "Building an Arabic machine translation post-edited corpus: Guidelines and annotation",
abstract = "We present our guidelines and annotation procedure to create a human corrected machine translated post-edited corpus for the Modern Standard Arabic. Our overarching goal is to use the annotated corpus to develop automatic machine translation post-editing systems for Arabic that can be used to help accelerate the human revision process of translated texts. The creation of any manually annotated corpus usually presents many challenges. In order to address these challenges, we created comprehensive and simplified annotation guidelines which were used by a team of five annotators and one lead annotator. In order to ensure a high annotation agreement between the annotators, multiple training sessions were held and regular inter-annotator agreement measures were performed to check the annotation quality. The created corpus of manual post-edited translations of English to Arabic articles is the largest to date for this language pair.",
keywords = "Annotation, Guidelines, Post-editing",
author = "Wajdi Zaghouani and Nizar Habash and Ossama Obeid and Behrang Mohit and Houda Bouamor and Kemal Oflazer",
year = "2016",
month = "1",
day = "1",
language = "English (US)",
pages = "1869--1876",
booktitle = "Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016",
publisher = "European Language Resources Association (ELRA)",

}

TY  - GEN
T1  - Building an Arabic machine translation post-edited corpus: Guidelines and annotation
AU  - Zaghouani, Wajdi
AU  - Habash, Nizar
AU  - Obeid, Ossama
AU  - Mohit, Behrang
AU  - Bouamor, Houda
AU  - Oflazer, Kemal
PY  - 2016/1/1
Y1  - 2016/1/1
N2  - We present our guidelines and annotation procedure to create a human corrected machine translated post-edited corpus for the Modern Standard Arabic. Our overarching goal is to use the annotated corpus to develop automatic machine translation post-editing systems for Arabic that can be used to help accelerate the human revision process of translated texts. The creation of any manually annotated corpus usually presents many challenges. In order to address these challenges, we created comprehensive and simplified annotation guidelines which were used by a team of five annotators and one lead annotator. In order to ensure a high annotation agreement between the annotators, multiple training sessions were held and regular inter-annotator agreement measures were performed to check the annotation quality. The created corpus of manual post-edited translations of English to Arabic articles is the largest to date for this language pair.
AB  - We present our guidelines and annotation procedure to create a human corrected machine translated post-edited corpus for the Modern Standard Arabic. Our overarching goal is to use the annotated corpus to develop automatic machine translation post-editing systems for Arabic that can be used to help accelerate the human revision process of translated texts. The creation of any manually annotated corpus usually presents many challenges. In order to address these challenges, we created comprehensive and simplified annotation guidelines which were used by a team of five annotators and one lead annotator. In order to ensure a high annotation agreement between the annotators, multiple training sessions were held and regular inter-annotator agreement measures were performed to check the annotation quality. The created corpus of manual post-edited translations of English to Arabic articles is the largest to date for this language pair.
KW  - Annotation
KW  - Guidelines
KW  - Post-editing
UR  - http://www.scopus.com/inward/record.url?scp=85037070308&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85037070308&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:85037070308
SP  - 1869
EP  - 1876
BT  - Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
PB  - European Language Resources Association (ELRA)
ER  - 