Large scale Arabic error annotation: Guidelines and framework

Wajdi Zaghouani, Behrang Mohit, Nizar Habash, Ossama Obeid, Nadi Tomeh, Alla Rozovskaya, Noura Farra, Sarah Alkuhlani, Kemal Oflazer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present annotation guidelines and a web-based annotation framework developed as part of an effort to create a manually annotated Arabic corpus of errors and corrections for various text types. Such a corpus will be invaluable for developing Arabic error correction tools, both for training models and as a gold standard for evaluating error correction algorithms. We summarize the guidelines we created. We also describe issues encountered during the training of the annotators, as well as problems that are specific to the Arabic language that arose during the annotation process. Finally, we present the annotation tool that was developed as part of this project, the annotation pipeline, and the quality of the resulting annotations.

Original language: English (US)
Title of host publication: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
Editors: Nicoletta Calzolari, Khalid Choukri, Sara Goggi, Thierry Declerck, Joseph Mariani, Bente Maegaard, Asuncion Moreno, Jan Odijk, Helene Mazo, Stelios Piperidis, Hrafn Loftsson
Publisher: European Language Resources Association (ELRA)
Pages: 2362-2369
Number of pages: 8
ISBN (Electronic): 9782951740884
State: Published - Jan 1 2014
Event: 9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland
Duration: May 26 2014 to May 31 2014

Keywords

  • Arabic
  • Error annotation
  • Guidelines

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Education
  • Language and Linguistics

Cite this

Zaghouani, W., Mohit, B., Habash, N., Obeid, O., Tomeh, N., Rozovskaya, A., Farra, N., Alkuhlani, S., & Oflazer, K. (2014). Large scale Arabic error annotation: Guidelines and framework. In N. Calzolari, K. Choukri, S. Goggi, T. Declerck, J. Mariani, B. Maegaard, A. Moreno, J. Odijk, H. Mazo, S. Piperidis, & H. Loftsson (Eds.), Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014 (pp. 2362-2369). (Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014). European Language Resources Association (ELRA).