Automatic transliteration of romanized dialectal Arabic

Mohamed Al-Badrashiny, Ramy Eskander, Nizar Habash, Owen Rambow

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

In this paper, we address the problem of converting Dialectal Arabic (DA) text that is written in the Latin script (called Arabizi) into Arabic script following the CODA convention for DA orthography. The presented system uses a finite state transducer trained at the character level to generate all possible transliterations for the input Arabizi words. We then filter the generated list using a DA morphological analyzer. After that we pick the best choice for each input word using a language model. We achieve an accuracy of 69.4% on an unseen test set compared to 63.1% using a system which represents a previously proposed approach.
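
The three stages described above (candidate generation, morphological filtering, language-model selection) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in rather than the authors' implementation: the hand-written CHAR_MAP table takes the place of the trained character-level finite state transducer, the KNOWN_WORDS set takes the place of the DA morphological analyzer, and the UNIGRAM_COUNTS table takes the place of the language model. The fallback to the unfiltered candidates when nothing passes the filter is likewise an assumption, not something the abstract states.

```python
from itertools import product

# Arabic letters written as escapes so the code points are unambiguous.
KAF, TEH, TAH, BEH, ALEF, YEH = (
    "\u0643", "\u062a", "\u0637", "\u0628", "\u0627", "\u064a"
)

# Hypothetical character mapping: each Latin letter maps to one or more
# Arabic-script outputs ("" means the letter may be dropped, as short
# vowels usually are). The paper learns such mappings with an FST.
CHAR_MAP = {
    "k": [KAF],
    "t": [TEH, TAH],
    "b": [BEH],
    "a": [ALEF, ""],
    "i": [YEH, ""],
}

KITAB = KAF + TEH + ALEF + BEH  # "book"
KATAB = KAF + TEH + BEH         # "he wrote"

# Hypothetical stand-in for the morphological analyzer: a word list.
KNOWN_WORDS = {KITAB, KATAB}

# Hypothetical stand-in for the language model: made-up unigram counts.
UNIGRAM_COUNTS = {KITAB: 12, KATAB: 5}


def generate_candidates(word):
    """Expand every character into its options and take all combinations."""
    options = [CHAR_MAP.get(ch, [ch]) for ch in word.lower()]
    return {"".join(parts) for parts in product(*options)}


def filter_candidates(candidates):
    """Keep candidates the 'analyzer' accepts; fall back to all if none pass
    (the fallback is an assumption of this sketch)."""
    kept = {c for c in candidates if c in KNOWN_WORDS}
    return kept or candidates


def pick_best(candidates):
    """Choose the candidate with the highest unigram count.
    Sorting first makes ties deterministic."""
    return max(sorted(candidates), key=lambda c: UNIGRAM_COUNTS.get(c, 0))


def transliterate(word):
    return pick_best(filter_candidates(generate_candidates(word)))


if __name__ == "__main__":
    for arabizi in ("kitab", "ktb"):
        print(arabizi, "->", transliterate(arabizi))
```

In this toy setup, "kitab" expands to eight candidate strings, the word list narrows them to two, and the counts select one. A real system would score candidates in sentence context rather than with isolated unigram counts.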

Original language: English (US)
Title of host publication: CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 30-38
Number of pages: 9
ISBN (Electronic): 9781941643020
State: Published - Jan 1 2014
Event: 18th Conference on Computational Natural Language Learning, CoNLL 2014 - Baltimore, United States
Duration: Jun 26 2014 to Jun 27 2014

Publication series

Name: CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings

Conference

Conference: 18th Conference on Computational Natural Language Learning, CoNLL 2014
Country: United States
City: Baltimore
Period: 6/26/14 to 6/27/14

Fingerprint

Transducers
orthography
language

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Artificial Intelligence
  • Linguistics and Language

Cite this

Al-Badrashiny, M., Eskander, R., Habash, N., & Rambow, O. (2014). Automatic transliteration of romanized dialectal Arabic. In CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings (pp. 30-38). (CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings). Association for Computational Linguistics (ACL).

@inproceedings{fd6bf45c6df446e79dd85cd7db84db5d,
title = "Automatic transliteration of romanized dialectal Arabic",
abstract = "In this paper, we address the problem of converting Dialectal Arabic (DA) text that is written in the Latin script (called Arabizi) into Arabic script following the CODA convention for DA orthography. The presented system uses a finite state transducer trained at the character level to generate all possible transliterations for the input Arabizi words. We then filter the generated list using a DA morphological analyzer. After that we pick the best choice for each input word using a language model. We achieve an accuracy of 69.4{\%} on an unseen test set compared to 63.1{\%} using a system which represents a previously proposed approach.",
author = "Mohamed Al-Badrashiny and Ramy Eskander and Nizar Habash and Owen Rambow",
year = "2014",
month = "1",
day = "1",
language = "English (US)",
series = "CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings",
publisher = "Association for Computational Linguistics (ACL)",
pages = "30--38",
booktitle = "CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings",

}

TY - GEN

T1 - Automatic transliteration of romanized dialectal Arabic

AU - Al-Badrashiny, Mohamed

AU - Eskander, Ramy

AU - Habash, Nizar

AU - Rambow, Owen

PY - 2014/1/1

Y1 - 2014/1/1

AB - In this paper, we address the problem of converting Dialectal Arabic (DA) text that is written in the Latin script (called Arabizi) into Arabic script following the CODA convention for DA orthography. The presented system uses a finite state transducer trained at the character level to generate all possible transliterations for the input Arabizi words. We then filter the generated list using a DA morphological analyzer. After that we pick the best choice for each input word using a language model. We achieve an accuracy of 69.4% on an unseen test set compared to 63.1% using a system which represents a previously proposed approach.

UR - http://www.scopus.com/inward/record.url?scp=84942564430&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84942564430&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84942564430

T3 - CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings

SP - 30

EP - 38

BT - CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings

PB - Association for Computational Linguistics (ACL)

ER -