Arabic transliteration of Romanized Tunisian dialect text: A preliminary investigation

Abir Masmoudi, Nizar Habash, Mariem Ellouze, Yannick Estève, Lamia Hadrich Belguith

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we describe the process of converting Tunisian Dialect text that is written in Latin script (also called Arabizi) into Arabic script following the CODA orthography convention for Dialectal Arabic. Our input consists of messages and comments taken from SMS, social networks and broadcast videos. The language used in social media and SMS messaging is characterized by the use of informal and non-standard vocabulary such as repeated letters for emphasis, typos, non-standard abbreviations, and nonlinguistic content, such as emoticons. There is a high degree of variation in spelling in Arabic dialects due to the lack of widely supported orthographic standards in both Arabic and Latin scripts. In the context of natural language processing, transliterating from Arabizi to Arabic script is a necessary step since most recently available tools for processing Arabic Dialects expect Arabic script input.
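As an illustration only, the kind of noise the abstract mentions (repeated letters for emphasis, emoticons) could be cleaned up before transliteration roughly as follows. This is a minimal sketch and not the authors' method: the function name `normalize_arabizi`, both regular expressions, and the choice to collapse only runs of three or more letters are assumptions made for this example.

```python
import re

# Hypothetical pattern: matches only a few common ASCII emoticons
# such as :) ;-) =D :P when they stand alone as a token.
EMOTICON_RE = re.compile(r"(?<!\S)[:;=][-']?[)(DPpOo/\\|]+(?!\S)")

# A letter repeated three or more times in a row. Digits are excluded
# on purpose, since Arabizi uses digits such as 3 and 7 as letters.
REPEAT_RE = re.compile(r"([^\W\d_])\1{2,}")


def normalize_arabizi(text: str) -> str:
    """Strip standalone emoticons and collapse emphatic letter runs."""
    text = EMOTICON_RE.sub(" ", text)
    # Runs of 3+ identical letters become a single letter; doubled
    # letters are left alone because they may be legitimate spelling.
    text = REPEAT_RE.sub(r"\1", text)
    # Squeeze the whitespace left behind by removed emoticons.
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_arabizi("barchaaaa behi :)"))
```

Collapsing a run to a single letter is a simplification; a real system would need a dictionary or language model to decide whether one or two letters is the intended spelling.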

Original language: English (US)
Title of host publication: Computational Linguistics and Intelligent Text Processing - 16th International Conference, CICLing 2015, Proceedings
Publisher: Springer-Verlag
Pages: 608-619
Number of pages: 12
ISBN (Print): 9783319181103
DOIs: https://doi.org/10.1007/978-3-319-18111-0_46
State: Published - Jan 1 2015
Event: 16th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2015 - Cairo, Egypt
Duration: Apr 14 2015 to Apr 20 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9041
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • CODA
  • Corpus
  • Normalization
  • Transliteration
  • Tunisian Dialect

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Masmoudi, A., Habash, N., Ellouze, M., Estève, Y., & Hadrich Belguith, L. (2015). Arabic transliteration of Romanized Tunisian dialect text: A preliminary investigation. In Computational Linguistics and Intelligent Text Processing - 16th International Conference, CICLing 2015, Proceedings (pp. 608-619). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9041). Springer-Verlag. https://doi.org/10.1007/978-3-319-18111-0_46

@inproceedings{fb4a70d94fa548418055c6beb767164c,
title = "Arabic transliteration of Romanized Tunisian dialect text: A preliminary investigation",
abstract = "In this paper, we describe the process of converting Tunisian Dialect text that is written in Latin script (also called Arabizi) into Arabic script following the CODA orthography convention for Dialectal Arabic. Our input consists of messages and comments taken from SMS, social networks and broadcast videos. The language used in social media and SMS messaging is characterized by the use of informal and non-standard vocabulary such as repeated letters for emphasis, typos, non-standard abbreviations, and nonlinguistic content, such as emoticons. There is a high degree of variation in spelling in Arabic dialects due to the lack of widely supported orthographic standards in both Arabic and Latin scripts. In the context of natural language processing, transliterating from Arabizi to Arabic script is a necessary step since most recently available tools for processing Arabic Dialects expect Arabic script input.",
keywords = "CODA, Corpus, Normalization, Transliteration, Tunisian Dialect",
author = "Abir Masmoudi and Nizar Habash and Mariem Ellouze and Yannick Est{\`e}ve and {Hadrich Belguith}, Lamia",
year = "2015",
month = "1",
day = "1",
doi = "10.1007/978-3-319-18111-0_46",
language = "English (US)",
isbn = "9783319181103",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer-Verlag",
pages = "608--619",
booktitle = "Computational Linguistics and Intelligent Text Processing - 16th International Conference, CICLing 2015, Proceedings",

}

TY - GEN

T1 - Arabic transliteration of Romanized Tunisian dialect text

T2 - A preliminary investigation

AU - Masmoudi, Abir

AU - Habash, Nizar

AU - Ellouze, Mariem

AU - Estève, Yannick

AU - Hadrich Belguith, Lamia

PY - 2015/1/1

Y1 - 2015/1/1

N2 - In this paper, we describe the process of converting Tunisian Dialect text that is written in Latin script (also called Arabizi) into Arabic script following the CODA orthography convention for Dialectal Arabic. Our input consists of messages and comments taken from SMS, social networks and broadcast videos. The language used in social media and SMS messaging is characterized by the use of informal and non-standard vocabulary such as repeated letters for emphasis, typos, non-standard abbreviations, and nonlinguistic content, such as emoticons. There is a high degree of variation in spelling in Arabic dialects due to the lack of widely supported orthographic standards in both Arabic and Latin scripts. In the context of natural language processing, transliterating from Arabizi to Arabic script is a necessary step since most recently available tools for processing Arabic Dialects expect Arabic script input.

KW - CODA

KW - Corpus

KW - Normalization

KW - Transliteration

KW - Tunisian Dialect

UR - http://www.scopus.com/inward/record.url?scp=84942569555&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84942569555&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-18111-0_46

DO - 10.1007/978-3-319-18111-0_46

M3 - Conference contribution

AN - SCOPUS:84942569555

SN - 9783319181103

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 608

EP - 619

BT - Computational Linguistics and Intelligent Text Processing - 16th International Conference, CICLing 2015, Proceedings

PB - Springer-Verlag

ER -