Arabic transliteration of Romanized Tunisian dialect text

A preliminary investigation

Abir Masmoudi, Nizar Habash, Mariem Ellouze, Yannick Estève, Lamia Hadrich Belguith

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    In this paper, we describe the process of converting Tunisian Dialect text written in Latin script (also called Arabizi) into Arabic script following the CODA orthography convention for Dialectal Arabic. Our input consists of messages and comments taken from SMS, social networks, and broadcast videos. The language used in social media and SMS messaging is characterized by informal and non-standard vocabulary, such as repeated letters for emphasis, typos, and non-standard abbreviations, and by nonlinguistic content such as emoticons. There is a high degree of variation in spelling in Arabic dialects due to the lack of widely supported orthographic standards in both Arabic and Latin scripts. In the context of natural language processing, transliterating from Arabizi to Arabic script is a necessary step, since most recently available tools for processing Arabic dialects expect Arabic script input.
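
The surface noise described in the abstract (letters repeated for emphasis, emoticons) and the digit-for-letter habit common in Arabizi can be illustrated with a minimal Python sketch. It is not the paper's method: the function names, the emoticon pattern, and the small digit table are assumptions made here for illustration, and the digit mappings reflect common Arabizi usage rather than anything specified in this record.

```python
import re

# Hypothetical, partial digit-to-letter table based on common Arabizi
# conventions; it is NOT the mapping used in the paper.
DIGIT_TO_ARABIC = {"2": "ء", "3": "ع", "5": "خ", "7": "ح", "9": "ق"}

_REPEAT = re.compile(r"(.)\1{2,}")         # any character repeated 3+ times
_EMOTICON = re.compile(r"[:;=]-?[)(DPp]")  # a handful of ASCII emoticons only


def normalize_arabizi(text: str) -> str:
    """Drop simple emoticons and collapse emphatic repetition to one character."""
    text = _EMOTICON.sub(" ", text)
    text = _REPEAT.sub(r"\1", text)
    return " ".join(text.split())  # squeeze leftover whitespace


def map_digits(token: str) -> str:
    """Replace Arabizi digits with Arabic letters; other characters are kept."""
    return "".join(DIGIT_TO_ARABIC.get(ch, ch) for ch in token)


if __name__ == "__main__":
    print(normalize_arabizi("barchaaaa :)"))
    print(map_digits("3lik"))
```

A full transliteration system would also have to map the remaining Latin letters, resolve ambiguous spellings, and enforce CODA conventions; this sketch covers only the cleanup step.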

    Original language: English (US)
    Title of host publication: Computational Linguistics and Intelligent Text Processing - 16th International Conference, CICLing 2015, Proceedings
    Publisher: Springer-Verlag
    Pages: 608-619
    Number of pages: 12
    ISBN (Print): 9783319181103
    DOI: https://doi.org/10.1007/978-3-319-18111-0_46
    State: Published - Jan 1 2015
    Event: 16th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2015 - Cairo, Egypt
    Duration: Apr 14 2015 to Apr 20 2015

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 9041
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Other: 16th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2015
    Country: Egypt
    City: Cairo
    Period: 4/14/15 to 4/20/15

    Keywords

    • CODA
    • Corpus
    • Normalization
    • Transliteration
    • Tunisian Dialect

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science (all)

    Cite this

    Masmoudi, A., Habash, N., Ellouze, M., Estève, Y., & Hadrich Belguith, L. (2015). Arabic transliteration of Romanized Tunisian dialect text: A preliminary investigation. In Computational Linguistics and Intelligent Text Processing - 16th International Conference, CICLing 2015, Proceedings (pp. 608-619). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9041). Springer-Verlag. https://doi.org/10.1007/978-3-319-18111-0_46

    @inproceedings{fb4a70d94fa548418055c6beb767164c,
    title = "Arabic transliteration of Romanized Tunisian dialect text: A preliminary investigation",
    abstract = "In this paper, we describe the process of converting Tunisian Dialect text written in Latin script (also called Arabizi) into Arabic script following the CODA orthography convention for Dialectal Arabic. Our input consists of messages and comments taken from SMS, social networks, and broadcast videos. The language used in social media and SMS messaging is characterized by informal and non-standard vocabulary, such as repeated letters for emphasis, typos, and non-standard abbreviations, and by nonlinguistic content such as emoticons. There is a high degree of variation in spelling in Arabic dialects due to the lack of widely supported orthographic standards in both Arabic and Latin scripts. In the context of natural language processing, transliterating from Arabizi to Arabic script is a necessary step, since most recently available tools for processing Arabic dialects expect Arabic script input.",
    keywords = "CODA, Corpus, Normalization, Transliteration, Tunisian Dialect",
    author = "Abir Masmoudi and Nizar Habash and Mariem Ellouze and Yannick Est{\`e}ve and {Hadrich Belguith}, Lamia",
    year = "2015",
    month = "1",
    day = "1",
    doi = "10.1007/978-3-319-18111-0_46",
    language = "English (US)",
    isbn = "9783319181103",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    publisher = "Springer-Verlag",
    pages = "608--619",
    booktitle = "Computational Linguistics and Intelligent Text Processing - 16th International Conference, CICLing 2015, Proceedings",
    }
