Processing spontaneous orthography

Ramy Eskander, Nizar Habash, Owen Rambow, Nadi Tomeh

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    In cases in which there is no standard orthography for a language or language variant, written texts will display a variety of orthographic choices. This is problematic for natural language processing (NLP) because it creates spurious data sparseness. We study the transformation of spontaneously spelled Egyptian Arabic into a conventionalized orthography which we have previously proposed for NLP purposes. We show that a two-stage process can reduce divergences from this standard by 69%, making subsequent processing of Egyptian Arabic easier.
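The sparseness problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' two-stage method: the lookup table, the function names, and the English-like toy spellings are all hypothetical stand-ins for spontaneously spelled Egyptian Arabic, chosen only to show how mapping variants to one conventional form shrinks the vocabulary a downstream model has to learn.

```python
from collections import Counter

# Hypothetical lookup from spontaneous spellings to one conventional form.
# Toy English-like strings; not data or rules taken from the paper.
CONVENTIONAL = {
    "colour": "color",
    "colur": "color",
    "grey": "gray",
}


def conventionalize(tokens):
    """Map each token to its conventional spelling; unknown tokens pass through."""
    return [CONVENTIONAL.get(token, token) for token in tokens]


def vocabulary_size(tokens):
    """Count the distinct word types in a token list."""
    return len(Counter(tokens))


raw = ["color", "colour", "colur", "gray", "grey", "color"]
normalized = conventionalize(raw)

# Five surface types collapse to two once spellings are conventionalized.
print(vocabulary_size(raw), vocabulary_size(normalized))
```

In this toy setting every variant is listed by hand. A real system would have to predict the conventional form for unseen spellings, which a fixed lookup table cannot do.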

    Original language: English (US)
    Title of host publication: NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics
    Subtitle of host publication: Human Language Technologies, Proceedings of the Main Conference
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 585-595
    Number of pages: 11
    ISBN (Electronic): 9781937284473
    State: Published - Jan 1 2013
    Event: 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013 - Atlanta, United States
    Duration: Jun 9 2013 to Jun 14 2013

    Other

    Other: 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013
    Country: United States
    City: Atlanta
    Period: 6/9/13 to 6/14/13

    Fingerprint

    orthography
    Processing
    language
    divergence
    Egyptians
    Natural Language Processing
    Orthographic

    ASJC Scopus subject areas

    • Language and Linguistics
    • Computer Science Applications
    • Linguistics and Language

    Cite this

    Eskander, R., Habash, N., Rambow, O., & Tomeh, N. (2013). Processing spontaneous orthography. In NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference (pp. 585-595). Association for Computational Linguistics (ACL).

    @inproceedings{e51150d93452414bafcb1a6fab4cfb3d,
    title = "Processing spontaneous orthography",
    abstract = "In cases in which there is no standard orthography for a language or language variant, written texts will display a variety of orthographic choices. This is problematic for natural language processing (NLP) because it creates spurious data sparseness. We study the transformation of spontaneously spelled Egyptian Arabic into a conventionalized orthography which we have previously proposed for NLP purposes. We show that a two-stage process can reduce divergences from this standard by 69{\%}, making subsequent processing of Egyptian Arabic easier.",
    author = "Ramy Eskander and Nizar Habash and Owen Rambow and Nadi Tomeh",
    year = "2013",
    month = "1",
    day = "1",
    language = "English (US)",
    pages = "585--595",
    booktitle = "NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics",
    publisher = "Association for Computational Linguistics (ACL)",
    }

    TY  - GEN
    T1  - Processing spontaneous orthography
    AU  - Eskander, Ramy
    AU  - Habash, Nizar
    AU  - Rambow, Owen
    AU  - Tomeh, Nadi
    PY  - 2013/1/1
    Y1  - 2013/1/1
    N2  - In cases in which there is no standard orthography for a language or language variant, written texts will display a variety of orthographic choices. This is problematic for natural language processing (NLP) because it creates spurious data sparseness. We study the transformation of spontaneously spelled Egyptian Arabic into a conventionalized orthography which we have previously proposed for NLP purposes. We show that a two-stage process can reduce divergences from this standard by 69%, making subsequent processing of Egyptian Arabic easier.
    UR  - http://www.scopus.com/inward/record.url?scp=84926175165&partnerID=8YFLogxK
    UR  - http://www.scopus.com/inward/citedby.url?scp=84926175165&partnerID=8YFLogxK
    M3  - Conference contribution
    SP  - 585
    EP  - 595
    BT  - NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics
    PB  - Association for Computational Linguistics (ACL)
    ER  -