A sentiment treebank and morphologically enriched recursive deep models for effective sentiment analysis in Arabic

Ramy Baly, Hazem Hajj, Nizar Habash, Khaled Bashir Shaban, Wassim El-Hajj

    Research output: Contribution to journal › Article

    Abstract

    Accurate sentiment analysis models encode the sentiment of words and their combinations to predict the overall sentiment of a sentence. This task becomes challenging when applied to morphologically rich languages (MRLs). In this article, we evaluate a recent deep learning model, the Recursive Neural Tensor Network (RNTN), for sentiment analysis in Arabic as a case study of MRLs. Although Arabic may not be representative of all MRLs, the challenges it poses and the solutions proposed for it are common to many other MRLs. We identify, illustrate, and address MRL-related challenges and show how RNTN is affected by the morphological richness and orthographic ambiguity of the Arabic language. To address the challenges of extracting sentiment from MRL text, we explore different orthographic features as well as morphological features at multiple levels of abstraction, ranging from raw words to roots. A key requirement for RNTN is the availability of a sentiment treebank: a collection of syntactic parse trees annotated for sentiment at all levels of constituency, a resource that currently exists only for English. Therefore, our contribution also includes the creation of the first Arabic Sentiment Treebank (ARSENTB), which is morphologically and orthographically enriched. Experimental results show that, compared to the basic RNTN proposed for English, our solution achieves significant improvements of up to 8% absolute at the phrase level and 10.8% absolute at the sentence level, measured by average F1 score. It also outperforms well-known classifiers, including Support Vector Machines, Recursive Auto Encoders, and Long Short-Term Memory, by 7.6%, 3.2%, and 1.6% absolute respectively, with all models trained with similar morphological considerations.
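The abstract does not spell out how an RNTN combines the vectors of two child nodes in a parse tree. As a rough illustration only, the sketch below follows the tensor-based composition commonly given for the original English RNTN, p = tanh(c^T V c + W c) with c = [a; b], in plain Python. The function names, the toy dimension, the random weights, and the omission of a bias term are all assumptions made for this example; it is not the authors' implementation and does not include their morphological or orthographic features.

```python
import math
import random


def rntn_compose(a, b, V, W):
    """Combine child vectors a and b (each of length d) into a parent vector.

    For every output unit k: p_k = tanh(c^T V[k] c + W[k] . c), with c = [a; b].
    V holds d slices of size 2d x 2d; W has shape d x 2d. Bias omitted (assumption).
    """
    c = a + b  # list concatenation gives the stacked vector [a; b]
    n = len(c)
    parent = []
    for V_k, W_k in zip(V, W):
        bilinear = sum(c[i] * V_k[i][j] * c[j] for i in range(n) for j in range(n))
        linear = sum(w * x for w, x in zip(W_k, c))
        parent.append(math.tanh(bilinear + linear))
    return parent


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(node_vec, Ws):
    """Sentiment distribution at a node: softmax(Ws . node_vec)."""
    return softmax([sum(w * x for w, x in zip(row, node_vec)) for row in Ws])


if __name__ == "__main__":
    random.seed(0)
    d, n_classes = 3, 5  # toy sizes, chosen only for the demo

    def rand_vec(size):
        return [random.uniform(-0.1, 0.1) for _ in range(size)]

    V = [[rand_vec(2 * d) for _ in range(2 * d)] for _ in range(d)]
    W = [rand_vec(2 * d) for _ in range(d)]
    Ws = [rand_vec(d) for _ in range(n_classes)]

    # Toy binary tree ((w1 w2) w3): compose bottom-up, classify every node.
    w1, w2, w3 = rand_vec(d), rand_vec(d), rand_vec(d)
    phrase = rntn_compose(w1, w2, V, W)
    sentence = rntn_compose(phrase, w3, V, W)
    print(classify(phrase, Ws))
    print(classify(sentence, Ws))
```

Because the same `V`, `W`, and `Ws` are reused at every node, a prediction is available for each phrase as well as for the whole sentence, which is why training such a model needs a treebank annotated for sentiment at all levels of constituency.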

    Original language: English (US)
    Article number: 23
    Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
    Volume: 16
    Issue number: 4
    DOI: 10.1145/3086576
    State: Published - Jul 1 2017

    Fingerprint

    Tensors
    Syntactics
    Support vector machines
    Classifiers
    Availability

    ASJC Scopus subject areas

    • Computer Science (all)

    Cite this

    A sentiment treebank and morphologically enriched recursive deep models for effective sentiment analysis in Arabic. / Baly, Ramy; Hajj, Hazem; Habash, Nizar; Shaban, Khaled Bashir; El-Hajj, Wassim.

    In: ACM Transactions on Asian and Low-Resource Language Information Processing, Vol. 16, No. 4, 23, 01.07.2017.

    @article{3948ad1087ba4577b7831cd3ff742e7a,
    title = "A sentiment treebank and morphologically enriched recursive deep models for effective sentiment analysis in Arabic",
    author = "Ramy Baly and Hazem Hajj and Nizar Habash and Shaban, {Khaled Bashir} and Wassim El-Hajj",
    year = "2017",
    month = "7",
    day = "1",
    doi = "10.1145/3086576",
    language = "English (US)",
    volume = "16",
    journal = "ACM Transactions on Asian and Low-Resource Language Information Processing",
    issn = "2375-4699",
    publisher = "Association for Computing Machinery (ACM)",
    number = "4",

    }

    TY - JOUR

    T1 - A sentiment treebank and morphologically enriched recursive deep models for effective sentiment analysis in Arabic

    AU - Baly, Ramy

    AU - Hajj, Hazem

    AU - Habash, Nizar

    AU - Shaban, Khaled Bashir

    AU - El-Hajj, Wassim

    PY - 2017/7/1

    Y1 - 2017/7/1

    UR - http://www.scopus.com/inward/record.url?scp=85026660253&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85026660253&partnerID=8YFLogxK

    U2 - 10.1145/3086576

    DO - 10.1145/3086576

    M3 - Article

    VL - 16

    JO - ACM Transactions on Asian and Low-Resource Language Information Processing

    JF - ACM Transactions on Asian and Low-Resource Language Information Processing

    SN - 2375-4699

    IS - 4

    M1 - 23

    ER -