Text readability for Arabic as a foreign language

Hind Saddiki, Karim Bouzoubaa, Violetta Cavalli-Sforza

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    In this study, we evaluate the informativeness of lexical, morphological and semantic features in determining the readability of texts geared towards learners of Arabic as a foreign language. We have gathered low-complexity features with the purpose of establishing a baseline for future research in readability assessment, using freely available natural language processing (NLP) and machine learning (ML) tools on a publicly accessible corpus. We tested common classification algorithms, as well as random forests, an ensemble learning method, and report on their results using several evaluation measures for comparability with similar work. Our results suggest that a small set of easily computed features can be indicative of the reading level of a text. Moreover, our findings will serve as common ground, for ourselves and others, to evaluate and compare the performance of more elaborate techniques and feature sets.

    Original language: English (US)
    Title of host publication: 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications, AICCSA 2015
    Publisher: IEEE Computer Society
    Volume: 2016-July
    ISBN (Electronic): 9781509004782
    DOIs: https://doi.org/10.1109/AICCSA.2015.7507232
    State: Published - Jul 7 2016
    Event: 12th IEEE/ACS International Conference of Computer Systems and Applications, AICCSA 2015 - Marrakech, Morocco
    Duration: Nov 17 2015 to Nov 20 2015

    Other

    Other: 12th IEEE/ACS International Conference of Computer Systems and Applications, AICCSA 2015
    Country: Morocco
    City: Marrakech
    Period: 11/17/15 to 11/20/15


    Keywords

    • Arabic
    • foreign language learning
    • machine learning
    • natural language processing
    • text readability

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Computer Science Applications
    • Hardware and Architecture
    • Signal Processing
    • Control and Systems Engineering
    • Electrical and Electronic Engineering

    Cite this

    Saddiki, H., Bouzoubaa, K., & Cavalli-Sforza, V. (2016). Text readability for Arabic as a foreign language. In 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications, AICCSA 2015 (Vol. 2016-July). [7507232] IEEE Computer Society. https://doi.org/10.1109/AICCSA.2015.7507232

    @inproceedings{0a0545a72cc249698fef217c36f80ee3,
    title = "Text readability for Arabic as a foreign language",
    abstract = "In this study, we evaluate the informativeness of lexical, morphological and semantic features in determining the readability of texts geared towards learners of Arabic as a foreign language. We have gathered low-complexity features with the purpose of establishing a baseline for future research in readability assessment, using freely available natural language processing (NLP) and machine learning (ML) tools on a publicly accessible corpus. We tested common classification algorithms, as well as random forests, an ensemble learning method, and report on their results using several evaluation measures for comparability with similar work. Our results suggest that a small set of easily computed features can be indicative of the reading level of a text. Moreover, our findings will serve as common ground, for ourselves and others, to evaluate and compare the performance of more elaborate techniques and feature sets.",
    keywords = "Arabic, foreign language learning, machine learning, natural language processing, text readability",
    author = "Hind Saddiki and Karim Bouzoubaa and Violetta Cavalli-Sforza",
    year = "2016",
    month = "7",
    day = "7",
    doi = "10.1109/AICCSA.2015.7507232",
    language = "English (US)",
    volume = "2016-July",
    booktitle = "2015 IEEE/ACS 12th International Conference of Computer Systems and Applications, AICCSA 2015",
    publisher = "IEEE Computer Society",

    }

    TY  - GEN
    T1  - Text readability for Arabic as a foreign language
    AU  - Saddiki, Hind
    AU  - Bouzoubaa, Karim
    AU  - Cavalli-Sforza, Violetta
    PY  - 2016/7/7
    Y1  - 2016/7/7
    N2  - In this study, we evaluate the informativeness of lexical, morphological and semantic features in determining the readability of texts geared towards learners of Arabic as a foreign language. We have gathered low-complexity features with the purpose of establishing a baseline for future research in readability assessment, using freely available natural language processing (NLP) and machine learning (ML) tools on a publicly accessible corpus. We tested common classification algorithms, as well as random forests, an ensemble learning method, and report on their results using several evaluation measures for comparability with similar work. Our results suggest that a small set of easily computed features can be indicative of the reading level of a text. Moreover, our findings will serve as common ground, for ourselves and others, to evaluate and compare the performance of more elaborate techniques and feature sets.
    AB  - In this study, we evaluate the informativeness of lexical, morphological and semantic features in determining the readability of texts geared towards learners of Arabic as a foreign language. We have gathered low-complexity features with the purpose of establishing a baseline for future research in readability assessment, using freely available natural language processing (NLP) and machine learning (ML) tools on a publicly accessible corpus. We tested common classification algorithms, as well as random forests, an ensemble learning method, and report on their results using several evaluation measures for comparability with similar work. Our results suggest that a small set of easily computed features can be indicative of the reading level of a text. Moreover, our findings will serve as common ground, for ourselves and others, to evaluate and compare the performance of more elaborate techniques and feature sets.
    KW  - Arabic
    KW  - foreign language learning
    KW  - machine learning
    KW  - natural language processing
    KW  - text readability
    UR  - http://www.scopus.com/inward/record.url?scp=84980395642&partnerID=8YFLogxK
    UR  - http://www.scopus.com/inward/citedby.url?scp=84980395642&partnerID=8YFLogxK
    U2  - 10.1109/AICCSA.2015.7507232
    DO  - 10.1109/AICCSA.2015.7507232
    M3  - Conference contribution
    VL  - 2016-July
    BT  - 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications, AICCSA 2015
    PB  - IEEE Computer Society
    ER  - 