Text readability for Arabic as a foreign language

Hind Saddiki, Karim Bouzoubaa, Violetta Cavalli-Sforza

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this study, we evaluate the informativeness of lexical, morphological and semantic features in determining the readability of texts geared towards learners of Arabic as a foreign language. We have gathered low-complexity features with the purpose of establishing a baseline for future research in readability assessment, using freely available natural language processing (NLP) and machine learning (ML) tools on a publicly accessible corpus. We tested common classification algorithms, as well as random forests (an ensemble learning method), and report on their results using several evaluation measures for comparability with similar work. Our results suggest that a small set of easily computed features can be indicative of the reading level of a text. Moreover, our findings will serve as common ground, for ourselves and others, to evaluate and compare the performance of more elaborate techniques and feature sets.
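
To make "a small set of easily computed features" concrete, the following is a minimal sketch of surface-level feature extraction. It is not the authors' feature set: the abstract does not list the features, so the three used here (average word length, average sentence length, type-token ratio), the function name `surface_features`, and the simple tokenization rules are all assumptions chosen for illustration.

```python
import re

# Arabic short-vowel marks and related diacritics (U+064B to U+0652).
# Assumption: they are stripped so that vocalized and unvocalized
# spellings of a word tokenize the same way.
HARAKAT = re.compile(r"[\u064B-\u0652]")

def surface_features(text: str) -> dict:
    """Return a few hypothetical low-complexity features for one text."""
    text = HARAKAT.sub("", text)
    # Split on Latin sentence punctuation and the Arabic question mark (U+061F).
    sentences = [s for s in re.split(r"[.!?\u061F]+", text) if s.strip()]
    # \w is Unicode-aware for str patterns, so Arabic letters are matched.
    tokens = re.findall(r"\w+", text)
    if not tokens or not sentences:
        return {"avg_word_len": 0.0, "avg_sent_len": 0.0, "type_token_ratio": 0.0}
    return {
        "avg_word_len": sum(len(t) for t in tokens) / len(tokens),
        "avg_sent_len": len(tokens) / len(sentences),
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }
```

Feature dictionaries like this one could be turned into vectors and passed to any off-the-shelf classifier, including a random forest. The morphological and semantic features mentioned in the abstract would need dedicated NLP tools and are not sketched here.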

Original language: English (US)
Title of host publication: 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications, AICCSA 2015
Publisher: IEEE Computer Society
Volume: 2016-July
ISBN (Electronic): 9781509004782
DOI: https://doi.org/10.1109/AICCSA.2015.7507232
State: Published - Jul 7 2016
Event: 12th IEEE/ACS International Conference of Computer Systems and Applications, AICCSA 2015 - Marrakech, Morocco
Duration: Nov 17 2015 → Nov 20 2015

Other

Other: 12th IEEE/ACS International Conference of Computer Systems and Applications, AICCSA 2015
Country: Morocco
City: Marrakech
Period: 11/17/15 → 11/20/15

Fingerprint

Learning systems
Semantics
Processing

Keywords

  • Arabic
  • foreign language learning
  • machine learning
  • natural language processing
  • text readability

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Signal Processing
  • Control and Systems Engineering
  • Electrical and Electronic Engineering

Cite this

Saddiki, H., Bouzoubaa, K., & Cavalli-Sforza, V. (2016). Text readability for Arabic as a foreign language. In 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications, AICCSA 2015 (Vol. 2016-July). [7507232] IEEE Computer Society. https://doi.org/10.1109/AICCSA.2015.7507232

Text readability for Arabic as a foreign language. / Saddiki, Hind; Bouzoubaa, Karim; Cavalli-Sforza, Violetta.

2015 IEEE/ACS 12th International Conference of Computer Systems and Applications, AICCSA 2015. Vol. 2016-July IEEE Computer Society, 2016. 7507232.

Saddiki, H, Bouzoubaa, K & Cavalli-Sforza, V 2016, Text readability for Arabic as a foreign language. in 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications, AICCSA 2015. vol. 2016-July, 7507232, IEEE Computer Society, 12th IEEE/ACS International Conference of Computer Systems and Applications, AICCSA 2015, Marrakech, Morocco, 11/17/15. https://doi.org/10.1109/AICCSA.2015.7507232
Saddiki H, Bouzoubaa K, Cavalli-Sforza V. Text readability for Arabic as a foreign language. In 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications, AICCSA 2015. Vol. 2016-July. IEEE Computer Society. 2016. 7507232 https://doi.org/10.1109/AICCSA.2015.7507232
Saddiki, Hind ; Bouzoubaa, Karim ; Cavalli-Sforza, Violetta. / Text readability for Arabic as a foreign language. 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications, AICCSA 2015. Vol. 2016-July IEEE Computer Society, 2016.
@inproceedings{0a0545a72cc249698fef217c36f80ee3,
title = "Text readability for Arabic as a foreign language",
abstract = "In this study, we evaluate the informativeness of lexical, morphological and semantic features in determining the readability of texts geared towards learners of Arabic as a foreign language. We have gathered low-complexity features with the purpose of establishing a baseline for future research in readability assessment, using freely available natural language processing (NLP) and machine learning (ML) tools on a publicly accessible corpus. We tested common classification algorithms, as well as random forests (an ensemble learning method), and report on their results using several evaluation measures for comparability with similar work. Our results suggest that a small set of easily computed features can be indicative of the reading level of a text. Moreover, our findings will serve as common ground, for ourselves and others, to evaluate and compare the performance of more elaborate techniques and feature sets.",
keywords = "Arabic, foreign language learning, machine learning, natural language processing, text readability",
author = "Hind Saddiki and Karim Bouzoubaa and Violetta Cavalli-Sforza",
year = "2016",
month = "7",
day = "7",
doi = "10.1109/AICCSA.2015.7507232",
language = "English (US)",
volume = "2016-July",
booktitle = "2015 IEEE/ACS 12th International Conference of Computer Systems and Applications, AICCSA 2015",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - Text readability for Arabic as a foreign language

AU - Saddiki, Hind

AU - Bouzoubaa, Karim

AU - Cavalli-Sforza, Violetta

PY - 2016/7/7

Y1 - 2016/7/7

N2 - In this study, we evaluate the informativeness of lexical, morphological and semantic features in determining the readability of texts geared towards learners of Arabic as a foreign language. We have gathered low-complexity features with the purpose of establishing a baseline for future research in readability assessment, using freely available natural language processing (NLP) and machine learning (ML) tools on a publicly accessible corpus. We tested common classification algorithms, as well as random forests (an ensemble learning method), and report on their results using several evaluation measures for comparability with similar work. Our results suggest that a small set of easily computed features can be indicative of the reading level of a text. Moreover, our findings will serve as common ground, for ourselves and others, to evaluate and compare the performance of more elaborate techniques and feature sets.

AB - In this study, we evaluate the informativeness of lexical, morphological and semantic features in determining the readability of texts geared towards learners of Arabic as a foreign language. We have gathered low-complexity features with the purpose of establishing a baseline for future research in readability assessment, using freely available natural language processing (NLP) and machine learning (ML) tools on a publicly accessible corpus. We tested common classification algorithms, as well as random forests (an ensemble learning method), and report on their results using several evaluation measures for comparability with similar work. Our results suggest that a small set of easily computed features can be indicative of the reading level of a text. Moreover, our findings will serve as common ground, for ourselves and others, to evaluate and compare the performance of more elaborate techniques and feature sets.

KW - Arabic

KW - foreign language learning

KW - machine learning

KW - natural language processing

KW - text readability

UR - http://www.scopus.com/inward/record.url?scp=84980395642&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84980395642&partnerID=8YFLogxK

U2 - 10.1109/AICCSA.2015.7507232

DO - 10.1109/AICCSA.2015.7507232

M3 - Conference contribution

VL - 2016-July

BT - 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications, AICCSA 2015

PB - IEEE Computer Society

ER -