Tharwa: A large scale dialectal Arabic - Standard Arabic - English lexicon

Mona Diab, Mohamed Al-Badrashiny, Maryam Aminian, Mohammed Attia, Pradeep Dasigi, Heba Elfardy, Ramy Eskander, Nizar Habash, Abdelati Hawwari, Wael Salloum

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We introduce an electronic three-way lexicon, Tharwa, comprising Dialectal Arabic, Modern Standard Arabic and English correspondents. The paper focuses on Egyptian Arabic as the first pilot dialect for the resource, with plans to expand to other dialects of Arabic in later phases of the project. We describe Tharwa's creation process and report on its current status. The lexical entries are augmented with various elements of linguistic information such as POS, gender, rationality, number, and root and pattern information. The lexicon is based on a compilation of information from existing monolingual and bilingual resources such as paper dictionaries and electronic, corpus-based dictionaries. Multiple levels of quality checks are performed on the output of each step in the creation process. The importance of this lexicon lies in the fact that it is the first resource of its kind bridging multiple variants of Arabic with English. Furthermore, it is a wide-coverage lexical resource containing over 73,000 Egyptian entries. Tharwa is publicly available. We believe it will have a significant impact on both Theoretical Linguistics and Computational Linguistics research.

Original language: English (US)
Title of host publication: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
Publisher: European Language Resources Association (ELRA)
Pages: 3782-3789
Number of pages: 8
ISBN (Electronic): 9782951740884
State: Published - Jan 1 2014
Event: 9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland
Duration: May 26 2014 to May 31 2014

Keywords

  • Arabic dialects
  • Arabic lexicon
  • Arabic morphology
  • Egyptian Arabic dictionary

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Education
  • Language and Linguistics

Cite this

Diab, M., Al-Badrashiny, M., Aminian, M., Attia, M., Dasigi, P., Elfardy, H., ... Salloum, W. (2014). Tharwa: A large scale dialectal Arabic - Standard Arabic - English lexicon. In Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014 (pp. 3782-3789). European Language Resources Association (ELRA).

@inproceedings{70c2a1d64c994673ba71e095a7400971,
title = "Tharwa: A large scale dialectal Arabic - Standard Arabic - English lexicon",
abstract = "We introduce an electronic three-way lexicon, Tharwa, comprising Dialectal Arabic, Modern Standard Arabic and English correspondents. The paper focuses on Egyptian Arabic as the first pilot dialect for the resource, with plans to expand to other dialects of Arabic in later phases of the project. We describe Tharwa's creation process and report on its current status. The lexical entries are augmented with various elements of linguistic information such as POS, gender, rationality, number, and root and pattern information. The lexicon is based on a compilation of information from existing monolingual and bilingual resources such as paper dictionaries and electronic, corpus-based dictionaries. Multiple levels of quality checks are performed on the output of each step in the creation process. The importance of this lexicon lies in the fact that it is the first resource of its kind bridging multiple variants of Arabic with English. Furthermore, it is a wide-coverage lexical resource containing over 73,000 Egyptian entries. Tharwa is publicly available. We believe it will have a significant impact on both Theoretical Linguistics and Computational Linguistics research.",
keywords = "Arabic dialects, Arabic lexicon, Arabic morphology, Egyptian Arabic dictionary",
author = "Mona Diab and Mohamed Al-Badrashiny and Maryam Aminian and Mohammed Attia and Pradeep Dasigi and Heba Elfardy and Ramy Eskander and Nizar Habash and Abdelati Hawwari and Wael Salloum",
year = "2014",
month = "1",
day = "1",
language = "English (US)",
pages = "3782--3789",
booktitle = "Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014",
publisher = "European Language Resources Association (ELRA)",

}

TY - GEN
T1 - Tharwa
T2 - A large scale dialectal Arabic - Standard Arabic - English lexicon
AU - Diab, Mona
AU - Al-Badrashiny, Mohamed
AU - Aminian, Maryam
AU - Attia, Mohammed
AU - Dasigi, Pradeep
AU - Elfardy, Heba
AU - Eskander, Ramy
AU - Habash, Nizar
AU - Hawwari, Abdelati
AU - Salloum, Wael
PY - 2014/1/1
Y1 - 2014/1/1
AB - We introduce an electronic three-way lexicon, Tharwa, comprising Dialectal Arabic, Modern Standard Arabic and English correspondents. The paper focuses on Egyptian Arabic as the first pilot dialect for the resource, with plans to expand to other dialects of Arabic in later phases of the project. We describe Tharwa's creation process and report on its current status. The lexical entries are augmented with various elements of linguistic information such as POS, gender, rationality, number, and root and pattern information. The lexicon is based on a compilation of information from existing monolingual and bilingual resources such as paper dictionaries and electronic, corpus-based dictionaries. Multiple levels of quality checks are performed on the output of each step in the creation process. The importance of this lexicon lies in the fact that it is the first resource of its kind bridging multiple variants of Arabic with English. Furthermore, it is a wide-coverage lexical resource containing over 73,000 Egyptian entries. Tharwa is publicly available. We believe it will have a significant impact on both Theoretical Linguistics and Computational Linguistics research.
KW - Arabic dialects
KW - Arabic lexicon
KW - Arabic morphology
KW - Egyptian Arabic dictionary
UR - http://www.scopus.com/inward/record.url?scp=85026887071&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85026887071&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85026887071
SP - 3782
EP - 3789
BT - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
PB - European Language Resources Association (ELRA)
ER -