Dialectal Arabic to English machine translation: Pivoting through Modern Standard Arabic

Wael Salloum, Nizar Habash

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Modern Standard Arabic (MSA) has a wealth of natural language processing (NLP) tools and resources. In comparison, resources for dialectal Arabic (DA), the unstandardized spoken varieties of Arabic, are still lacking. We present ELISSA, a machine translation (MT) system for DA to MSA. ELISSA employs a rule-based approach that relies on morphological analysis, transfer rules, and dictionaries, in addition to language models, to produce MSA paraphrases of DA sentences. ELISSA can be employed as a general preprocessor for DA when using MSA NLP tools. A manual error analysis of ELISSA's output shows that it produces correct MSA translations over 93% of the time. Using ELISSA to produce MSA versions of DA sentences as part of an MSA-pivoting DA-to-English MT solution improves BLEU scores on multiple blind test sets by between 0.6% and 1.4%.

Original language: English (US)
Title of host publication: NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, Proceedings of the Main Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 348-358
Number of pages: 11
ISBN (Electronic): 9781937284473
State: Published - Jan 1 2013
Event: 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013 - Atlanta, United States
Duration: Jun 9 2013 to Jun 14 2013

Other

Other: 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013
Country: United States
City: Atlanta
Period: 6/9/13 to 6/14/13

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Linguistics and Language

Cite this

Salloum, W., & Habash, N. (2013). Dialectal Arabic to English machine translation: Pivoting through Modern Standard Arabic. In NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference (pp. 348-358). Association for Computational Linguistics (ACL).

@inproceedings{5b3eca5763b94aa4b91f79acc09a2884,
title = "Dialectal Arabic to English machine translation: Pivoting through Modern Standard Arabic",
abstract = "Modern Standard Arabic (MSA) has a wealth of natural language processing (NLP) tools and resources. In comparison, resources for dialectal Arabic (DA), the unstandardized spoken varieties of Arabic, are still lacking. We present ELISSA, a machine translation (MT) system for DA to MSA. ELISSA employs a rule-based approach that relies on morphological analysis, transfer rules, and dictionaries, in addition to language models, to produce MSA paraphrases of DA sentences. ELISSA can be employed as a general preprocessor for DA when using MSA NLP tools. A manual error analysis of ELISSA's output shows that it produces correct MSA translations over 93{\%} of the time. Using ELISSA to produce MSA versions of DA sentences as part of an MSA-pivoting DA-to-English MT solution improves BLEU scores on multiple blind test sets by between 0.6{\%} and 1.4{\%}.",
author = "Wael Salloum and Nizar Habash",
year = "2013",
month = "1",
day = "1",
language = "English (US)",
pages = "348--358",
booktitle = "NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics",
publisher = "Association for Computational Linguistics (ACL)",

}

TY - GEN

T1 - Dialectal Arabic to English machine translation

T2 - Pivoting through Modern Standard Arabic

AU - Salloum, Wael

AU - Habash, Nizar

PY - 2013/1/1

Y1 - 2013/1/1

AB - Modern Standard Arabic (MSA) has a wealth of natural language processing (NLP) tools and resources. In comparison, resources for dialectal Arabic (DA), the unstandardized spoken varieties of Arabic, are still lacking. We present ELISSA, a machine translation (MT) system for DA to MSA. ELISSA employs a rule-based approach that relies on morphological analysis, transfer rules, and dictionaries, in addition to language models, to produce MSA paraphrases of DA sentences. ELISSA can be employed as a general preprocessor for DA when using MSA NLP tools. A manual error analysis of ELISSA's output shows that it produces correct MSA translations over 93% of the time. Using ELISSA to produce MSA versions of DA sentences as part of an MSA-pivoting DA-to-English MT solution improves BLEU scores on multiple blind test sets by between 0.6% and 1.4%.

UR - http://www.scopus.com/inward/record.url?scp=84926137816&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84926137816&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84926137816

SP - 348

EP - 358

BT - NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics

PB - Association for Computational Linguistics (ACL)

ER -