Improving Arabic-to-English Statistical Machine Translation by reordering post-verbal subjects for alignment

Marine Carpuat, Yuval Marton, Nizar Habash

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We study the challenges raised by Arabic verb and subject detection and reordering in Statistical Machine Translation (SMT). We show that post-verbal subject (VS) constructions are hard to translate because they have highly ambiguous reordering patterns when translated to English. In addition, implementing reordering is difficult because the boundaries of VS constructions are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. We therefore propose to reorder VS constructions into SV order for SMT word alignment only. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline and despite noisy parses.
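The "reorder for alignment only" idea in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the function names, the half-open span convention, and the assumption that verb and subject spans are already supplied by a parser are all hypothetical. The sketch moves a post-verbal subject in front of its verb, keeps the permutation, and then maps alignment links computed on the reordered sentence back to the original word order, which is one plausible way to confine the reordering to the alignment step.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token span [start, end); hypothetical convention


def reorder_vs_to_sv(tokens: List[str], verb: Span, subj: Span) -> Tuple[List[str], List[int]]:
    """Move the subject span in front of the verb span.

    Returns the reordered tokens and a permutation `order`, where
    order[i] is the original index of the token now at position i.
    """
    v_start, v_end = verb
    s_start, s_end = subj
    # Only handle the post-verbal (VS) case: verb entirely before subject.
    if not (0 <= v_start < v_end <= s_start < s_end <= len(tokens)):
        raise ValueError("expected a verb span followed by a subject span")
    order = (
        list(range(0, v_start))          # material before the verb
        + list(range(s_start, s_end))    # subject, moved forward
        + list(range(v_start, v_end))    # verb
        + list(range(v_end, s_start))    # anything between verb and subject
        + list(range(s_end, len(tokens)))  # remainder of the sentence
    )
    return [tokens[i] for i in order], order


def project_alignment(links: List[Tuple[int, int]], order: List[int]) -> List[Tuple[int, int]]:
    """Map (reordered_source_index, target_index) links back to original source indices."""
    return sorted((order[s], t) for s, t in links)


if __name__ == "__main__":
    # Placeholder tokens stand in for Arabic words; spans would come from a parser.
    toks, order = reorder_vs_to_sv(["V", "S1", "S2", "O"], (0, 1), (1, 3))
    print(toks)
    # Pretend an aligner produced monotone links on the reordered sentence.
    print(project_alignment([(0, 0), (1, 1), (2, 2), (3, 3)], order))
```

Because the permutation is retained, later stages can still work on the original Arabic word order, so errors in the detected spans affect only the alignment input.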

Original language: English (US)
Title of host publication: ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Pages: 178-183
Number of pages: 6
State: Published - Dec 1 2010
Event: 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010 - Uppsala, Sweden
Duration: Jul 11 2010 to Jul 16 2010

Other

Other: 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010
Country: Sweden
City: Uppsala
Period: 7/11/10 to 7/16/10

Fingerprint

Statistical Machine Translation
Alignment
Arabic Verb

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Carpuat, M., Marton, Y., & Habash, N. (2010). Improving Arabic-to-English Statistical Machine Translation by reordering post-verbal subjects for alignment. In ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (pp. 178-183)

Improving Arabic-to-English Statistical Machine Translation by reordering post-verbal subjects for alignment. / Carpuat, Marine; Marton, Yuval; Habash, Nizar.

ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference. 2010. p. 178-183.


Carpuat, M, Marton, Y & Habash, N 2010, Improving Arabic-to-English Statistical Machine Translation by reordering post-verbal subjects for alignment. in ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference. pp. 178-183, 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010, Uppsala, Sweden, 7/11/10.
Carpuat M, Marton Y, Habash N. Improving Arabic-to-English Statistical Machine Translation by reordering post-verbal subjects for alignment. In ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference. 2010. p. 178-183
Carpuat, Marine ; Marton, Yuval ; Habash, Nizar. / Improving Arabic-to-English Statistical Machine Translation by reordering post-verbal subjects for alignment. ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference. 2010. pp. 178-183
@inproceedings{36a65e3e682c4252a18b2f7dab53a015,
title = "Improving Arabic-to-English Statistical Machine Translation by reordering post-verbal subjects for alignment",
abstract = "We study the challenges raised by Arabic verb and subject detection and reordering in Statistical Machine Translation (SMT). We show that post-verbal subject (VS) constructions are hard to translate because they have highly ambiguous reordering patterns when translated to English. In addition, implementing reordering is difficult because the boundaries of VS constructions are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. We therefore propose to reorder VS constructions into SV order for SMT word alignment only. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline and despite noisy parses.",
author = "Marine Carpuat and Yuval Marton and Nizar Habash",
year = "2010",
month = "12",
day = "1",
language = "English (US)",
isbn = "9781617388088",
pages = "178--183",
booktitle = "ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference",

}

TY - GEN

T1 - Improving Arabic-to-English Statistical Machine Translation by reordering post-verbal subjects for alignment

AU - Carpuat, Marine

AU - Marton, Yuval

AU - Habash, Nizar

PY - 2010/12/1

Y1 - 2010/12/1

N2 - We study the challenges raised by Arabic verb and subject detection and reordering in Statistical Machine Translation (SMT). We show that post-verbal subject (VS) constructions are hard to translate because they have highly ambiguous reordering patterns when translated to English. In addition, implementing reordering is difficult because the boundaries of VS constructions are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. We therefore propose to reorder VS constructions into SV order for SMT word alignment only. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline and despite noisy parses.

AB - We study the challenges raised by Arabic verb and subject detection and reordering in Statistical Machine Translation (SMT). We show that post-verbal subject (VS) constructions are hard to translate because they have highly ambiguous reordering patterns when translated to English. In addition, implementing reordering is difficult because the boundaries of VS constructions are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. We therefore propose to reorder VS constructions into SV order for SMT word alignment only. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline and despite noisy parses.

UR - http://www.scopus.com/inward/record.url?scp=84859945878&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84859945878&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84859945878

SN - 9781617388088

SP - 178

EP - 183

BT - ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

ER -