Symbolic-to-statistical hybridization: Extending generation-heavy machine translation

Nizar Habash, Bonnie Dorr, Christof Monz

Research output: Contribution to journal (Article)

Abstract

The last few years have witnessed an increasing interest in hybridizing surface-based statistical approaches and rule-based symbolic approaches to machine translation (MT). Much of that work is focused on extending statistical MT systems with symbolic knowledge and components. In the brand of hybridization discussed here, we go in the opposite direction: adding statistical bilingual components to a symbolic system. Our base system is Generation-heavy machine translation (GHMT), a primarily symbolic asymmetrical approach that addresses the issue of Interlingual MT resource poverty in source-poor/target-rich language pairs by exploiting symbolic and statistical target-language resources. GHMT's statistical components are limited to target-language models, which arguably makes it a simple form of a hybrid system. We extend the hybrid nature of GHMT by adding statistical bilingual components. We also describe the details of retargeting it to Arabic-English MT. The morphological richness of Arabic brings several challenges to the hybridization task. We conduct an extensive evaluation of multiple system variants. Our evaluation shows that this new variant of GHMT (a primarily symbolic system extended with monolingual and bilingual statistical components) has a higher degree of grammaticality than a phrase-based statistical MT system, where grammaticality is measured in terms of correct verb-argument realization and long-distance dependency translation.

Original language: English (US)
Pages (from-to): 23-63
Number of pages: 41
Journal: Machine Translation
Volume: 23
Issue number: 1
DOI: 10.1007/s10590-009-9056-7
State: Published - Feb 1 2009

Keywords

  • Arabic-English machine translation
  • Generation-heavy machine translation
  • Hybrid machine translation
  • Statistical machine translation

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Linguistics and Language
  • Artificial Intelligence

Cite this

Symbolic-to-statistical hybridization: Extending generation-heavy machine translation. / Habash, Nizar; Dorr, Bonnie; Monz, Christof.

In: Machine Translation, Vol. 23, No. 1, 01.02.2009, p. 23-63.

@article{7bd2e06c04af4bb2b888502ff08fbe91,
title = "Symbolic-to-statistical hybridization: Extending generation-heavy machine translation",
abstract = "The last few years have witnessed an increasing interest in hybridizing surface-based statistical approaches and rule-based symbolic approaches to machine translation (MT). Much of that work is focused on extending statistical MT systems with symbolic knowledge and components. In the brand of hybridization discussed here, we go in the opposite direction: adding statistical bilingual components to a symbolic system. Our base system is Generation-heavy machine translation (GHMT), a primarily symbolic asymmetrical approach that addresses the issue of Interlingual MT resource poverty in source-poor/target-rich language pairs by exploiting symbolic and statistical target-language resources. GHMT's statistical components are limited to target-language models, which arguably makes it a simple form of a hybrid system. We extend the hybrid nature of GHMT by adding statistical bilingual components. We also describe the details of retargeting it to Arabic-English MT. The morphological richness of Arabic brings several challenges to the hybridization task. We conduct an extensive evaluation of multiple system variants. Our evaluation shows that this new variant of GHMT (a primarily symbolic system extended with monolingual and bilingual statistical components) has a higher degree of grammaticality than a phrase-based statistical MT system, where grammaticality is measured in terms of correct verb-argument realization and long-distance dependency translation.",
keywords = "Arabic-English machine translation, Generation-heavy machine translation, Hybrid machine translation, Statistical machine translation",
author = "Nizar Habash and Bonnie Dorr and Christof Monz",
year = "2009",
month = "2",
day = "1",
doi = "10.1007/s10590-009-9056-7",
language = "English (US)",
volume = "23",
pages = "23--63",
journal = "Machine Translation",
issn = "0922-6567",
publisher = "Springer Netherlands",
number = "1",

}

TY - JOUR

T1 - Symbolic-to-statistical hybridization

T2 - Extending generation-heavy machine translation

AU - Habash, Nizar

AU - Dorr, Bonnie

AU - Monz, Christof

PY - 2009/2/1

Y1 - 2009/2/1

AB - The last few years have witnessed an increasing interest in hybridizing surface-based statistical approaches and rule-based symbolic approaches to machine translation (MT). Much of that work is focused on extending statistical MT systems with symbolic knowledge and components. In the brand of hybridization discussed here, we go in the opposite direction: adding statistical bilingual components to a symbolic system. Our base system is Generation-heavy machine translation (GHMT), a primarily symbolic asymmetrical approach that addresses the issue of Interlingual MT resource poverty in source-poor/target-rich language pairs by exploiting symbolic and statistical target-language resources. GHMT's statistical components are limited to target-language models, which arguably makes it a simple form of a hybrid system. We extend the hybrid nature of GHMT by adding statistical bilingual components. We also describe the details of retargeting it to Arabic-English MT. The morphological richness of Arabic brings several challenges to the hybridization task. We conduct an extensive evaluation of multiple system variants. Our evaluation shows that this new variant of GHMT (a primarily symbolic system extended with monolingual and bilingual statistical components) has a higher degree of grammaticality than a phrase-based statistical MT system, where grammaticality is measured in terms of correct verb-argument realization and long-distance dependency translation.

KW - Arabic-English machine translation

KW - Generation-heavy machine translation

KW - Hybrid machine translation

KW - Statistical machine translation

UR - http://www.scopus.com/inward/record.url?scp=77149159452&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77149159452&partnerID=8YFLogxK

U2 - 10.1007/s10590-009-9056-7

DO - 10.1007/s10590-009-9056-7

M3 - Article

AN - SCOPUS:77149159452

VL - 23

SP - 23

EP - 63

JO - Machine Translation

JF - Machine Translation

SN - 0922-6567

IS - 1

ER -