DUSTer: A method for unraveling cross-language divergences for statistical word-level alignment

Bonnie J. Dorr, Lisa Pearl, Rebecca Hwa, Nizar Habash

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The frequent occurrence of divergences (structural differences between languages) presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.

Original language: English (US)
Title of host publication: Machine Translation
Subtitle of host publication: From Research to Real Users - 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002, Proceedings
Editors: Stephen D. Richardson
Publisher: Springer Verlag
Pages: 31-43
Number of pages: 13
ISBN (Print): 3540442820, 9783540442820
State: Published - Jan 1 2002
Event: 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002 - Tiburon, United States
Duration: Oct 8 2002 to Oct 12 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2499
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002
Country: United States
City: Tiburon
Period: 10/8/02 to 10/12/02

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Dorr, B. J., Pearl, L., Hwa, R., & Habash, N. (2002). DUSTer: A method for unraveling cross-language divergences for statistical word-level alignment. In S. D. Richardson (Ed.), Machine Translation: From Research to Real Users - 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002, Proceedings (pp. 31-43). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2499). Springer Verlag.