DUSTer: A method for unraveling cross-language divergences for statistical word-level alignment

Bonnie J. Dorr, Lisa Pearl, Rebecca Hwa, Nizar Habash

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The frequent occurrence of divergences—structural differences between languages—presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.

Original language: English (US)
Title of host publication: Machine Translation
Subtitle of host publication: From Research to Real Users - 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002, Proceedings
Publisher: Springer-Verlag
Pages: 31-43
Number of pages: 13
ISBN (Print): 3540442820, 9783540442820
State: Published - Jan 1 2002
Event: 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002 - Tiburon, United States
Duration: Oct 8 2002 to Oct 12 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2499
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Dorr, B. J., Pearl, L., Hwa, R., & Habash, N. (2002). DUSTer: A method for unraveling cross-language divergences for statistical word-level alignment. In Machine Translation: From Research to Real Users - 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002, Proceedings (pp. 31-43). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2499). Springer-Verlag.

@inproceedings{0da5356e0c8a47d3ac1e035476fe78fb,
title = "DUSTer: A method for unraveling cross-language divergences for statistical word-level alignment",
abstract = "The frequent occurrence of divergences—structural differences between languages—presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.",
author = "Dorr, {Bonnie J.} and Lisa Pearl and Rebecca Hwa and Nizar Habash",
year = "2002",
month = "1",
day = "1",
language = "English (US)",
isbn = "3540442820",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer-Verlag",
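pages = "31--43",
booktitle = "Machine Translation: From Research to Real Users - 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002, Proceedings",
volume = "2499",
}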

TY - GEN

T1 - DUSTer: A method for unraveling cross-language divergences for statistical word-level alignment

AU - Dorr, Bonnie J.

AU - Pearl, Lisa

AU - Hwa, Rebecca

AU - Habash, Nizar

PY - 2002/1/1

Y1 - 2002/1/1

AB - The frequent occurrence of divergences—structural differences between languages—presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.

UR - http://www.scopus.com/inward/record.url?scp=35048852688&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=35048852688&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:35048852688

SN - 3540442820

SN - 9783540442820

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 31

EP - 43

BT - Machine Translation

PB - Springer-Verlag

ER -