DUSTer

A method for unraveling cross-language divergences for statistical word-level alignment

Bonnie J. Dorr, Lisa Pearl, Rebecca Hwa, Nizar Habash

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    The frequent occurrence of divergences (structural differences between languages) presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.

    Original language: English (US)
    Title of host publication: Machine Translation
    Subtitle of host publication: From Research to Real Users - 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002, Proceedings
    Publisher: Springer-Verlag
    Pages: 31-43
    Number of pages: 13
    ISBN (Print): 3540442820, 9783540442820
    State: Published - Jan 1 2002
    Event: 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002 - Tiburon, United States
    Duration: Oct 8 2002 to Oct 12 2002

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 2499
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Other: 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002
    Country: United States
    City: Tiburon
    Period: 10/8/02 to 10/12/02

    Fingerprint

    Divergence
    Alignment
    Empirical Analysis
    Projection
    Language

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science (all)

    Cite this

    Dorr, B. J., Pearl, L., Hwa, R., & Habash, N. (2002). DUSTer: A method for unraveling cross-language divergences for statistical word-level alignment. In Machine Translation: From Research to Real Users - 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002, Proceedings (pp. 31-43). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2499). Springer-Verlag.

    @inproceedings{0da5356e0c8a47d3ac1e035476fe78fb,
    title = "DUSTer: A method for unraveling cross-language divergences for statistical word-level alignment",
    abstract = "The frequent occurrence of divergences (structural differences between languages) presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.",
    author = "Dorr, {Bonnie J.} and Lisa Pearl and Rebecca Hwa and Nizar Habash",
    year = "2002",
    month = "1",
    day = "1",
    language = "English (US)",
    isbn = "3540442820",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    publisher = "Springer-Verlag",
    pages = "31--43",
    booktitle = "Machine Translation",

    }

    TY - GEN

    T1 - DUSTer

    T2 - A method for unraveling cross-language divergences for statistical word-level alignment

    AU - Dorr, Bonnie J.

    AU - Pearl, Lisa

    AU - Hwa, Rebecca

    AU - Habash, Nizar

    PY - 2002/1/1

    Y1 - 2002/1/1

    N2 - The frequent occurrence of divergences (structural differences between languages) presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.

    UR - http://www.scopus.com/inward/record.url?scp=35048852688&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=35048852688&partnerID=8YFLogxK

    M3 - Conference contribution

    SN - 3540442820

    SN - 9783540442820

    T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

    SP - 31

    EP - 43

    BT - Machine Translation

    PB - Springer-Verlag

    ER -