Automatic learning of morphological variations for handling Out-of-Vocabulary terms in Urdu-English machine translation

Nizar Habash, Hayden Metsky

    Research output: Contribution to conference > Paper

    Abstract

    We present an approach for online handling of Out-of-Vocabulary (OOV) terms in Urdu-English MT. Since Urdu is morphologically richer than English, we expect a large portion of the OOV terms to be Urdu morphological variations that are irrelevant to English. We describe an approach to automatically learn English-irrelevant (target-irrelevant) Urdu (source) morphological variation rules from standard phrase tables. These rules are learned in an unsupervised (or lightly supervised) manner by exploiting redundancy in Urdu and collocation with English translations. We use these rules to hypothesize in-vocabulary alternatives to the OOV terms. Our results show that we reduce the OOV rate from a standard baseline average of 2.6% to an average of 0.3% (an 89% relative decrease). We also increase the BLEU score by 0.45 (absolute) and 2.8% (relative) on a standard test set. A manual error analysis shows that 28% of handled OOV cases produce acceptable translations in context.
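
The abstract does not spell out the learning procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that source words sharing an English translation and a common stem differ by a target-irrelevant suffix, counts such suffix rewrites across a phrase table, and then applies the frequent ones to map an OOV word onto in-vocabulary alternatives. The function names, thresholds, and the romanized toy phrase table are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def split_on_common_prefix(a, b, min_stem=3):
    """Return (suffix_a, suffix_b) if a and b share a stem of at least
    min_stem characters, otherwise None."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    if i < min_stem:
        return None
    return a[i:], b[i:]


def learn_rules(phrase_table, min_count=2, min_stem=3):
    """Learn suffix rewrite rules from (source, target) phrase pairs.

    Assumption: two source words that translate to the same target and
    share a stem illustrate a target-irrelevant variation.
    """
    by_target = defaultdict(set)
    for source, target in phrase_table:
        by_target[target].add(source)

    # Collect each source-word pair once, even if it shares several targets.
    pairs = set()
    for sources in by_target.values():
        pairs.update(combinations(sorted(sources), 2))

    counts = Counter()
    for a, b in pairs:
        split = split_on_common_prefix(a, b, min_stem)
        if split is None:
            continue
        suffix_a, suffix_b = split
        counts[(suffix_a, suffix_b)] += 1  # rules are used in both directions
        counts[(suffix_b, suffix_a)] += 1
    return {rule for rule, n in counts.items() if n >= min_count}


def in_vocabulary_alternatives(oov, rules, vocabulary):
    """Apply each rule to the OOV word and keep results found in the vocabulary."""
    alternatives = set()
    for old, new in rules:
        if oov.endswith(old) and len(oov) > len(old):
            candidate = oov[:len(oov) - len(old)] + new
            if candidate != oov and candidate in vocabulary:
                alternatives.add(candidate)
    return sorted(alternatives)


# Hypothetical romanized toy data, invented for illustration only.
phrase_table = [
    ("larka", "boy"), ("larke", "boy"),
    ("kamra", "room"), ("kamre", "room"),
    ("kutta", "dog"), ("kutte", "dog"),
    ("kitab", "book"),
    ("darwaza", "door"),
]
vocabulary = {source for source, _ in phrase_table}
rules = learn_rules(phrase_table)
print(in_vocabulary_alternatives("darwaze", rules, vocabulary))  # ['darwaza']
```

In this toy setting the unseen form "darwaze" is mapped to the known form "darwaza", whose phrase-table translations could then be reused. A real system would also need to score competing alternatives rather than accept every match.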

    Original language: English (US)
    State: Published - Dec 1 2008
    Event: 8th Biennial Conference of the Association for Machine Translation in the Americas, AMTA 2008 - Waikiki, HI, United States
    Duration: Oct 21 2008 to Oct 25 2008

    Other

    Other: 8th Biennial Conference of the Association for Machine Translation in the Americas, AMTA 2008
    Country: United States
    City: Waikiki, HI
    Period: 10/21/08 - 10/25/08

    Fingerprint

    Error analysis
    Redundancy
    Urdu
    Morphological Variation
    Vocabulary
    Machine Translation
    English Translation
    Collocation

    ASJC Scopus subject areas

    • Language and Linguistics
    • Human-Computer Interaction
    • Software

    Cite this

    Habash, N., & Metsky, H. (2008). Automatic learning of morphological variations for handling Out-of-Vocabulary terms in Urdu-English machine translation. Paper presented at 8th Biennial Conference of the Association for Machine Translation in the Americas, AMTA 2008, Waikiki, HI, United States.

    @conference{0aa448ea00694c158422fe7897c3fcbf,
    title = "Automatic learning of morphological variations for handling Out-of-Vocabulary terms in Urdu-English machine translation",
    abstract = "We present an approach for online handling of Out-of-Vocabulary (OOV) terms in Urdu-English MT. Since Urdu is morphologically richer than English, we expect a large portion of the OOV terms to be Urdu morphological variations that are irrelevant to English. We describe an approach to automatically learn English-irrelevant (target-irrelevant) Urdu (source) morphological variation rules from standard phrase tables. These rules are learned in an unsupervised (or lightly supervised) manner by exploiting redundancy in Urdu and collocation with English translations. We use these rules to hypothesize in-vocabulary alternatives to the OOV terms. Our results show that we reduce the OOV rate from a standard baseline average of 2.6{\%} to an average of 0.3{\%} (an 89{\%} relative decrease). We also increase the BLEU score by 0.45 (absolute) and 2.8{\%} (relative) on a standard test set. A manual error analysis shows that 28{\%} of handled OOV cases produce acceptable translations in context.",
    author = "Nizar Habash and Hayden Metsky",
    year = "2008",
    month = "12",
    day = "1",
    language = "English (US)",
    note = "8th Biennial Conference of the Association for Machine Translation in the Americas, AMTA 2008 ; Conference date: 21-10-2008 Through 25-10-2008",

    }

    TY - CONF

    T1 - Automatic learning of morphological variations for handling Out-of-Vocabulary terms in Urdu-English machine translation

    AU - Habash, Nizar

    AU - Metsky, Hayden

    PY - 2008/12/1

    Y1 - 2008/12/1

    N2 - We present an approach for online handling of Out-of-Vocabulary (OOV) terms in Urdu-English MT. Since Urdu is morphologically richer than English, we expect a large portion of the OOV terms to be Urdu morphological variations that are irrelevant to English. We describe an approach to automatically learn English-irrelevant (target-irrelevant) Urdu (source) morphological variation rules from standard phrase tables. These rules are learned in an unsupervised (or lightly supervised) manner by exploiting redundancy in Urdu and collocation with English translations. We use these rules to hypothesize in-vocabulary alternatives to the OOV terms. Our results show that we reduce the OOV rate from a standard baseline average of 2.6% to an average of 0.3% (an 89% relative decrease). We also increase the BLEU score by 0.45 (absolute) and 2.8% (relative) on a standard test set. A manual error analysis shows that 28% of handled OOV cases produce acceptable translations in context.

    UR - http://www.scopus.com/inward/record.url?scp=84858054361&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84858054361&partnerID=8YFLogxK

    M3 - Paper

    ER -