Lost and found in translation: The impact of machine translated results on translingual information retrieval

Kristen Parton, Nizar Habash, Kathleen McKeown

Research output: Contribution to conference › Paper

Abstract

In an ideal cross-lingual information retrieval (CLIR) system, a user query would generate a search over documents in a different language and the relevant results would be presented in the user's language. In practice, CLIR systems are typically evaluated by judging result relevance in the document language, to factor out the effects of translating the results using machine translation (MT). In this paper, we investigate the influence of four different approaches for integrating MT and CLIR on both retrieval accuracy and user judgment of relevancy. We create a corpus with relevance judgments for both human and machine translated results, and use it to quantify the effect that MT quality has on end-to-end relevance. We find that MT errors result in a 16-39% decrease in mean average precision over the ground truth system that uses human translations. MT errors also caused relevant sentences to appear irrelevant - 5-19% of sentences were relevant in human translation, but were judged irrelevant in MT. To counter this degradation, we present two hybrid retrieval models and two automatic MT post-editing techniques and show that these approaches substantially mitigate the errors and improve the end-to-end relevance.
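
The abstract reports its headline result as a relative decrease in mean average precision (MAP). As a rough illustration of how such a figure is computed, here is a minimal sketch. It is not the authors' evaluation code. The function names and the sample relevance judgments are hypothetical, and average precision is normalized by the number of relevant results retrieved, which is a simplification when the total number of relevant items is unknown.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one ranked list.

    relevance[i] is True when the result at rank i + 1 was judged relevant.
    Assumption: normalizes by the relevant results present in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)


def relative_decrease(baseline: float, system: float) -> float:
    """Fractional drop of `system` relative to `baseline`."""
    return (baseline - system) / baseline


# Hypothetical judgments for two queries: the same ranked results judged
# once in human translation and once in machine translation.
human_judged = [[True, True, False], [True, False, True]]
mt_judged = [[True, False, False], [False, False, True]]

map_human = mean_average_precision(human_judged)
map_mt = mean_average_precision(mt_judged)
print(f"MAP (human): {map_human:.3f}")
print(f"MAP (MT):    {map_mt:.3f}")
print(f"Relative decrease: {relative_decrease(map_human, map_mt):.1%}")
```

The sample numbers are invented for the sketch and say nothing about the paper's data. The point is that the reported 16-39% is a ratio of two MAP scores, not an absolute difference.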

Original language: English (US)
State: Published - Jan 1 2012
Event: 10th Conference of the Association for Machine Translation in the Americas, AMTA 2012 - San Diego, United States
Duration: Oct 28 2012 to Nov 1 2012

Fingerprint

Information retrieval
Information retrieval systems
Degradation
Machine Translation
Information Retrieval
Language

ASJC Scopus subject areas

  • Language and Linguistics
  • Software
  • Human-Computer Interaction

Cite this

Parton, K., Habash, N., & McKeown, K. (2012). Lost and found in translation: The impact of machine translated results on translingual information retrieval. Paper presented at 10th Conference of the Association for Machine Translation in the Americas, AMTA 2012, San Diego, United States.

@conference{e5c236f0c9b1457b927ce07b805755d7,
title = "Lost and found in translation: The impact of machine translated results on translingual information retrieval",
author = "Kristen Parton and Nizar Habash and Kathleen McKeown",
year = "2012",
month = "1",
day = "1",
language = "English (US)",
note = "10th Conference of the Association for Machine Translation in the Americas, AMTA 2012 ; Conference date: 28-10-2012 Through 01-11-2012",

}

TY - CONF

T1 - Lost and found in translation

T2 - The impact of machine translated results on translingual information retrieval

AU - Parton, Kristen

AU - Habash, Nizar

AU - McKeown, Kathleen

PY - 2012/1/1

Y1 - 2012/1/1

N2 - In an ideal cross-lingual information retrieval (CLIR) system, a user query would generate a search over documents in a different language and the relevant results would be presented in the user's language. In practice, CLIR systems are typically evaluated by judging result relevance in the document language, to factor out the effects of translating the results using machine translation (MT). In this paper, we investigate the influence of four different approaches for integrating MT and CLIR on both retrieval accuracy and user judgment of relevancy. We create a corpus with relevance judgments for both human and machine translated results, and use it to quantify the effect that MT quality has on end-to-end relevance. We find that MT errors result in a 16-39% decrease in mean average precision over the ground truth system that uses human translations. MT errors also caused relevant sentences to appear irrelevant - 5-19% of sentences were relevant in human translation, but were judged irrelevant in MT. To counter this degradation, we present two hybrid retrieval models and two automatic MT post-editing techniques and show that these approaches substantially mitigate the errors and improve the end-to-end relevance.

UR - http://www.scopus.com/inward/record.url?scp=84992390112&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84992390112&partnerID=8YFLogxK

M3 - Paper

ER -