Design, construction and validation of an Arabic-English conceptual interlingua for cross-lingual information retrieval

Nizar Habash, Clinton Mah, Sabiha Imran, Randy Calistri-Yeh, Páraic Sheridan

Research output: Contribution to conference (Paper)

Abstract

This paper describes the issues involved in extending a trans-lingual lexicon, the TextWise Conceptual Interlingua (CI), with Arabic terms. The Conceptual Interlingua is based on the Princeton English WordNet (Fellbaum, 1998). It is a central component in the cross-lingual information retrieval (CLIR) system CINDOR (Conceptual INterlingua for DOcument Retrieval). Arabic has a rich morphological system combining templatic and affixational paradigms for both inflectional and derivational morphology. This rich morphology poses a major challenge to the design and building of the Arabic CI and also its validation. This is because the available resources for Arabic, whether manually constructed bilingual lexicons or lexicons automatically derived from bilingual parallel corpora, exist at different levels of morphological representation. We describe here the issues and decisions made in the design and construction of the Arabic-English CI using different types of manual and automatic resources. We also present the results of an extensive validation of the Arabic CI and briefly discuss the evaluation of its use for CLIR on the TREC Arabic Benchmark collection.

Original language: English (US)
Pages: 107-112
Number of pages: 6
State: Published - Jan 1 2006
Event: 5th International Conference on Language Resources and Evaluation, LREC 2006 - Genoa, Italy
Duration: May 22 2006 to May 28 2006

Fingerprint

information retrieval
resources
paradigm
evaluation
Interlingua

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics

Cite this

Habash, N., Mah, C., Imran, S., Calistri-Yeh, R., & Sheridan, P. (2006). Design, construction and validation of an Arabic-English conceptual interlingua for cross-lingual information retrieval. 107-112. Paper presented at 5th International Conference on Language Resources and Evaluation, LREC 2006, Genoa, Italy.

@conference{8f1ceced12ff434991905c350ed17c7c,
title = "Design, construction and validation of an Arabic-English conceptual interlingua for cross-lingual information retrieval",
abstract = "This paper describes the issues involved in extending a trans-lingual lexicon, the TextWise Conceptual Interlingua (CI), with Arabic terms. The Conceptual Interlingua is based on the Princeton English WordNet (Fellbaum, 1998). It is a central component in the cross-lingual information retrieval (CLIR) system CINDOR (Conceptual INterlingua for DOcument Retrieval). Arabic has a rich morphological system combining templatic and affixational paradigms for both inflectional and derivational morphology. This rich morphology poses a major challenge to the design and building of the Arabic CI and also its validation. This is because the available resources for Arabic, whether manually constructed bilingual lexicons or lexicons automatically derived from bilingual parallel corpora, exist at different levels of morphological representation. We describe here the issues and decisions made in the design and construction of the Arabic-English CI using different types of manual and automatic resources. We also present the results of an extensive validation of the Arabic CI and briefly discuss the evaluation of its use for CLIR on the TREC Arabic Benchmark collection.",
author = "Nizar Habash and Clinton Mah and Sabiha Imran and Randy Calistri-Yeh and P{\'a}raic Sheridan",
year = "2006",
month = "1",
day = "1",
language = "English (US)",
pages = "107--112",
note = "5th International Conference on Language Resources and Evaluation, LREC 2006 ; Conference date: 22-05-2006 Through 28-05-2006",

}

TY - CONF

T1 - Design, construction and validation of an Arabic-English conceptual interlingua for cross-lingual information retrieval

AU - Habash, Nizar

AU - Mah, Clinton

AU - Imran, Sabiha

AU - Calistri-Yeh, Randy

AU - Sheridan, Páraic

PY - 2006/1/1

Y1 - 2006/1/1

AB - This paper describes the issues involved in extending a trans-lingual lexicon, the TextWise Conceptual Interlingua (CI), with Arabic terms. The Conceptual Interlingua is based on the Princeton English WordNet (Fellbaum, 1998). It is a central component in the cross-lingual information retrieval (CLIR) system CINDOR (Conceptual INterlingua for DOcument Retrieval). Arabic has a rich morphological system combining templatic and affixational paradigms for both inflectional and derivational morphology. This rich morphology poses a major challenge to the design and building of the Arabic CI and also its validation. This is because the available resources for Arabic, whether manually constructed bilingual lexicons or lexicons automatically derived from bilingual parallel corpora, exist at different levels of morphological representation. We describe here the issues and decisions made in the design and construction of the Arabic-English CI using different types of manual and automatic resources. We also present the results of an extensive validation of the Arabic CI and briefly discuss the evaluation of its use for CLIR on the TREC Arabic Benchmark collection.

UR - http://www.scopus.com/inward/record.url?scp=85029115121&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029115121&partnerID=8YFLogxK

M3 - Paper

SP - 107

EP - 112

ER -