Interlingual annotation of parallel text corpora: A new framework for annotation and evaluation

Bonnie J. Dorr, Rebecca J. Passonneau, David Farwell, Rebecca Green, Nizar Habash, Stephen Helmreich, Eduard Hovy, Lori Levin, Keith J. Miller, Teruko Mitamura, Owen Rambow, Advaith Siddharthan

Research output: Contribution to journal › Article

Abstract

This paper focuses on an important step in the creation of a system of meaning representation and the development of semantically annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to annotate multiple translations of foreign-language texts with interlingual content. Three levels of representation are introduced: deep syntactic dependencies (IL0), intermediate semantic representations (IL1), and a normalized representation that unifies conversives, nonliteral language, and paraphrase (IL2). The resulting annotated, multilingually induced, parallel corpora will be useful as an empirical basis for a wide range of research, including the development and evaluation of interlingual NLP systems and paraphrase-extraction systems as well as a host of other research and development efforts in theoretical and applied linguistics, foreign language pedagogy, translation studies, and other related disciplines.

Original language: English (US)
Pages (from-to): 197-243
Number of pages: 47
Journal: Natural Language Engineering
Volume: 16
Issue number: 3
DOIs: https://doi.org/10.1017/S1351324910000070
State: Published - Jul 1 2010

Fingerprint

Syntactics
Information retrieval
evaluation
Linguistics
foreign language
Semantics
research and development
Annotation
Parallel Texts
Text Corpus
language
Paraphrase
Parallel Corpora
Language Pedagogy
Applied Linguistics
Meaning Representation

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Linguistics and Language
  • Artificial Intelligence

Cite this

Dorr, B. J., Passonneau, R. J., Farwell, D., Green, R., Habash, N., Helmreich, S., ... Siddharthan, A. (2010). Interlingual annotation of parallel text corpora: A new framework for annotation and evaluation. Natural Language Engineering, 16(3), 197-243. https://doi.org/10.1017/S1351324910000070

Interlingual annotation of parallel text corpora: A new framework for annotation and evaluation. / Dorr, Bonnie J.; Passonneau, Rebecca J.; Farwell, David; Green, Rebecca; Habash, Nizar; Helmreich, Stephen; Hovy, Eduard; Levin, Lori; Miller, Keith J.; Mitamura, Teruko; Rambow, Owen; Siddharthan, Advaith.

In: Natural Language Engineering, Vol. 16, No. 3, 01.07.2010, p. 197-243.

Dorr, BJ, Passonneau, RJ, Farwell, D, Green, R, Habash, N, Helmreich, S, Hovy, E, Levin, L, Miller, KJ, Mitamura, T, Rambow, O & Siddharthan, A 2010, 'Interlingual annotation of parallel text corpora: A new framework for annotation and evaluation', Natural Language Engineering, vol. 16, no. 3, pp. 197-243. https://doi.org/10.1017/S1351324910000070
Dorr, Bonnie J.; Passonneau, Rebecca J.; Farwell, David; Green, Rebecca; Habash, Nizar; Helmreich, Stephen; Hovy, Eduard; Levin, Lori; Miller, Keith J.; Mitamura, Teruko; Rambow, Owen; Siddharthan, Advaith. / Interlingual annotation of parallel text corpora: A new framework for annotation and evaluation. In: Natural Language Engineering. 2010; Vol. 16, No. 3. pp. 197-243.
@article{a58d5dfcaadd4708b413fb1ad2b3b104,
title = "Interlingual annotation of parallel text corpora: A new framework for annotation and evaluation",
abstract = "This paper focuses on an important step in the creation of a system of meaning representation and the development of semantically annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to annotate multiple translations of foreign-language texts with interlingual content. Three levels of representation are introduced: deep syntactic dependencies (IL0), intermediate semantic representations (IL1), and a normalized representation that unifies conversives, nonliteral language, and paraphrase (IL2). The resulting annotated, multilingually induced, parallel corpora will be useful as an empirical basis for a wide range of research, including the development and evaluation of interlingual NLP systems and paraphrase-extraction systems as well as a host of other research and development efforts in theoretical and applied linguistics, foreign language pedagogy, translation studies, and other related disciplines.",
author = "Dorr, {Bonnie J.} and Passonneau, {Rebecca J.} and David Farwell and Rebecca Green and Nizar Habash and Stephen Helmreich and Eduard Hovy and Lori Levin and Miller, {Keith J.} and Teruko Mitamura and Owen Rambow and Advaith Siddharthan",
year = "2010",
month = "7",
day = "1",
doi = "10.1017/S1351324910000070",
language = "English (US)",
volume = "16",
pages = "197--243",
journal = "Natural Language Engineering",
issn = "1351-3249",
publisher = "Cambridge University Press",
number = "3",

}

TY - JOUR

T1 - Interlingual annotation of parallel text corpora

T2 - A new framework for annotation and evaluation

AU - Dorr, Bonnie J.

AU - Passonneau, Rebecca J.

AU - Farwell, David

AU - Green, Rebecca

AU - Habash, Nizar

AU - Helmreich, Stephen

AU - Hovy, Eduard

AU - Levin, Lori

AU - Miller, Keith J.

AU - Mitamura, Teruko

AU - Rambow, Owen

AU - Siddharthan, Advaith

PY - 2010/7/1

Y1 - 2010/7/1

N2 - This paper focuses on an important step in the creation of a system of meaning representation and the development of semantically annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to annotate multiple translations of foreign-language texts with interlingual content. Three levels of representation are introduced: deep syntactic dependencies (IL0), intermediate semantic representations (IL1), and a normalized representation that unifies conversives, nonliteral language, and paraphrase (IL2). The resulting annotated, multilingually induced, parallel corpora will be useful as an empirical basis for a wide range of research, including the development and evaluation of interlingual NLP systems and paraphrase-extraction systems as well as a host of other research and development efforts in theoretical and applied linguistics, foreign language pedagogy, translation studies, and other related disciplines.

UR - http://www.scopus.com/inward/record.url?scp=78650044500&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78650044500&partnerID=8YFLogxK

U2 - 10.1017/S1351324910000070

DO - 10.1017/S1351324910000070

M3 - Article

AN - SCOPUS:78650044500

VL - 16

SP - 197

EP - 243

JO - Natural Language Engineering

JF - Natural Language Engineering

SN - 1351-3249

IS - 3

ER -