Interlingual annotation of parallel text corpora

A new framework for annotation and evaluation

Bonnie J. Dorr, Rebecca J. Passonneau, David Farwell, Rebecca Green, Nizar Habash, Stephen Helmreich, Eduard Hovy, Lori Levin, Keith J. Miller, Teruko Mitamura, Owen Rambow, Advaith Siddharthan

    Research output: Contribution to journal, Article

    Abstract

    This paper focuses on an important step in the creation of a system of meaning representation and the development of semantically annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to annotate multiple translations of foreign-language texts with interlingual content. Three levels of representation are introduced: deep syntactic dependencies (IL0), intermediate semantic representations (IL1), and a normalized representation that unifies conversives, nonliteral language, and paraphrase (IL2). The resulting annotated, multilingually induced, parallel corpora will be useful as an empirical basis for a wide range of research, including the development and evaluation of interlingual NLP systems and paraphrase-extraction systems as well as a host of other research and development efforts in theoretical and applied linguistics, foreign language pedagogy, translation studies, and other related disciplines.

    Original language: English (US)
    Pages (from-to): 197-243
    Number of pages: 47
    Journal: Natural Language Engineering
    Volume: 16
    Issue number: 3
    DOIs: https://doi.org/10.1017/S1351324910000070
    State: Published - Jul 1 2010

    Fingerprint

    Syntactics
    Information retrieval
    evaluation
    Linguistics
    foreign language
    Semantics
    research and development
    Annotation
    Text Corpus
    Parallel Texts
    language
    Paraphrase
    Parallel Corpora
    Language Pedagogy
    Intermediate
    Meaning Representation

    ASJC Scopus subject areas

    • Software
    • Language and Linguistics
    • Linguistics and Language
    • Artificial Intelligence

    Cite this

    Dorr, B. J., Passonneau, R. J., Farwell, D., Green, R., Habash, N., Helmreich, S., ... Siddharthan, A. (2010). Interlingual annotation of parallel text corpora: A new framework for annotation and evaluation. Natural Language Engineering, 16(3), 197-243. https://doi.org/10.1017/S1351324910000070

    @article{a58d5dfcaadd4708b413fb1ad2b3b104,
    title = "Interlingual annotation of parallel text corpora: A new framework for annotation and evaluation",
    author = "Dorr, {Bonnie J.} and Passonneau, {Rebecca J.} and David Farwell and Rebecca Green and Nizar Habash and Stephen Helmreich and Eduard Hovy and Lori Levin and Miller, {Keith J.} and Teruko Mitamura and Owen Rambow and Advaith Siddharthan",
    year = "2010",
    month = "7",
    day = "1",
    doi = "10.1017/S1351324910000070",
    language = "English (US)",
    volume = "16",
    pages = "197--243",
    journal = "Natural Language Engineering",
    issn = "1351-3249",
    publisher = "Cambridge University Press",
    number = "3",

    }

    TY - JOUR

    T1 - Interlingual annotation of parallel text corpora

    T2 - A new framework for annotation and evaluation

    AU - Dorr, Bonnie J.

    AU - Passonneau, Rebecca J.

    AU - Farwell, David

    AU - Green, Rebecca

    AU - Habash, Nizar

    AU - Helmreich, Stephen

    AU - Hovy, Eduard

    AU - Levin, Lori

    AU - Miller, Keith J.

    AU - Mitamura, Teruko

    AU - Rambow, Owen

    AU - Siddharthan, Advaith

    PY - 2010/7/1

    Y1 - 2010/7/1

    UR - http://www.scopus.com/inward/record.url?scp=78650044500&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=78650044500&partnerID=8YFLogxK

    U2 - 10.1017/S1351324910000070

    DO - 10.1017/S1351324910000070

    M3 - Article

    VL - 16

    SP - 197

    EP - 243

    JO - Natural Language Engineering

    JF - Natural Language Engineering

    SN - 1351-3249

    IS - 3

    ER -