Inter-annotator agreement on a multilingual semantic annotation task

Rebecca Passonneau, Nizar Habash, Owen Rambow

    Research output: Contribution to conference › Paper

    Abstract

    Six sites participated in the Interlingual Annotation of Multilingual Text Corpora (IAMTC) project (Dorr et al., 2004; Farwell et al., 2004; Mitamura et al., 2004). Parsed versions of English translations of news articles in Arabic, French, Hindi, Japanese, Korean and Spanish were annotated by up to ten annotators. Their task was to match open-class lexical items (nouns, verbs, adjectives, adverbs) to one or more concepts taken from the Omega ontology (Philpot et al., 2003), and to identify theta roles for verb arguments. The annotated corpus is intended to be a resource for meaning-based approaches to machine translation. Here we discuss inter-annotator agreement for the corpus. The annotation task is characterized by annotators' freedom to select multiple concepts or roles per lexical item. As a result, the annotation categories are sets, the number of which is bounded only by the number of distinct annotator-lexical item pairs. We use a reliability metric designed to handle partial agreement between sets. The best results pertain to the part of the ontology derived from WordNet. We examine change over the course of the project, differences among annotators, and differences across parts of speech. Our results suggest a strong learning effect early in the project.
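    A note on the reliability metric: the abstract describes the annotation categories as sets and says the metric was designed to handle partial agreement between sets, but the computation is not spelled out here. One standard choice for set-valued annotations in this line of work is a Krippendorff-style coefficient built on the MASI set distance (Jaccard overlap weighted by a monotonicity factor); the Python sketch below illustrates only that distance and a mean pairwise observed-disagreement score, as an assumption rather than the paper's exact formulation. The annotator IDs and concept labels are invented, and a full chance-corrected alpha would additionally require an expected-disagreement term.

    from itertools import combinations

    def masi_distance(a: set, b: set) -> float:
        """MASI-style distance: 1 - Jaccard * monotonicity weight.
        Weight: 1 if the sets are identical, 2/3 if one properly contains
        the other, 1/3 if they merely overlap, 0 if they are disjoint."""
        if not a and not b:
            return 0.0
        if a == b:
            m = 1.0
        elif a < b or b < a:
            m = 2 / 3
        elif a & b:
            m = 1 / 3
        else:
            m = 0.0
        jaccard = len(a & b) / len(a | b)
        return 1.0 - jaccard * m

    def mean_pairwise_distance(annotations: dict) -> float:
        """Average MASI distance over all annotator pairs for one lexical item
        (observed disagreement only, not a full chance-corrected coefficient)."""
        pairs = list(combinations(annotations.values(), 2))
        return sum(masi_distance(a, b) for a, b in pairs) / len(pairs)

    # Hypothetical example: three annotators assign Omega-style concept sets
    # to one open-class lexical item (concept names are invented).
    item = {
        "ann1": {"concept:announce", "concept:declare"},
        "ann2": {"concept:announce"},
        "ann3": {"concept:state"},
    }
    print(round(mean_pairwise_distance(item), 3))  # 0.889: mostly disagreement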

    Original language: English (US)
    Pages: 1951-1956
    Number of pages: 6
    State: Published - 2006
    Event: 5th International Conference on Language Resources and Evaluation, LREC 2006 - Genoa, Italy
    Duration: May 22, 2006 - May 28, 2006

    Other

    5th International Conference on Language Resources and Evaluation, LREC 2006
    Country: Italy
    City: Genoa
    Period: 5/22/06 - 5/28/06

    ASJC Scopus subject areas

    • Education
    • Library and Information Sciences
    • Linguistics and Language
    • Language and Linguistics

    Cite this

    Passonneau, R., Habash, N., & Rambow, O. (2006). Inter-annotator agreement on a multilingual semantic annotation task. Paper presented at the 5th International Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy, pp. 1951-1956.
