A gold standard dependency corpus for English

Natalia Silveira, Timothy Dozat, Marie Catherine De Marneffe, Samuel Bowman, Miriam Connor, John Bauer, Christopher D. Manning

    Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

    Abstract

    We present a gold standard annotation of syntactic dependencies in the English Web Treebank corpus using the Stanford Dependencies standard. This resource addresses the lack of a gold standard dependency treebank for English, as well as the limited availability of gold standard syntactic annotations for informal genres of English text. We also present experiments on the use of this resource, both for training dependency parsers and for evaluating dependency parsers like the one included as part of the Stanford Parser. We show that training a dependency parser on a mix of newswire and web data improves performance on that type of data without greatly hurting performance on newswire text, and therefore gold standard annotations for non-canonical text can be valuable for parsing in general. Furthermore, the systematic annotation effort has informed both the SD formalism and its implementation in the Stanford Parser's dependency converter. In response to the challenges encountered by annotators in the EWT corpus, we revised and extended the Stanford Dependencies standard, and improved the Stanford Parser's dependency converter.

    Original language: English (US)
    Title of host publication: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
    Publisher: European Language Resources Association (ELRA)
    Pages: 2897-2904
    Number of pages: 8
    ISBN (Electronic): 9782951740884
    State: Published - Jan 1 2014
    Event: 9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland
    Duration: May 26 2014 to May 31 2014

    Keywords

    • Dependency grammar
    • Stanford dependencies
    • Web treebank

    ASJC Scopus subject areas

    • Linguistics and Language
    • Library and Information Sciences
    • Education
    • Language and Linguistics

    Cite this

    Silveira, N., Dozat, T., De Marneffe, M. C., Bowman, S., Connor, M., Bauer, J., & Manning, C. D. (2014). A gold standard dependency corpus for English. In Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014 (pp. 2897-2904). European Language Resources Association (ELRA).

    @inproceedings{1dd1085f380c40559c2d5baefa7ff419,
    title = "A gold standard dependency corpus for English",
    keywords = "Dependency grammar, Stanford dependencies, Web treebank",
    author = "Natalia Silveira and Timothy Dozat and {De Marneffe}, {Marie Catherine} and Samuel Bowman and Miriam Connor and John Bauer and Manning, {Christopher D.}",
    year = "2014",
    month = "1",
    day = "1",
    language = "English (US)",
    pages = "2897--2904",
    booktitle = "Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014",
    publisher = "European Language Resources Association (ELRA)",

    }

    TY - GEN
    T1 - A gold standard dependency corpus for English
    AU - Silveira, Natalia
    AU - Dozat, Timothy
    AU - De Marneffe, Marie Catherine
    AU - Bowman, Samuel
    AU - Connor, Miriam
    AU - Bauer, John
    AU - Manning, Christopher D.
    PY - 2014/1/1
    Y1 - 2014/1/1
    KW - Dependency grammar
    KW - Stanford dependencies
    KW - Web treebank
    UR - http://www.scopus.com/inward/record.url?scp=84977556694&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=84977556694&partnerID=8YFLogxK
    M3 - Conference contribution
    SP - 2897
    EP - 2904
    BT - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
    PB - European Language Resources Association (ELRA)
    ER -