A large annotated corpus for learning natural language inference

Samuel Bowman, Gabor Angeli, Christopher Potts, Christopher D. Manning

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
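    For readers who want to work with the corpus described in the abstract, the following is a minimal sketch of loading SNLI sentence pairs in Python. It assumes the commonly distributed JSONL release format with sentence1, sentence2, and gold_label fields; the file path is illustrative, and pairs on which annotators reached no consensus (gold_label "-") are skipped.

        # Minimal sketch: read SNLI sentence pairs from a JSONL release file.
        # Field names (sentence1, sentence2, gold_label) follow the corpus's
        # published distribution format; the path below is a placeholder.
        import json
        from collections import Counter

        def load_snli(path):
            """Yield (premise, hypothesis, label) triples, skipping pairs
            without an annotator consensus label ('-')."""
            with open(path, encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    if record["gold_label"] == "-":
                        continue
                    yield record["sentence1"], record["sentence2"], record["gold_label"]

        if __name__ == "__main__":
            pairs = list(load_snli("snli_1.0_dev.jsonl"))  # hypothetical local path
            print(len(pairs), "labeled pairs")
            # Labels are entailment, contradiction, or neutral.
            print(Counter(label for _, _, label in pairs))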

    Original language: English (US)
    Title of host publication: Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 632-642
    Number of pages: 11
    ISBN (Electronic): 9781941643327
    State: Published - 2015
    Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Lisbon, Portugal
    Duration: Sep 17, 2015 - Sep 21, 2015



    ASJC Scopus subject areas

    • Computational Theory and Mathematics
    • Computer Science Applications
    • Information Systems

    Cite this

    Bowman, S., Angeli, G., Potts, C., & Manning, C. D. (2015). A large annotated corpus for learning natural language inference. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 632-642). Association for Computational Linguistics (ACL).
