Generating sentences from a continuous space

Samuel Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    The standard recurrent neural network language model (RNNLM) generates sentences one word at a time and does not work from an explicit global sentence representation. In this work, we introduce and study an RNN-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences. This factorization allows it to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features. Samples from the prior over these sentence representations remarkably produce diverse and well-formed sentences through simple deterministic decoding. By examining paths through this latent space, we are able to generate coherent novel sentences that interpolate between known sentences. We present techniques for solving the difficult learning problem presented by this model, demonstrate its effectiveness in imputing missing words, explore many interesting properties of the model’s latent sentence space, and present negative results on the use of the model in language modeling.
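
    For readers who want the mechanics, below is a minimal PyTorch sketch of the model the abstract describes: an LSTM encoder maps a sentence to a diagonal Gaussian posterior over a latent code z, an LSTM decoder's initial state is set from z, and the ELBO objective's KL term is annealed during training (KL cost annealing, one of the paper's training techniques, alongside word dropout on decoder inputs). All module names, dimensions, and schedules here are illustrative assumptions, not the authors' released code.

    # Minimal illustrative sketch of an RNN-based sentence VAE in PyTorch.
    # Names, sizes, and the annealing schedule are assumptions for
    # illustration; this is not the authors' released implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SentenceVAE(nn.Module):
        def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, latent_dim=32):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.to_mu = nn.Linear(hidden_dim, latent_dim)
            self.to_logvar = nn.Linear(hidden_dim, latent_dim)
            self.z_to_h = nn.Linear(latent_dim, hidden_dim)
            self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def encode(self, tokens):
            # The encoder's final hidden state summarizes the whole sentence
            # and parameterizes a diagonal Gaussian posterior q(z | x).
            _, (h, _) = self.encoder(self.embed(tokens))
            h = h[-1]
            return self.to_mu(h), self.to_logvar(h)

        def reparameterize(self, mu, logvar):
            # z = mu + sigma * eps keeps the sampling step differentiable.
            return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

        def decode(self, z, tokens):
            # The decoder is an ordinary RNN language model whose initial
            # hidden state is computed from the latent code z.
            h0 = torch.tanh(self.z_to_h(z)).unsqueeze(0)
            out, _ = self.decoder(self.embed(tokens), (h0, torch.zeros_like(h0)))
            return self.out(out)

    def elbo_loss(logits, targets, mu, logvar, kl_weight):
        # Reconstruction term: per-token cross-entropy against shifted targets.
        rec = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        # Closed-form KL( N(mu, sigma^2) || N(0, I) ).
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
        # kl_weight is annealed from 0 toward 1 over training so the decoder
        # cannot simply ignore z early on (KL cost annealing).
        return rec + kl_weight * kl

    def interpolate(model, tokens_a, tokens_b, steps=5):
        # Linear path between two posterior means; decoding each intermediate
        # code yields the interpolated sentences the abstract describes.
        mu_a, _ = model.encode(tokens_a)
        mu_b, _ = model.encode(tokens_b)
        return [mu_a + t * (mu_b - mu_a) for t in torch.linspace(0.0, 1.0, steps)]

    Under this sketch, sampling from the prior is just drawing z ~ N(0, I) and decoding greedily, and the interpolate helper traces the latent-space paths used to generate sentences that morph coherently between two inputs.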

    Original language: English (US)
    Title of host publication: CoNLL 2016 - 20th SIGNLL Conference on Computational Natural Language Learning, Proceedings
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 10-21
    Number of pages: 12
    ISBN (Electronic): 9781945626197
    State: Published - 2016
    Event: 20th SIGNLL Conference on Computational Natural Language Learning, CoNLL 2016 - Berlin, Germany
    Duration: Aug 11 2016 - Aug 12 2016

    Publication series

    Name: CoNLL 2016 - 20th SIGNLL Conference on Computational Natural Language Learning, Proceedings

    Conference

    Conference: 20th SIGNLL Conference on Computational Natural Language Learning, CoNLL 2016
    Country: Germany
    City: Berlin
    Period: 8/11/16 - 8/12/16

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Human-Computer Interaction
    • Linguistics and Language

    Cite this

    Bowman, S., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., & Bengio, S. (2016). Generating sentences from a continuous space. In CoNLL 2016 - 20th SIGNLL Conference on Computational Natural Language Learning, Proceedings (pp. 10-21). (CoNLL 2016 - 20th SIGNLL Conference on Computational Natural Language Learning, Proceedings). Association for Computational Linguistics (ACL).

    @inproceedings{6109fd634bfe44f6988b6d09fd99018c,
    title = "Generating sentences from a continuous space",
    abstract = "The standard recurrent neural network language model (rnnlm) generates sentences one word at a time and does not work from an explicit global sentence representation. In this work, we introduce and study an rnn-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences. This factorization allows it to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features. Samples from the prior over these sentence representations remarkably produce diverse and well-formed sentences through simple deterministic decoding. By examining paths through this latent space, we are able to generate coherent novel sentences that interpolate between known sentences. We present techniques for solving the difficult learning problem presented by this model, demonstrate its effectiveness in imputing missing words, explore many interesting properties of the model’s latent sentence space, and present negative results on the use of the model in language modeling.",
    author = "Samuel Bowman and Luke Vilnis and Oriol Vinyals and Dai, {Andrew M.} and Rafal Jozefowicz and Samy Bengio",
    year = "2016",
    month = "1",
    day = "1",
    language = "English (US)",
    series = "CoNLL 2016 - 20th SIGNLL Conference on Computational Natural Language Learning, Proceedings",
    publisher = "Association for Computational Linguistics (ACL)",
    pages = "10--21",
    booktitle = "CoNLL 2016 - 20th SIGNLL Conference on Computational Natural Language Learning, Proceedings",

    }

    TY - GEN

    T1 - Generating sentences from a continuous space

    AU - Bowman, Samuel

    AU - Vilnis, Luke

    AU - Vinyals, Oriol

    AU - Dai, Andrew M.

    AU - Jozefowicz, Rafal

    AU - Bengio, Samy

    PY - 2016

    Y1 - 2016

    AB - The standard recurrent neural network language model (RNNLM) generates sentences one word at a time and does not work from an explicit global sentence representation. In this work, we introduce and study an RNN-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences. This factorization allows it to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features. Samples from the prior over these sentence representations remarkably produce diverse and well-formed sentences through simple deterministic decoding. By examining paths through this latent space, we are able to generate coherent novel sentences that interpolate between known sentences. We present techniques for solving the difficult learning problem presented by this model, demonstrate its effectiveness in imputing missing words, explore many interesting properties of the model’s latent sentence space, and present negative results on the use of the model in language modeling.

    UR - http://www.scopus.com/inward/record.url?scp=85072753030&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85072753030&partnerID=8YFLogxK

    M3 - Conference contribution

    AN - SCOPUS:85072753030

    T3 - CoNLL 2016 - 20th SIGNLL Conference on Computational Natural Language Learning, Proceedings

    SP - 10

    EP - 21

    BT - CoNLL 2016 - 20th SIGNLL Conference on Computational Natural Language Learning, Proceedings

    PB - Association for Computational Linguistics (ACL)

    ER -