On the equivalence between deep NADE and generative stochastic networks

Li Yao, Sherjil Ozair, Kyunghyun Cho, Yoshua Bengio

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Neural Autoregressive Distribution Estimators (NADEs) have recently been shown to be successful alternatives for modeling high-dimensional multimodal distributions. One issue with NADEs is that they rely on a particular order of factorization for P(x). This issue has recently been addressed by a variant of NADE called Orderless NADE and its deeper version, Deep Orderless NADE. Orderless NADEs are trained with a criterion that stochastically maximizes P(x) over all possible orders of factorization. Unfortunately, ancestral sampling from a deep NADE is very expensive, corresponding to running through a neural net separately to predict each of the visible variables given some of the others. This work makes a connection between this criterion and the training criterion for Generative Stochastic Networks (GSNs). It shows that training NADEs in this way also trains a GSN, which defines a Markov chain associated with the NADE model. Based on this connection, we show an alternative way to sample from a trained Orderless NADE that allows one to trade off computing time against sample quality: a 3- to 10-fold speedup (taking into account the waste due to correlations between consecutive samples of the chain) can be obtained without noticeably reducing the quality of the samples. This is achieved with a novel sampling procedure for GSNs, called annealed GSN sampling, which, like tempering methods, combines fast mixing (obtained through steps at high noise levels) with accurate samples (obtained through steps at low noise levels).
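The sampling procedure described in the abstract can be sketched as a Markov chain that repeatedly destroys a random subset of the visible variables and resamples them from the model's conditional, with the corruption level annealed from high to low. The sketch below is illustrative only: `toy_conditional` is a hypothetical stand-in (it fills masked entries with the mean of the observed ones), whereas the paper's method uses a trained Orderless NADE as the conditional.

```python
import numpy as np

def annealed_gsn_sample(predict_masked, x0, noise_schedule, rng):
    """Run a GSN-style Markov chain with an annealed noise schedule.

    Each step corrupts a random fraction of the variables and
    resamples them via predict_masked(x, mask) -> values for the
    masked positions. High noise levels early give fast mixing;
    low noise levels late give accurate samples.
    """
    x = x0.copy()
    for p in noise_schedule:                  # fraction of variables to corrupt
        mask = rng.random(x.shape) < p        # True = destroyed, to be resampled
        if mask.any():
            x[mask] = predict_masked(x, mask)
    return x

def toy_conditional(x, mask):
    """Hypothetical conditional: fill masked entries with the observed mean."""
    fill = x[~mask].mean() if (~mask).any() else 0.5
    return np.full(mask.sum(), fill)

rng = np.random.default_rng(0)
x0 = rng.random(64)                           # a toy 64-dimensional "sample"
schedule = np.linspace(0.9, 0.05, 20)         # anneal: high noise -> low noise
sample = annealed_gsn_sample(toy_conditional, x0, schedule, rng)
```

Because each chain step resamples only a subset of variables (rather than running one full network pass per variable, as in ancestral sampling), a few dozen such steps can be much cheaper than exact sampling, which is the source of the speedup the abstract reports.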

Original language: English (US)
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2014, Proceedings
Publisher: Springer Verlag
Pages: 322-336
Number of pages: 15
Volume: 8726 LNAI
Edition: PART 3
ISBN (Print): 9783662448441
DOI: 10.1007/978-3-662-44845-8_21
State: Published - 2014
Event: European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2014 - Nancy, France
Duration: Sep 15, 2014 – Sep 19, 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 3
Volume: 8726 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2014
Country: France
City: Nancy
Period: 9/15/14 – 9/19/14

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Yao, L., Ozair, S., Cho, K., & Bengio, Y. (2014). On the equivalence between deep NADE and generative stochastic networks. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2014, Proceedings (PART 3 ed., Vol. 8726 LNAI, pp. 322-336). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8726 LNAI, No. PART 3). Springer Verlag. https://doi.org/10.1007/978-3-662-44845-8_21
