How to pretrain deep Boltzmann machines in two stages

Kyunghyun Cho, Tapani Raiko, Alexander Ilin, Juha Karhunen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A deep Boltzmann machine (DBM) is a recently introduced Markov random field model that has multiple layers of hidden units. It has been shown empirically that, unlike its simpler special case, the restricted Boltzmann machine (RBM), a DBM is difficult to train with approximate maximum-likelihood learning using the stochastic gradient. In this paper, we propose a novel pretraining algorithm that consists of two stages: first, obtaining approximate posterior distributions over the hidden units from a simpler model, and second, maximizing the variational lower bound given those fixed hidden posterior distributions. We show empirically that the proposed method overcomes the difficulty of training DBMs from randomly initialized parameters and results in a generative model that is better than, or comparable to, one obtained with the conventional pretraining algorithm.
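
The two-stage recipe described above can be made concrete with a short sketch. The following NumPy fragment is a hypothetical illustration, not the authors' implementation: it assumes binary units, omits bias terms, and uses a stack of two CD-1-trained RBMs as the "simpler model" of stage one (the paper's actual choice of simpler model and its training details may differ). Stage two then maximizes the DBM's variational lower bound with the stage-one posteriors Q1 and Q2 held fixed, using persistent Gibbs chains for the negative phase.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
bern = lambda p: (rng.random(p.shape) < p).astype(float)  # Bernoulli sample

X = bern(np.full((100, 20), 0.3))   # toy binary data (placeholder)
n_v, n_h1, n_h2 = 20, 16, 8
lr = 0.05

# Stage 1: a simpler model (here, two RBMs trained with CD-1), used only
# to produce fixed approximate posteriors Q1, Q2 over the hidden units.
W1 = 0.01 * rng.standard_normal((n_v, n_h1))
for _ in range(100):
    ph = sigmoid(X @ W1)                       # P(h1 = 1 | v)
    pv = sigmoid(bern(ph) @ W1.T)              # one Gibbs step down
    W1 += lr * (X.T @ ph - pv.T @ sigmoid(pv @ W1)) / len(X)
Q1 = sigmoid(X @ W1)                           # fixed posterior, layer 1

W2 = 0.01 * rng.standard_normal((n_h1, n_h2))
for _ in range(100):
    ph = sigmoid(Q1 @ W2)
    pv = sigmoid(bern(ph) @ W2.T)
    W2 += lr * (Q1.T @ ph - pv.T @ sigmoid(pv @ W2)) / len(X)
Q2 = sigmoid(Q1 @ W2)                          # fixed posterior, layer 2

# Stage 2: maximize the DBM's variational lower bound with Q1, Q2 fixed.
# Positive statistics come from the fixed posteriors; negative statistics
# come from persistent Gibbs chains on the full DBM (fantasy particles).
v_f = bern(np.full((100, n_v), 0.5))
h2_f = bern(np.full((100, n_h2), 0.5))
for _ in range(200):
    h1_f = bern(sigmoid(v_f @ W1 + h2_f @ W2.T))   # P(h1 | v, h2)
    v_f = bern(sigmoid(h1_f @ W1.T))               # P(v | h1)
    h2_f = bern(sigmoid(h1_f @ W2))                # P(h2 | h1)
    W1 += lr * (X.T @ Q1 - v_f.T @ h1_f) / len(X)  # bound gradient, layer 1
    W2 += lr * (Q1.T @ Q2 - h1_f.T @ h2_f) / len(X)  # bound gradient, layer 2

Because Q1 and Q2 never change during stage two, the positive-phase statistics X.T @ Q1 and Q1.T @ Q2 are constants that can be computed once; the only remaining sampling cost is the persistent negative chain, which is what makes maximizing the bound tractable from this warm start.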

Original language: English (US)
Title of host publication: Artificial Neural Networks - Methods and Applications in Bio-/Neuroinformatics
Publisher: Springer Verlag
Pages: 201-219
Number of pages: 19
ISBN (Print): 9783319099026
DOI: 10.1007/978-3-319-09903-3_10
State: Published - 2015
Event: 23rd International Conference on Artificial Neural Networks, ICANN 2013 - Sofia, Bulgaria
Duration: Sep 10, 2013 - Sep 13, 2013

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems

Cite this

Cho, K., Raiko, T., Ilin, A., & Karhunen, J. (2015). How to pretrain deep Boltzmann machines in two stages. In Artificial Neural Networks - Methods and Applications in Bio-/Neuroinformatics (pp. 201-219). Springer Verlag. https://doi.org/10.1007/978-3-319-09903-3_10
