How to construct deep recurrent neural networks: Proceedings of the Second International Conference on Learning Representations (ICLR 2014)

Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio

Research output: Contribution to conference › Paper

Abstract

In this paper, we explore different ways to extend a recurrent neural network (RNN) to a deep RNN. We start by arguing that the concept of depth in an RNN is not as clear as it is in feedforward neural networks. By carefully analyzing and understanding the architecture of an RNN, however, we find three points of an RNN which may be made deeper: (1) input-to-hidden function, (2) hidden-to-hidden transition and (3) hidden-to-output function. Based on this observation, we propose two novel architectures of a deep RNN which are orthogonal to an earlier attempt at stacking multiple recurrent layers to build a deep RNN (Schmidhuber, 1992; El Hihi and Bengio, 1996). We provide an alternative interpretation of these deep RNNs using a novel framework based on neural operators. The proposed deep RNNs are empirically evaluated on the tasks of polyphonic music prediction and language modeling. The experimental results support our claim that the proposed deep RNNs benefit from the depth and outperform the conventional, shallow RNNs.
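
To make the three points of depth named above concrete, the following is a minimal NumPy sketch contrasting a conventional shallow RNN step with a step in which the input-to-hidden, hidden-to-hidden and hidden-to-output functions are each replaced by a small multilayer network. This is an illustrative sketch, not the authors' implementation: the layer sizes, tanh nonlinearity, random initialization and function names are assumptions made for illustration, and the paper's stacked and shortcut-connection variants are omitted.

# Illustrative sketch only (not the paper's reference code): a shallow RNN step
# versus a step whose input-to-hidden, hidden-to-hidden and hidden-to-output
# functions are each a small tanh MLP. All sizes/initializations are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 16, 4

def mlp(sizes):
    """Return a tanh MLP (as a closure) used as one 'deep' component."""
    Ws = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]
    def fn(x):
        for W in Ws:
            x = np.tanh(x @ W)
        return x
    return fn

# Shallow RNN: h_t = tanh(W x_t + U h_{t-1}), y_t = V h_t (single layer per function).
W = rng.standard_normal((n_in, n_hid)) * 0.1
U = rng.standard_normal((n_hid, n_hid)) * 0.1
V = rng.standard_normal((n_hid, n_out)) * 0.1

def shallow_step(x_t, h_prev):
    h_t = np.tanh(x_t @ W + h_prev @ U)
    return h_t, h_t @ V

# Deep RNN step: each of the three functions is itself a small MLP.
f_in   = mlp([n_in, n_hid, n_hid])          # deep input-to-hidden function
f_tran = mlp([2 * n_hid, n_hid, n_hid])     # deep hidden-to-hidden transition
f_out  = mlp([n_hid, n_hid, n_out])         # deep hidden-to-output function

def deep_step(x_t, h_prev):
    # The transition MLP sees the deeply processed input together with h_{t-1}.
    h_t = f_tran(np.concatenate([f_in(x_t), h_prev]))
    return h_t, f_out(h_t)

# Run both recurrences over a short random sequence.
h_s = h_d = np.zeros(n_hid)
for x_t in rng.standard_normal((5, n_in)):
    h_s, y_s = shallow_step(x_t, h_s)
    h_d, y_d = deep_step(x_t, h_d)
print(y_s.shape, y_d.shape)  # both (4,)

The stacked-layer approach of Schmidhuber (1992) and El Hihi and Bengio (1996), to which the proposed architectures are orthogonal, would instead feed the hidden state of one shallow recurrent layer as the input of the next.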

Original language: English (US)
State: Published - Jan 1 2014
Event: 2nd International Conference on Learning Representations, ICLR 2014 - Banff, Canada
Duration: Apr 14 2014 – Apr 16 2014

Conference

Conference: 2nd International Conference on Learning Representations, ICLR 2014
Country: Canada
City: Banff
Period: 4/14/14 – 4/16/14

Fingerprint

Recurrent neural networks
Feedforward neural networks

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Education
  • Computer Science Applications

Cite this

Pascanu, R., Gulcehre, C., Cho, K., & Bengio, Y. (2014). How to construct deep recurrent neural networks: Proceedings of the Second International Conference on Learning Representations (ICLR 2014). Paper presented at 2nd International Conference on Learning Representations, ICLR 2014, Banff, Canada.
