Using curvature information to improve back-propagation

Research output: Contribution to journal › Article

Abstract

Among all the supervised learning algorithms, back-propagation (BP) is probably the most widely used. Classical non-linear programming methods generally use an estimate of the Hessian matrix (matrix of second derivatives) to compute the weight modification at each iteration. They are derived from the well-known Newton-Raphson algorithm. We propose a very rough approximation to the Newton method which just uses the diagonal terms of the Hessian matrix. These terms give information about the curvature of the error surface in directions parallel to the weight space axes. This information can be used to scale the learning rates for each weight independently. We show that it is possible to approximate the diagonal terms of the Hessian matrix using a back-propagation procedure very similar to the one used for the first derivatives.
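The abstract describes scaling each weight's learning rate by an estimate of the corresponding diagonal term of the Hessian, obtained with a second back-propagation pass. The sketch below only illustrates that idea and is not the paper's algorithm: the two-layer tanh network, the squared-error loss, the damping constant mu, the base step eta, and the Gauss-Newton-style omission of the activation's second-derivative term are all assumptions made here for concreteness.

import numpy as np

# A sketch, not the paper's code: one hidden tanh layer, linear outputs,
# squared-error loss. The backward pass propagates first derivatives and a
# diagonal curvature estimate, which sets a separate step size
# eta / (|h| + mu) for every weight.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 2
W1 = rng.normal(0.0, 0.5, (n_hid, n_in))
W2 = rng.normal(0.0, 0.5, (n_out, n_hid))

def train_step(W1, W2, x, target, eta=0.05, mu=0.1):
    # forward pass
    a1 = W1 @ x
    h = np.tanh(a1)
    y = W2 @ h                                  # linear output units
    err = y - target                            # E = 0.5 * ||err||^2

    # first derivatives: ordinary back-propagation
    d2 = err                                    # dE/da2
    d1 = (W2.T @ d2) * (1.0 - h**2)             # dE/da1
    g2 = np.outer(d2, h)                        # dE/dW2
    g1 = np.outer(d1, x)                        # dE/dW1

    # diagonal second derivatives, back-propagated the same way
    # (the term involving the activation's second derivative is dropped,
    # a common Gauss-Newton-style simplification)
    c2 = np.ones(len(err))                      # d2E/da2^2 for this loss
    c1 = ((W2**2).T @ c2) * (1.0 - h**2)**2     # d2E/da1^2 (approx.)
    h2 = np.outer(c2, h**2)                     # d2E/dW2^2 (approx.)
    h1 = np.outer(c1, x**2)                     # d2E/dW1^2 (approx.)

    # per-weight learning rates: smaller steps where curvature is larger
    W2 = W2 - eta / (np.abs(h2) + mu) * g2
    W1 = W1 - eta / (np.abs(h1) + mu) * g1
    return W1, W2, 0.5 * float(err @ err)

x = rng.normal(size=n_in)
t = np.array([1.0, -1.0])
for _ in range(200):
    W1, W2, loss = train_step(W1, W2, x, t)
print("squared error after 200 steps:", loss)

The absolute value and the damping constant mu guard against negative or near-zero curvature estimates, which would otherwise make the per-weight step arbitrarily large.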

Original language: English (US)
Pages (from-to): 168
Number of pages: 1
Journal: Neural Networks
Volume: 1
Issue number: 1 SUPPL
DOIs: 10.1016/0893-6080(88)90205-5
State: Published - 1988

Fingerprint

  • Backpropagation
  • Weights and Measures
  • Learning
  • Derivatives
  • Supervised learning
  • Nonlinear programming
  • Newton-Raphson method
  • Learning algorithms

ASJC Scopus subject areas

  • Artificial Intelligence
  • Neuroscience (all)

Cite this

Using curvature information to improve back-propagation. / LeCun, Yann.

In: Neural Networks, Vol. 1, No. 1 SUPPL, 1988, p. 168.

Research output: Contribution to journal › Article

@article{700e47dd92214f0c9ffe7a35d3ea1f64,
title = "Using curvature information to improve back-propagation",
abstract = "Among all the supervised learning algorithms, back-propagation (BP) is probably the most widely used. Classical non-linear programming methods generally use an estimate of the Hessian matrix (matrix of second derivatives) to compute the weight modification at each iteration. They are derived from the well-known Newton-Raphson algorithm. We propose a very rough approximation to the Newton method which just uses the diagonal terms of the Hessian matrix. These terms give information about the curvature of the error surface in directions parallel to the weight space axes. This information can be used to scale the learning rates for each weight independently. We show that it is possible to approximate the diagonal terms of the Hessian matrix using a back-propagation procedure very similar to the one used for the first derivatives.",
author = "Yann LeCun",
year = "1988",
doi = "10.1016/0893-6080(88)90205-5",
language = "English (US)",
volume = "1",
pages = "168",
journal = "Neural Networks",
issn = "0893-6080",
publisher = "Elsevier Limited",
number = "1 SUPPL",
}

TY - JOUR
T1 - Using curvature information to improve back-propagation
AU - LeCun, Yann
PY - 1988
Y1 - 1988
N2 - Among all the supervised learning algorithms, back-propagation (BP) is probably the most widely used. Classical non-linear programming methods generally use an estimate of the Hessian matrix (matrix of second derivatives) to compute the weight modification at each iteration. They are derived from the well-known Newton-Raphson algorithm. We propose a very rough approximation to the Newton method which just uses the diagonal terms of the Hessian matrix. These terms give information about the curvature of the error surface in directions parallel to the weight space axes. This information can be used to scale the learning rates for each weight independently. We show that it is possible to approximate the diagonal terms of the Hessian matrix using a back-propagation procedure very similar to the one used for the first derivatives.
AB - Among all the supervised learning algorithms, back-propagation (BP) is probably the most widely used. Classical non-linear programming methods generally use an estimate of the Hessian matrix (matrix of second derivatives) to compute the weight modification at each iteration. They are derived from the well-known Newton-Raphson algorithm. We propose a very rough approximation to the Newton method which just uses the diagonal terms of the Hessian matrix. These terms give information about the curvature of the error surface in directions parallel to the weight space axes. This information can be used to scale the learning rates for each weight independently. We show that it is possible to approximate the diagonal terms of the Hessian matrix using a back-propagation procedure very similar to the one used for the first derivatives.
UR - http://www.scopus.com/inward/record.url?scp=0024168379&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0024168379&partnerID=8YFLogxK
U2 - 10.1016/0893-6080(88)90205-5
DO - 10.1016/0893-6080(88)90205-5
M3 - Article
AN - SCOPUS:0024168379
VL - 1
SP - 168
JO - Neural Networks
JF - Neural Networks
SN - 0893-6080
IS - 1 SUPPL
ER -