### Abstract

Among all the supervised learning algorithms, back-propagation (BP) is probably the most wi(l)dely used. Classical non-linear programming methods generally use an estimate of the Hessian matrix (the matrix of second derivatives) to compute the weight modification at each iteration. They are derived from the well-known Newton-Raphson algorithm. We propose a very rough approximation to the Newton method which uses only the diagonal terms of the Hessian matrix. These terms give information about the curvature of the error surface in directions parallel to the weight space axes. This information can be used to scale the learning rate of each weight independently. We show that it is possible to approximate the diagonal terms of the Hessian matrix using a back-propagation procedure very similar to the one used for the first derivatives.
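To make the idea of curvature-scaled learning rates concrete, the following is a minimal sketch and not the procedure from the paper. It assumes a hypothetical two-weight quadratic error surface whose Hessian diagonal is known exactly, whereas the abstract describes approximating those terms with a back-propagation pass. The function name `diag_newton_step`, the damping constant `mu`, and the specific update `w_i - lr * g_i / (|h_ii| + mu)` are illustrative assumptions.

```python
def diag_newton_step(w, grad, hess_diag, lr=1.0, mu=1e-8):
    """One update where each weight has its own rate lr / (|h_ii| + mu).

    mu is an assumed small damping term that avoids division by zero
    when the curvature along an axis is close to zero.
    """
    return [wi - lr * gi / (abs(hi) + mu)
            for wi, gi, hi in zip(w, grad, hess_diag)]


# Hypothetical error surface E(w) = 0.5 * sum(h_i * w_i^2):
# steep along the first weight axis, shallow along the second.
CURVATURE = [100.0, 1.0]


def error(w):
    return 0.5 * sum(h * wi * wi for h, wi in zip(CURVATURE, w))


def gradient(w):
    return [h * wi for h, wi in zip(CURVATURE, w)]


def run(steps=20):
    w_plain = [1.0, 1.0]
    w_diag = [1.0, 1.0]
    for _ in range(steps):
        # Plain gradient descent: one global rate, capped by the steepest axis.
        g = gradient(w_plain)
        w_plain = [wi - 0.01 * gi for wi, gi in zip(w_plain, g)]
        # Diagonal scaling: the rate of each weight follows its own curvature.
        w_diag = diag_newton_step(w_diag, gradient(w_diag), CURVATURE)
    return error(w_plain), error(w_diag)


err_plain, err_diag = run()
print(err_plain, err_diag)
```

On this toy surface the single global rate has to stay small enough for the steep axis, so the shallow axis converges slowly. Dividing each gradient component by its own curvature removes that mismatch. On a real network the Hessian is not diagonal and its diagonal terms are only approximated, so the gain would be less clean than in this example.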

Original language | English (US) |
---|---|
Pages (from-to) | 168 |
Number of pages | 1 |
Journal | Neural Networks |
Volume | 1 |
Issue number | 1 SUPPL |
DOIs | https://doi.org/10.1016/0893-6080(88)90205-5 |
State | Published - 1988 |

### ASJC Scopus subject areas

- Artificial Intelligence
- Neuroscience(all)

### Cite this

**Using curvature information to improve back-propagation.** / LeCun, Yann.

Research output: Contribution to journal › Article

*Neural Networks*, vol. 1, no. 1 SUPPL, pp. 168. https://doi.org/10.1016/0893-6080(88)90205-5

TY - JOUR

T1 - Using curvature information to improve back-propagation

AU - LeCun, Yann

PY - 1988

Y1 - 1988

UR - http://www.scopus.com/inward/record.url?scp=0024168379&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0024168379&partnerID=8YFLogxK

U2 - 10.1016/0893-6080(88)90205-5

DO - 10.1016/0893-6080(88)90205-5

M3 - Article

AN - SCOPUS:0024168379

VL - 1

SP - 168

JO - Neural Networks

JF - Neural Networks

SN - 0893-6080

IS - 1 SUPPL

ER -