Open Problem: The landscape of the loss surfaces of multilayer networks

Research output: Contribution to journal › Article

Abstract

Deep learning has enjoyed a resurgence of interest in the last few years for such applications as image and speech recognition, or natural language processing. The vast majority of practical applications of deep learning focus on supervised learning, where the supervised loss function is minimized using stochastic gradient descent. The properties of this highly non-convex loss function, such as its landscape and the behavior of critical points (maxima, minima, and saddle points), as well as the reason why large- and small-size networks achieve radically different practical performance, are however very poorly understood. It was only recently shown that new results in spin-glass theory potentially may provide an explanation for these problems by establishing a connection between the loss function of the neural networks and the Hamiltonian of the spherical spin-glass models. The connection between both models relies on a number of possibly unrealistic assumptions, yet the empirical evidence suggests that the connection may exist in practice. The question we pose is whether it is possible to drop some of these assumptions to establish a stronger connection between both models.
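For readers unfamiliar with the spin-glass side of the connection, the Hamiltonian referenced in the abstract is that of the spherical p-spin glass model. The following is the standard form of that model (background, not reproduced from this record):

```latex
% Hamiltonian of the spherical p-spin glass model on N spins:
% configurations \sigma are constrained to the sphere \|\sigma\|^2 = N,
% and the couplings J_{i_1 \dots i_p} are i.i.d. standard Gaussians.
H_{N,p}(\sigma) \;=\; \frac{1}{N^{(p-1)/2}}
  \sum_{i_1,\dots,i_p=1}^{N} J_{i_1 \dots i_p}\,
  \sigma_{i_1} \sigma_{i_2} \cdots \sigma_{i_p},
\qquad \sigma \in S^{N-1}\!\left(\sqrt{N}\right).
```

In the line of work the abstract builds on, the critical points of the network's loss surface are related to those of this Hamiltonian, with p playing a role analogous to network depth under the paper's simplifying assumptions.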

Original language: English (US)
Journal: Journal of Machine Learning Research
Volume: 40
Issue number: 2015
State: Published - 2015

Keywords

  • Deep learning
  • Hamiltonian
  • Multilayer networks
  • Nonconvex optimization
  • Spherical spin-glass model

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

Cite this

@article{27b4481b58e04f508a8c8c9d5a26aced,
title = "Open Problem: The landscape of the loss surfaces of multilayer networks",
abstract = "Deep learning has enjoyed a resurgence of interest in the last few years for such applications as image and speech recognition, or natural language processing. The vast majority of practical applications of deep learning focus on supervised learning, where the supervised loss function is minimized using stochastic gradient descent. The properties of this highly non-convex loss function, such as its landscape and the behavior of critical points (maxima, minima, and saddle points), as well as the reason why large- and small-size networks achieve radically different practical performance, are however very poorly understood. It was only recently shown that new results in spin-glass theory potentially may provide an explanation for these problems by establishing a connection between the loss function of the neural networks and the Hamiltonian of the spherical spin-glass models. The connection between both models relies on a number of possibly unrealistic assumptions, yet the empirical evidence suggests that the connection may exist in practice. The question we pose is whether it is possible to drop some of these assumptions to establish a stronger connection between both models.",
keywords = "Deep learning, Hamiltonian, Multilayer networks, Nonconvex optimization, Spherical spin-glass model",
author = "Anna Choromanska and Yann LeCun and {Ben Arous}, Gerard",
year = "2015",
language = "English (US)",
volume = "40",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",
number = "2015",
}

TY - JOUR

T1 - Open Problem

T2 - The landscape of the loss surfaces of multilayer networks

AU - Choromanska, Anna

AU - LeCun, Yann

AU - Ben Arous, Gerard

PY - 2015

Y1 - 2015

N2 - Deep learning has enjoyed a resurgence of interest in the last few years for such applications as image and speech recognition, or natural language processing. The vast majority of practical applications of deep learning focus on supervised learning, where the supervised loss function is minimized using stochastic gradient descent. The properties of this highly non-convex loss function, such as its landscape and the behavior of critical points (maxima, minima, and saddle points), as well as the reason why large- and small-size networks achieve radically different practical performance, are however very poorly understood. It was only recently shown that new results in spin-glass theory potentially may provide an explanation for these problems by establishing a connection between the loss function of the neural networks and the Hamiltonian of the spherical spin-glass models. The connection between both models relies on a number of possibly unrealistic assumptions, yet the empirical evidence suggests that the connection may exist in practice. The question we pose is whether it is possible to drop some of these assumptions to establish a stronger connection between both models.

AB - Deep learning has enjoyed a resurgence of interest in the last few years for such applications as image and speech recognition, or natural language processing. The vast majority of practical applications of deep learning focus on supervised learning, where the supervised loss function is minimized using stochastic gradient descent. The properties of this highly non-convex loss function, such as its landscape and the behavior of critical points (maxima, minima, and saddle points), as well as the reason why large- and small-size networks achieve radically different practical performance, are however very poorly understood. It was only recently shown that new results in spin-glass theory potentially may provide an explanation for these problems by establishing a connection between the loss function of the neural networks and the Hamiltonian of the spherical spin-glass models. The connection between both models relies on a number of possibly unrealistic assumptions, yet the empirical evidence suggests that the connection may exist in practice. The question we pose is whether it is possible to drop some of these assumptions to establish a stronger connection between both models.

KW - Deep learning

KW - Hamiltonian

KW - Multilayer networks

KW - Nonconvex optimization

KW - Spherical spin-glass model

UR - http://www.scopus.com/inward/record.url?scp=84984668609&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84984668609&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84984668609

VL - 40

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

IS - 2015

ER -