Explorations on high dimensional landscapes

Levent Sagun, V. Ugur Guney, Gerard Ben Arous, Yann LeCun

Research output: Contribution to journal › Article

Abstract

Finding minima of a real-valued non-convex function over a high-dimensional space is a major challenge in science. We provide evidence that some such functions defined on high-dimensional domains have a narrow band of values whose pre-image contains the bulk of their critical points. This is in contrast with the low-dimensional picture, in which this band is wide. Our simulations agree with previous theoretical work on spin glasses that proves the existence of such a band when the dimension of the domain tends to infinity. Furthermore, our experiments on teacher-student networks with the MNIST dataset establish a similar phenomenon in deep networks. We finally observe that both the gradient descent and the stochastic gradient descent methods can reach this level within the same number of steps.
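The abstract's final observation can be illustrated with a minimal sketch. This is not the paper's experimental setup (which uses deep networks on MNIST); it is a hypothetical single-neuron teacher-student regression, invented here to show the comparison being made: full-batch gradient descent and small-batch stochastic gradient descent run for the same number of steps, and both reach a comparably low loss level.

```python
import math
import random

random.seed(0)
D, N = 20, 200  # illustrative dimension and sample count (not from the paper)

# Hypothetical "teacher": targets generated by a fixed random weight vector.
teacher = [random.gauss(0, 1) / math.sqrt(D) for _ in range(D)]
X = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
Y = [math.tanh(sum(t * x for t, x in zip(teacher, row))) for row in X]

def loss(w, idx):
    """Mean squared error of a tanh "student" over the sample indices idx."""
    err = 0.0
    for i in idx:
        pred = math.tanh(sum(wj * xj for wj, xj in zip(w, X[i])))
        err += (pred - Y[i]) ** 2
    return err / len(idx)

def grad(w, idx):
    """Gradient of the mean squared error with respect to w over idx."""
    g = [0.0] * D
    for i in idx:
        pred = math.tanh(sum(wj * xj for wj, xj in zip(w, X[i])))
        coef = 2.0 * (pred - Y[i]) * (1.0 - pred ** 2)
        for j in range(D):
            g[j] += coef * X[i][j]
    return [gj / len(idx) for gj in g]

def descend(batch, steps=400, lr=0.3):
    """Run `steps` update steps with gradients estimated on `batch` samples."""
    w = [random.gauss(0, 1) / math.sqrt(D) for _ in range(D)]
    for _ in range(steps):
        idx = random.sample(range(N), batch)
        g = grad(w, idx)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return loss(w, range(N))  # final loss measured on the full dataset

full = descend(batch=N)    # gradient descent: full batch every step
stoch = descend(batch=10)  # stochastic variant: small random batch
print(f"GD final loss {full:.3f}  SGD final loss {stoch:.3f}")
```

With the same step budget, both runs typically end at a similar low loss value, loosely mirroring the abstract's claim that GD and SGD reach the narrow band of loss values within the same number of steps.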
Original language: Undefined
Journal: arXiv
State: Published - Dec 20 2014

Keywords

  • stat.ML
  • cs.LG

Cite this

Explorations on high dimensional landscapes. / Sagun, Levent; Guney, V. Ugur; Arous, Gerard Ben; LeCun, Yann.

In: arXiv, 20.12.2014.

Research output: Contribution to journal › Article

@article{a3847d8dcc3247f08ad7f08c48dfcfce,
title = "Explorations on high dimensional landscapes",
abstract = "Finding minima of a real-valued non-convex function over a high-dimensional space is a major challenge in science. We provide evidence that some such functions defined on high-dimensional domains have a narrow band of values whose pre-image contains the bulk of their critical points. This is in contrast with the low-dimensional picture, in which this band is wide. Our simulations agree with previous theoretical work on spin glasses that proves the existence of such a band when the dimension of the domain tends to infinity. Furthermore, our experiments on teacher-student networks with the MNIST dataset establish a similar phenomenon in deep networks. We finally observe that both the gradient descent and the stochastic gradient descent methods can reach this level within the same number of steps.",
keywords = "stat.ML, cs.LG",
author = "Levent Sagun and Guney, {V. Ugur} and Arous, {Gerard Ben} and Yann LeCun",
note = "11 pages, 8 figures, workshop contribution at ICLR 2015",
year = "2014",
month = "12",
day = "20",
language = "Undefined",
journal = "arXiv",

}

TY - JOUR

T1 - Explorations on high dimensional landscapes

AU - Sagun, Levent

AU - Guney, V. Ugur

AU - Arous, Gerard Ben

AU - LeCun, Yann

N1 - 11 pages, 8 figures, workshop contribution at ICLR 2015

PY - 2014/12/20

Y1 - 2014/12/20

N2 - Finding minima of a real-valued non-convex function over a high-dimensional space is a major challenge in science. We provide evidence that some such functions defined on high-dimensional domains have a narrow band of values whose pre-image contains the bulk of their critical points. This is in contrast with the low-dimensional picture, in which this band is wide. Our simulations agree with previous theoretical work on spin glasses that proves the existence of such a band when the dimension of the domain tends to infinity. Furthermore, our experiments on teacher-student networks with the MNIST dataset establish a similar phenomenon in deep networks. We finally observe that both the gradient descent and the stochastic gradient descent methods can reach this level within the same number of steps.

AB - Finding minima of a real-valued non-convex function over a high-dimensional space is a major challenge in science. We provide evidence that some such functions defined on high-dimensional domains have a narrow band of values whose pre-image contains the bulk of their critical points. This is in contrast with the low-dimensional picture, in which this band is wide. Our simulations agree with previous theoretical work on spin glasses that proves the existence of such a band when the dimension of the domain tends to infinity. Furthermore, our experiments on teacher-student networks with the MNIST dataset establish a similar phenomenon in deep networks. We finally observe that both the gradient descent and the stochastic gradient descent methods can reach this level within the same number of steps.

KW - stat.ML

KW - cs.LG

M3 - Article

JO - arXiv

JF - arXiv

ER -