Understanding trainable sparse coding via matrix factorization

Thomas Moreau, Joan Bruna Estrach

Research output: Contribution to conference › Paper

Abstract

Sparse coding is a core building block in many data analysis and machine learning pipelines. Typically it is solved by relying on generic optimization techniques, such as the Iterative Soft Thresholding Algorithm and its accelerated version (ISTA, FISTA). These methods are optimal in the class of first-order methods for non-smooth, convex functions. However, they do not exploit the particular structure of the problem at hand nor the input data distribution. An acceleration using neural networks, coined LISTA, was proposed in Gregor & Le Cun (2010), which showed empirically that one could achieve high quality estimates with few iterations by modifying the parameters of the proximal splitting appropriately. In this paper we study the reasons for such acceleration. Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalise the kernel with a basis that produces a small perturbation of the ℓ1 ball. When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound with respect to the non-adaptive version. Moreover, our analysis also shows that conditions for acceleration occur mostly at the beginning of the iterative process, consistent with numerical experiments. We further validate our analysis by showing that on dictionaries where this factorization does not exist, adaptive acceleration fails.
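The abstract contrasts plain ISTA with its learned counterpart LISTA. As an illustrative sketch only (the paper's analysis concerns the factorization of the Gram kernel of the dictionary, not this toy problem), the Python snippet below implements ISTA for the ℓ1-penalised least-squares problem and shows that one ISTA iteration is a special case of the LISTA-style update soft_threshold(W_e x + S z, theta). The matrices W_e and S, the threshold theta, and the toy problem sizes are assumptions introduced for this example, not quantities taken from the paper.

import numpy as np


def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1 (elementwise soft thresholding).
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def ista(x, D, lam, n_iter=200):
    # Plain ISTA for min_z 0.5 * ||x - D z||^2 + lam * ||z||_1,
    # with step size 1/L where L = ||D||_2^2 bounds the gradient Lipschitz constant.
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z + D.T @ (x - D @ z) / L, lam / L)
    return z


def lista_step(z, x, W_e, S, theta):
    # One LISTA-style update z <- soft_threshold(W_e x + S z, theta).
    # In LISTA these parameters are learned from data; below they are set
    # to the values that reproduce plain ISTA (an assumption for illustration).
    return soft_threshold(W_e @ x + S @ z, theta)


rng = np.random.default_rng(0)
m, p, lam = 20, 50, 0.1                       # toy sizes, chosen arbitrarily
D = rng.standard_normal((m, p)) / np.sqrt(m)  # random dictionary
z_true = np.zeros(p)
z_true[rng.choice(p, 5, replace=False)] = rng.standard_normal(5)
x = D @ z_true + 0.01 * rng.standard_normal(m)

z_ista = ista(x, D, lam)

# ISTA is the special case W_e = D^T / L, S = I - D^T D / L, theta = lam / L;
# LISTA replaces these with trained matrices to accelerate the early iterations.
L = np.linalg.norm(D, 2) ** 2
W_e, S, theta = D.T / L, np.eye(p) - D.T @ D / L, lam / L
z = np.zeros(p)
for _ in range(200):
    z = lista_step(z, x, W_e, S, theta)
print(np.linalg.norm(z - z_ista))             # ~0: both run the same fixed-point iteration

In LISTA, W_e, S and theta are trained on samples from the input distribution, which is what produces the accelerated early iterations whose origin the paper analyses.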

Original language: English (US)
State: Published - Jan 1 2019
Event: 5th International Conference on Learning Representations, ICLR 2017 - Toulon, France
Duration: Apr 24 2017 - Apr 26 2017

Conference

Conference: 5th International Conference on Learning Representations, ICLR 2017
Country: France
City: Toulon
Period: 4/24/17 - 4/26/17

ASJC Scopus subject areas

  • Education
  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Cite this

Moreau, T., & Bruna Estrach, J. (2019). Understanding trainable sparse coding via matrix factorization. Paper presented at 5th International Conference on Learning Representations, ICLR 2017, Toulon, France.

Understanding trainable sparse coding via matrix factorization. / Moreau, Thomas; Bruna Estrach, Joan.

2019. Paper presented at 5th International Conference on Learning Representations, ICLR 2017, Toulon, France.

Research output: Contribution to conference › Paper

Moreau, T & Bruna Estrach, J 2019, 'Understanding trainable sparse coding via matrix factorization', Paper presented at 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 4/24/17 - 4/26/17.
Moreau T, Bruna Estrach J. Understanding trainable sparse coding via matrix factorization. 2019. Paper presented at 5th International Conference on Learning Representations, ICLR 2017, Toulon, France.
Moreau, Thomas ; Bruna Estrach, Joan. / Understanding trainable sparse coding via matrix factorization. Paper presented at 5th International Conference on Learning Representations, ICLR 2017, Toulon, France.
@conference{b58e4ba320ad41bfa7c9603851819070,
title = "Understanding trainable sparse coding via matrix factorization",
abstract = "Sparse coding is a core building block in many data analysis and machine learning pipelines. Typically it is solved by relying on generic optimization techniques, such as the Iterative Soft Thresholding Algorithm and its accelerated version (ISTA, FISTA). These methods are optimal in the class of first-order methods for non-smooth, convex functions. However, they do not exploit the particular structure of the problem at hand nor the input data distribution. An acceleration using neural networks, coined LISTA, was proposed in Gregor & Le Cun (2010), which showed empirically that one could achieve high quality estimates with few iterations by modifying the parameters of the proximal splitting appropriately. In this paper we study the reasons for such acceleration. Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalise the kernel with a basis that produces a small perturbation of the ℓ1 ball. When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound with respect to the non-adaptive version. Moreover, our analysis also shows that conditions for acceleration occur mostly at the beginning of the iterative process, consistent with numerical experiments. We further validate our analysis by showing that on dictionaries where this factorization does not exist, adaptive acceleration fails.",
author = "Thomas Moreau and {Bruna Estrach}, Joan",
year = "2019",
month = "1",
day = "1",
language = "English (US)",
note = "5th International Conference on Learning Representations, ICLR 2017 ; Conference date: 24-04-2017 Through 26-04-2017",

}

TY - CONF

T1 - Understanding trainable sparse coding via matrix factorization

AU - Moreau, Thomas

AU - Bruna Estrach, Joan

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Sparse coding is a core building block in many data analysis and machine learning pipelines. Typically it is solved by relying on generic optimization techniques, such as the Iterative Soft Thresholding Algorithm and its accelerated version (ISTA, FISTA). These methods are optimal in the class of first-order methods for non-smooth, convex functions. However, they do not exploit the particular structure of the problem at hand nor the input data distribution. An acceleration using neural networks, coined LISTA, was proposed in Gregor & Le Cun (2010), which showed empirically that one could achieve high quality estimates with few iterations by modifying the parameters of the proximal splitting appropriately. In this paper we study the reasons for such acceleration. Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalise the kernel with a basis that produces a small perturbation of the ℓ1 ball. When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound with respect to the non-adaptive version. Moreover, our analysis also shows that conditions for acceleration occur mostly at the beginning of the iterative process, consistent with numerical experiments. We further validate our analysis by showing that on dictionaries where this factorization does not exist, adaptive acceleration fails.

AB - Sparse coding is a core building block in many data analysis and machine learning pipelines. Typically it is solved by relying on generic optimization techniques, such as the Iterative Soft Thresholding Algorithm and its accelerated version (ISTA, FISTA). These methods are optimal in the class of first-order methods for non-smooth, convex functions. However, they do not exploit the particular structure of the problem at hand nor the input data distribution. An acceleration using neural networks, coined LISTA, was proposed in Gregor & Le Cun (2010), which showed empirically that one could achieve high quality estimates with few iterations by modifying the parameters of the proximal splitting appropriately. In this paper we study the reasons for such acceleration. Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalise the kernel with a basis that produces a small perturbation of the ℓ1 ball. When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound with respect to the non-adaptive version. Moreover, our analysis also shows that conditions for acceleration occur mostly at the beginning of the iterative process, consistent with numerical experiments. We further validate our analysis by showing that on dictionaries where this factorization does not exist, adaptive acceleration fails.

UR - http://www.scopus.com/inward/record.url?scp=85070994562&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85070994562&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85070994562

ER -