Audio source separation with discriminative scattering networks

Pablo Sprechmann, Joan Bruna Estrach, Yann LeCun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many monaural signal decomposition techniques proposed in the literature operate on a feature space consisting of a time-frequency representation of the input data. A challenge faced by these approaches is to effectively exploit the temporal dependencies of the signals at scales larger than the duration of a time frame. In this work, we propose to tackle this problem by modeling the signals using a time-frequency representation with multiple temporal resolutions. To this end, we use a signal representation that consists of a pyramid of wavelet scattering operators, which generalizes Constant Q Transforms (CQT) with extra layers of convolution and complex modulus. We first show that learning standard models in this multi-resolution setting improves source separation results over fixed-resolution methods. As a case study, we use Non-Negative Matrix Factorization (NMF), which has been widely used in audio applications. Then, we investigate the inclusion of the proposed multi-resolution setting into a discriminative training regime. We discuss several alternatives using different deep neural network architectures, and our preliminary experiments suggest that in this task, finite impulse, multi-resolution Convolutional Networks are a competitive baseline compared to recurrent alternatives.
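
As a rough illustration of the NMF baseline the abstract refers to, the following sketch separates two sources from a non-negative time-frequency matrix. It is not the authors' implementation: it works at a single temporal resolution (no scattering pyramid), uses the standard multiplicative updates for the Euclidean loss, and the function names `nmf` and `separate`, the rank, and the iteration count are all assumptions chosen for the example.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero in the updates


def nmf(V, rank, n_iter=200, W=None, seed=0):
    """Approximate V (freq x time, non-negative) as W @ H.

    Uses multiplicative updates for the Euclidean loss.
    If W is supplied it is kept fixed and only H is estimated
    (in that case `rank` is ignored).
    """
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if learn_W:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def separate(V_mix, W1, W2, n_iter=200):
    """Split a mixture using per-source dictionaries W1 and W2.

    Activations are fitted on the concatenated dictionary, then each
    source is recovered with a soft (Wiener-like) mask on the mixture.
    """
    W = np.hstack([W1, W2])
    _, H = nmf(V_mix, W.shape[1], n_iter=n_iter, W=W)
    k = W1.shape[1]
    S1 = W1 @ H[:k]          # model of source 1
    S2 = W2 @ H[k:]          # model of source 2
    mask1 = S1 / (S1 + S2 + EPS)
    return mask1 * V_mix, (1.0 - mask1) * V_mix


if __name__ == "__main__":
    # Synthetic stand-ins for magnitude spectrograms (assumed data).
    rng = np.random.default_rng(1)
    train1 = rng.random((32, 100))
    train2 = rng.random((32, 100))
    W1, _ = nmf(train1, rank=8)
    W2, _ = nmf(train2, rank=8)
    mixture = rng.random((32, 20))
    est1, est2 = separate(mixture, W1, W2)
    print(est1.shape, est2.shape)
```

In this sketch the two masks sum to one, so the estimates always add back up to the mixture. A multi-resolution variant would apply the same factorization to features computed at several temporal scales, which is the setting the abstract argues for.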

Original language: English (US)
Title of host publication: Latent Variable Analysis and Signal Separation - 12th International Conference, LVA/ICA 2015, Proceedings
Publisher: Springer Verlag
Pages: 259-267
Number of pages: 9
Volume: 9237
ISBN (Print): 9783319224817
DOI: https://doi.org/10.1007/978-3-319-22482-4_30
State: Published - 2015
Event: 12th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2015 - Liberec, Czech Republic
Duration: Aug 25 2015 to Aug 28 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9237
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 12th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2015
Country: Czech Republic
City: Liberec
Period: 8/25/15 to 8/28/15

Keywords

  • Deep learning
  • Non-negative matrix factorization
  • Scattering
  • Source separation

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Sprechmann, P., Bruna Estrach, J., & LeCun, Y. (2015). Audio source separation with discriminative scattering networks. In Latent Variable Analysis and Signal Separation - 12th International Conference, LVA/ICA 2015, Proceedings (Vol. 9237, pp. 259-267). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9237). Springer Verlag. https://doi.org/10.1007/978-3-319-22482-4_30

@inproceedings{e4ede498a8c844ff929fc4882adbe164,
title = "Audio source separation with discriminative scattering networks",
abstract = "Many monaural signal decomposition techniques proposed in the literature operate on a feature space consisting of a time-frequency representation of the input data. A challenge faced by these approaches is to effectively exploit the temporal dependencies of the signals at scales larger than the duration of a time frame. In this work, we propose to tackle this problem by modeling the signals using a time-frequency representation with multiple temporal resolutions. To this end, we use a signal representation that consists of a pyramid of wavelet scattering operators, which generalizes Constant Q Transforms (CQT) with extra layers of convolution and complex modulus. We first show that learning standard models in this multi-resolution setting improves source separation results over fixed-resolution methods. As a case study, we use Non-Negative Matrix Factorization (NMF), which has been widely used in audio applications. Then, we investigate the inclusion of the proposed multi-resolution setting into a discriminative training regime. We discuss several alternatives using different deep neural network architectures, and our preliminary experiments suggest that in this task, finite impulse, multi-resolution Convolutional Networks are a competitive baseline compared to recurrent alternatives.",
keywords = "Deep learning, Non-negative matrix factorization, Scattering, Source separation",
author = "Pablo Sprechmann and {Bruna Estrach}, Joan and Yann LeCun",
year = "2015",
doi = "10.1007/978-3-319-22482-4_30",
language = "English (US)",
isbn = "9783319224817",
volume = "9237",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "259--267",
booktitle = "Latent Variable Analysis and Signal Separation - 12th International Conference, LVA/ICA 2015, Proceedings",

}

TY - GEN

T1 - Audio source separation with discriminative scattering networks

AU - Sprechmann, Pablo

AU - Bruna Estrach, Joan

AU - LeCun, Yann

PY - 2015

Y1 - 2015

AB - Many monaural signal decomposition techniques proposed in the literature operate on a feature space consisting of a time-frequency representation of the input data. A challenge faced by these approaches is to effectively exploit the temporal dependencies of the signals at scales larger than the duration of a time frame. In this work, we propose to tackle this problem by modeling the signals using a time-frequency representation with multiple temporal resolutions. To this end, we use a signal representation that consists of a pyramid of wavelet scattering operators, which generalizes Constant Q Transforms (CQT) with extra layers of convolution and complex modulus. We first show that learning standard models in this multi-resolution setting improves source separation results over fixed-resolution methods. As a case study, we use Non-Negative Matrix Factorization (NMF), which has been widely used in audio applications. Then, we investigate the inclusion of the proposed multi-resolution setting into a discriminative training regime. We discuss several alternatives using different deep neural network architectures, and our preliminary experiments suggest that in this task, finite impulse, multi-resolution Convolutional Networks are a competitive baseline compared to recurrent alternatives.

KW - Deep learning

KW - Non-negative matrix factorization

KW - Scattering

KW - Source separation

UR - http://www.scopus.com/inward/record.url?scp=84944679866&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84944679866&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-22482-4_30

DO - 10.1007/978-3-319-22482-4_30

M3 - Conference contribution

SN - 9783319224817

VL - 9237

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 259

EP - 267

BT - Latent Variable Analysis and Signal Separation - 12th International Conference, LVA/ICA 2015, Proceedings

PB - Springer Verlag

ER -