Unsupervised learning of sparse features for scalable audio classification

Mikael Henaff, Kevin Jarrett, Koray Kavukcuoglu, Yann LeCun

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this work we present a system to automatically learn features from audio in an unsupervised manner. Our method first learns an overcomplete dictionary which can be used to sparsely decompose log-scaled spectrograms. It then trains an efficient encoder which quickly maps new inputs to approximations of their sparse representations using the learned dictionary. This avoids expensive iterative procedures usually required to infer sparse codes. We then use these sparse codes as inputs for a linear Support Vector Machine (SVM). Our system achieves 83.4% accuracy in predicting genres on the GTZAN dataset, which is competitive with current state-of-the-art approaches. Furthermore, the use of a simple linear classifier combined with a fast feature extraction system allows our approach to scale well to large datasets.
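The contrast the abstract draws between iterative sparse-code inference and a fast feed-forward encoder can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the inference algorithm or the encoder architecture, so ISTA stands in for the "expensive iterative procedure", a single affine map followed by soft-thresholding stands in for the encoder, and a random unit-norm matrix stands in for the learned dictionary. All names (`ista`, `fast_encode`, `W`, `b`, `theta`, `lam`) are hypothetical, and the linear SVM stage is omitted.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal operator of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, D, lam=0.1, n_iter=200):
    """Iteratively infer z minimizing 0.5*||x - D z||^2 + lam*||z||_1 (assumed objective)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth term's gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)           # gradient of the reconstruction error
        z = soft_threshold(z - grad / L, lam / L)
    return z

def fast_encode(x, W, b, theta):
    """One feed-forward pass approximating the sparse code (hypothetical encoder form)."""
    return soft_threshold(W @ x + b, theta)

rng = np.random.default_rng(0)
n_features, n_atoms = 16, 64               # overcomplete: more atoms than input dimensions
D = rng.standard_normal((n_features, n_atoms))
D /= np.linalg.norm(D, axis=0)             # unit-norm columns; a stand-in for a learned dictionary
x = rng.standard_normal(n_features)        # stand-in for one log-scaled spectrogram frame

# Expensive route: many iterations per input.
z_iter = ista(x, D)

# Cheap route: a single pass. Here the encoder is merely initialised to equal
# one ISTA step from zero; in a trained system W, b and theta would be learned
# so that the output approximates z_iter.
L = np.linalg.norm(D, 2) ** 2
z_fast = fast_encode(x, D.T / L, np.zeros(n_atoms), 0.1 / L)
```

The resulting code vectors (`z_fast` for each input) would then serve as the feature vectors passed to a linear classifier.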

Original language: English (US)
Title of host publication: Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011
Pages: 681-686
Number of pages: 6
State: Published - 2011
Event: 12th International Society for Music Information Retrieval Conference, ISMIR 2011 - Miami, FL, United States
Duration: Oct 24 2011 to Oct 28 2011

Fingerprint

Unsupervised learning
Glossaries
Support vector machines
Feature extraction
Classifiers
Dictionary

ASJC Scopus subject areas

  • Music
  • Information Systems

Cite this

Henaff, M., Jarrett, K., Kavukcuoglu, K., & LeCun, Y. (2011). Unsupervised learning of sparse features for scalable audio classification. In Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011 (pp. 681-686)

@inproceedings{a20aac59492f4e2eb38f5cb3046e9185,
title = "Unsupervised learning of sparse features for scalable audio classification",
abstract = "In this work we present a system to automatically learn features from audio in an unsupervised manner. Our method first learns an overcomplete dictionary which can be used to sparsely decompose log-scaled spectrograms. It then trains an efficient encoder which quickly maps new inputs to approximations of their sparse representations using the learned dictionary. This avoids expensive iterative procedures usually required to infer sparse codes. We then use these sparse codes as inputs for a linear Support Vector Machine (SVM). Our system achieves 83.4{\%} accuracy in predicting genres on the GTZAN dataset, which is competitive with current state-of-the-art approaches. Furthermore, the use of a simple linear classifier combined with a fast feature extraction system allows our approach to scale well to large datasets.",
author = "Mikael Henaff and Kevin Jarrett and Koray Kavukcuoglu and Yann LeCun",
year = "2011",
language = "English (US)",
isbn = "9780615548654",
pages = "681--686",
booktitle = "Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011",

}

TY - GEN

T1 - Unsupervised learning of sparse features for scalable audio classification

AU - Henaff, Mikael

AU - Jarrett, Kevin

AU - Kavukcuoglu, Koray

AU - LeCun, Yann

PY - 2011

Y1 - 2011

AB - In this work we present a system to automatically learn features from audio in an unsupervised manner. Our method first learns an overcomplete dictionary which can be used to sparsely decompose log-scaled spectrograms. It then trains an efficient encoder which quickly maps new inputs to approximations of their sparse representations using the learned dictionary. This avoids expensive iterative procedures usually required to infer sparse codes. We then use these sparse codes as inputs for a linear Support Vector Machine (SVM). Our system achieves 83.4% accuracy in predicting genres on the GTZAN dataset, which is competitive with current state-of-the-art approaches. Furthermore, the use of a simple linear classifier combined with a fast feature extraction system allows our approach to scale well to large datasets.

UR - http://www.scopus.com/inward/record.url?scp=84864122549&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84864122549&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84864122549

SN - 9780615548654

SP - 681

EP - 686

BT - Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011

ER -