Abstract
In this work we present a system to automatically learn features from audio in an unsupervised manner. Our method first learns an overcomplete dictionary which can be used to sparsely decompose log-scaled spectrograms. It then trains an efficient encoder which quickly maps new inputs to approximations of their sparse representations using the learned dictionary. This avoids expensive iterative procedures usually required to infer sparse codes. We then use these sparse codes as inputs for a linear Support Vector Machine (SVM). Our system achieves 83.4% accuracy in predicting genres on the GTZAN dataset, which is competitive with current state-of-the-art approaches. Furthermore, the use of a simple linear classifier combined with a fast feature extraction system allows our approach to scale well to large datasets.
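The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption rather than the authors' method: random data stands in for log-scaled spectrogram frames, the dictionary is learned by alternating ISTA inference with a gradient step, and the encoder is a ridge-regressed linear map followed by shrinkage (the abstract does not specify the encoder's form). All function names and hyperparameters are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrinkage operator: the proximal map of the L1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def objective(X, Z, D, lam):
    """Sparse coding loss: reconstruction error plus L1 penalty on the codes."""
    return 0.5 * np.sum((X - Z @ D) ** 2) + lam * np.sum(np.abs(Z))

def ista(X, D, lam=0.1, n_iter=50):
    """Iterative (slow) inference of sparse codes for the rows of X."""
    L = np.linalg.norm(D @ D.T, 2)  # Lipschitz constant of the gradient
    Z = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        Z = soft_threshold(Z - (Z @ D - X) @ D.T / L, lam / L)
    return Z

def normalize_rows(D):
    """Rescale each dictionary atom (row) to unit L2 norm."""
    return D / np.maximum(np.linalg.norm(D, axis=1, keepdims=True), 1e-12)

def learn_dictionary(X, n_atoms, lam=0.1, n_epochs=10, lr=0.1, seed=0):
    """Alternate sparse inference with a gradient step on the dictionary."""
    rng = np.random.default_rng(seed)
    D = normalize_rows(rng.standard_normal((n_atoms, X.shape[1])))
    for _ in range(n_epochs):
        Z = ista(X, D, lam)
        D = normalize_rows(D - lr * Z.T @ (Z @ D - X) / X.shape[0])
    return D

def fit_encoder(X, Z, ridge=1e-3):
    """Fit a linear map W so that X @ W approximates the sparse codes Z."""
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Z)

def encode(X, W, theta=0.05):
    """Fast feed-forward approximation: one matrix product plus shrinkage."""
    return soft_threshold(X @ W, theta)

# Synthetic stand-in for log-scaled spectrogram frames (assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
D = learn_dictionary(X, n_atoms=32)  # 32 atoms for 16 dimensions: overcomplete
Z = ista(X, D)                       # reference codes from iterative inference
W = fit_encoder(X, Z)                # encoder trained to predict those codes
Z_fast = encode(X, W)                # codes for inputs without any iteration
```

In a full system, codes such as `Z_fast` would be aggregated per clip and passed to a linear classifier; scikit-learn's `LinearSVC` is one option, though the source does not name an implementation.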
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011 |
Pages | 681-686 |
Number of pages | 6 |
State | Published - 2011 |
Event | 12th International Society for Music Information Retrieval Conference, ISMIR 2011 - Miami, FL, United States; Duration: Oct 24 2011 → Oct 28 2011 |
Other
Other | 12th International Society for Music Information Retrieval Conference, ISMIR 2011 |
---|---|
Country | United States |
City | Miami, FL |
Period | 10/24/11 → 10/28/11 |
ASJC Scopus subject areas
- Music
- Information Systems
Cite this
Unsupervised learning of sparse features for scalable audio classification. / Henaff, Mikael; Jarrett, Kevin; Kavukcuoglu, Koray; LeCun, Yann.
Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011. 2011. p. 681-686.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Unsupervised learning of sparse features for scalable audio classification
AU - Henaff, Mikael
AU - Jarrett, Kevin
AU - Kavukcuoglu, Koray
AU - LeCun, Yann
PY - 2011
Y1 - 2011
AB - In this work we present a system to automatically learn features from audio in an unsupervised manner. Our method first learns an overcomplete dictionary which can be used to sparsely decompose log-scaled spectrograms. It then trains an efficient encoder which quickly maps new inputs to approximations of their sparse representations using the learned dictionary. This avoids expensive iterative procedures usually required to infer sparse codes. We then use these sparse codes as inputs for a linear Support Vector Machine (SVM). Our system achieves 83.4% accuracy in predicting genres on the GTZAN dataset, which is competitive with current state-of-the-art approaches. Furthermore, the use of a simple linear classifier combined with a fast feature extraction system allows our approach to scale well to large datasets.
UR - http://www.scopus.com/inward/record.url?scp=84864122549&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84864122549&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84864122549
SN - 9780615548654
SP - 681
EP - 686
BT - Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011
ER -