Data driven and discriminative projections for large-scale cover song identification

Eric J. Humphrey, Oriol Nieto, Juan Bello

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The predominant approach to computing document similarity in web-scale applications proceeds by encoding task-specific invariance in a vectorized representation, such that the relationship between items can be computed efficiently by a simple scoring function, e.g. Euclidean distance. Here, we improve upon previous work in large-scale cover song identification by using data-driven projections at different time scales to capture local features and embed summary vectors into a semantically organized space. We achieve this by projecting 2D Fourier Magnitude Coefficients (2D-FMCs) of beat-chroma patches into a sparse, high-dimensional representation which, owing to the shift-invariance properties of the Fourier transform, is similar in principle to convolutional sparse coding. After aggregating these local beat-chroma projections, we apply supervised dimensionality reduction to recover an embedding in which distance is useful for cover song retrieval. Evaluating on the Million Song Dataset, we find that our method outperforms the current state of the art overall, and does so significantly on top-k metrics, which indicate improved usability.
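The abstract compresses a multi-stage pipeline into a few sentences. The following NumPy sketch illustrates those stages under explicit assumptions; it is not the authors' implementation. The patch length (`WIN = 75` beats), the projection matrix `proj` and threshold `thresh` (random or hand-picked stand-ins for a learned sparse dictionary), median pooling, and Fisher LDA as the supervised reduction are illustrative choices, and every function name is hypothetical. What it does show faithfully is the property the abstract relies on: taking the magnitude of the 2D Fourier transform discards phase, so circularly shifting a chroma patch along the pitch axis (a key transposition) or the beat axis leaves the feature unchanged.

```python
import numpy as np

WIN = 75  # hypothetical patch length in beats (assumption, not from the paper)


def fmc2d(patch):
    """Flattened 2D Fourier magnitude of a (12 x win) beat-chroma patch.

    Discarding the phase makes the feature invariant to circular shifts along
    either axis, e.g. rotating the 12 chroma bins (a key transposition).
    """
    return np.abs(np.fft.fft2(patch)).ravel()


def local_codes(chroma, proj, thresh, win=WIN, hop=1):
    """Map each windowed patch of a (12 x n_beats) chroma matrix to a sparse code.

    `proj` (k x 12*win) and `thresh` are hypothetical stand-ins for a learned
    dictionary; subtracting the threshold and clipping at zero sets small
    coefficients to exactly zero. Requires n_beats >= win.
    """
    n_beats = chroma.shape[1]
    feats = np.stack([fmc2d(chroma[:, s:s + win])
                      for s in range(0, n_beats - win + 1, hop)])
    return np.maximum(feats @ proj.T - thresh, 0.0)


def song_vector(chroma, proj, thresh, win=WIN):
    """Pool a song's local codes into one summary vector (median is an assumption)."""
    return np.median(local_codes(chroma, proj, thresh, win), axis=0)


def fit_lda(X, y, n_components, reg=1e-3):
    """Fisher LDA on labeled summary vectors X (n x d); returns a d x n_components map.

    Solves Sb w = lambda Sw w, keeping directions that spread different labels
    apart relative to the spread within each label.
    """
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = reg * np.eye(d)  # ridge term keeps the within-class scatter invertible
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # With Sw = L L^T, the generalized problem becomes symmetric: L^-1 Sb L^-T v = lambda v.
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    evals, evecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    top = np.argsort(evals)[::-1][:n_components]
    return L_inv.T @ evecs[:, top]  # map eigenvectors back: w = L^-T v


def rank(query, database):
    """Row indices of `database` sorted by Euclidean distance to `query`."""
    return np.argsort(np.linalg.norm(database - query, axis=1))
```

In use, one would compute `song_vector` for every track, fit `fit_lda` on those vectors labeled by cover-song group (hypothetically, one label per original work), project all songs with the returned matrix, and call `rank` on a projected query. Covers should then appear near the front of the returned order, which is what top-k metrics reward. LDA is used here only as a familiar example of supervised dimensionality reduction; the paper's exact choice is not stated in this record.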

Original language: English (US)
Title of host publication: Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR 2013
Editors: Alceu de Souza Britto, Fabien Gouyon, Simon Dixon
Publisher: International Society for Music Information Retrieval
Pages: 149-154
Number of pages: 6
ISBN (Electronic): 9780615900650
State: Published - Jan 1 2013
Event: 14th International Society for Music Information Retrieval Conference, ISMIR 2013 - Curitiba, Brazil
Duration: Nov 4 2013 to Nov 8 2013

Publication series

Name: Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR 2013

Conference

Conference: 14th International Society for Music Information Retrieval Conference, ISMIR 2013
Country: Brazil
City: Curitiba
Period: 11/4/13 to 11/8/13

Fingerprint

  • Invariance
  • Fourier transforms
  • Data-driven
  • Song
  • Chroma

ASJC Scopus subject areas

  • Music
  • Information Systems

Cite this

Humphrey, E. J., Nieto, O., & Bello, J. (2013). Data driven and discriminative projections for large-scale cover song identification. In A. D. S. Britto, F. Gouyon, & S. Dixon (Eds.), Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR 2013 (pp. 149-154). (Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR 2013). International Society for Music Information Retrieval.

@inproceedings{5b0b71460fd649429c1dd2652e5fdc75,
title = "Data driven and discriminative projections for large-scale cover song identification",
abstract = "The predominant approach to computing document similarity in web scale applications proceeds by encoding task-specific invariance in a vectorized representation, such that the relationship between items can be computed efficiently by a simple scoring function, e.g. Euclidean distance. Here, we improve upon previous work in large-scale cover song identification by using data-driven projections at different time-scales to capture local features and embed summary vectors into a semantically organized space. We achieve this by projecting 2D-Fourier Magnitude Coefficients (2D-FMCs) of beat-chroma patches into a sparse, high dimensional representation which, due to the shift invariance properties of the Fourier Transform, is similar in principle to convolutional sparse coding. After aggregating these local beat-chroma projections, we apply supervised dimensionality reduction to recover an embedding where distance is useful for cover song retrieval. Evaluating on the Million Song Dataset, we find our method outperforms the current state of the art overall, but significantly so for top-k metrics, which indicate improved usability.",
author = "Humphrey, {Eric J.} and Oriol Nieto and Juan Bello",
year = "2013",
month = "1",
day = "1",
language = "English (US)",
series = "Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR 2013",
publisher = "International Society for Music Information Retrieval",
pages = "149--154",
editor = "Britto, {Alceu de Souza} and Fabien Gouyon and Simon Dixon",
booktitle = "Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR 2013",

}

TY - GEN

T1 - Data driven and discriminative projections for large-scale cover song identification

AU - Humphrey, Eric J.

AU - Nieto, Oriol

AU - Bello, Juan

PY - 2013/1/1

Y1 - 2013/1/1

N2 - The predominant approach to computing document similarity in web scale applications proceeds by encoding task-specific invariance in a vectorized representation, such that the relationship between items can be computed efficiently by a simple scoring function, e.g. Euclidean distance. Here, we improve upon previous work in large-scale cover song identification by using data-driven projections at different time-scales to capture local features and embed summary vectors into a semantically organized space. We achieve this by projecting 2D-Fourier Magnitude Coefficients (2D-FMCs) of beat-chroma patches into a sparse, high dimensional representation which, due to the shift invariance properties of the Fourier Transform, is similar in principle to convolutional sparse coding. After aggregating these local beat-chroma projections, we apply supervised dimensionality reduction to recover an embedding where distance is useful for cover song retrieval. Evaluating on the Million Song Dataset, we find our method outperforms the current state of the art overall, but significantly so for top-k metrics, which indicate improved usability.

AB - The predominant approach to computing document similarity in web scale applications proceeds by encoding task-specific invariance in a vectorized representation, such that the relationship between items can be computed efficiently by a simple scoring function, e.g. Euclidean distance. Here, we improve upon previous work in large-scale cover song identification by using data-driven projections at different time-scales to capture local features and embed summary vectors into a semantically organized space. We achieve this by projecting 2D-Fourier Magnitude Coefficients (2D-FMCs) of beat-chroma patches into a sparse, high dimensional representation which, due to the shift invariance properties of the Fourier Transform, is similar in principle to convolutional sparse coding. After aggregating these local beat-chroma projections, we apply supervised dimensionality reduction to recover an embedding where distance is useful for cover song retrieval. Evaluating on the Million Song Dataset, we find our method outperforms the current state of the art overall, but significantly so for top-k metrics, which indicate improved usability.

UR - http://www.scopus.com/inward/record.url?scp=85006053298&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85006053298&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85006053298

T3 - Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR 2013

SP - 149

EP - 154

BT - Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR 2013

A2 - Britto, Alceu de Souza

A2 - Gouyon, Fabien

A2 - Dixon, Simon

PB - International Society for Music Information Retrieval

ER -