Analysis of common design choices in deep learning systems for downbeat tracking

Magdalena Fuentes, Brian McFee, Hélène C. Crayencour, Slim Essid, Juan Bello

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Downbeat tracking consists of annotating a piece of musical audio with the estimated position of the first beat of each bar. In recent years, increasing attention has been paid to applying deep learning models to this task, and various architectures have been proposed, leading to a significant improvement in accuracy. However, there is little insight into the role of the various design choices and the delicate interactions between them. In this paper we offer a systematic investigation of the impact of widely adopted variants. We study the effects of the temporal granularity of the input representation (i.e., beat-level vs. tatum-level) and the encoding of the network's outputs. We also investigate the potential of convolutional-recurrent networks, which have not been explored in previous downbeat tracking systems. To this end, we take a state-of-the-art recurrent neural network and introduce these variants into it, while keeping the training data, network learning parameters and post-processing stages fixed. We find that temporal granularity has a significant impact on performance, and we analyze its interaction with the encoding of the network's outputs.

Original language: English (US)
Title of host publication: Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018
Editors: Emmanouil Benetos, Emilia Gomez, Xiao Hu, Eric Humphrey
Publisher: International Society for Music Information Retrieval
Pages: 106-112
Number of pages: 7
ISBN (Electronic): 9782954035123
State: Published - Jan 1 2018
Event: 19th International Society for Music Information Retrieval Conference, ISMIR 2018 - Paris, France
Duration: Sep 23 2018 to Sep 27 2018

Publication series

Name: Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018

Conference

Conference: 19th International Society for Music Information Retrieval Conference, ISMIR 2018
Country: France
City: Paris
Period: 9/23/18 to 9/27/18

Fingerprint

Recurrent neural networks
Learning systems
Processing
Deep learning
Downbeat
Granularity
Encoding
Interaction

ASJC Scopus subject areas

  • Music
  • Information Systems

Cite this

Fuentes, M., McFee, B., Crayencour, H. C., Essid, S., & Bello, J. (2018). Analysis of common design choices in deep learning systems for downbeat tracking. In E. Benetos, E. Gomez, X. Hu, & E. Humphrey (Eds.), Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018 (pp. 106-112). (Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018). International Society for Music Information Retrieval.

@inproceedings{1bc97fb40aae40ee8bdf43fc52082f20,
title = "Analysis of common design choices in deep learning systems for downbeat tracking",
author = "Magdalena Fuentes and Brian McFee and Crayencour, {H{\'e}l{\`e}ne C.} and Slim Essid and Juan Bello",
year = "2018",
month = "1",
day = "1",
language = "English (US)",
series = "Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018",
publisher = "International Society for Music Information Retrieval",
pages = "106--112",
editor = "Emmanouil Benetos and Emilia Gomez and Xiao Hu and Eric Humphrey",
booktitle = "Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018",

}

TY - GEN

T1 - Analysis of common design choices in deep learning systems for downbeat tracking

AU - Fuentes, Magdalena

AU - McFee, Brian

AU - Crayencour, Hélène C.

AU - Essid, Slim

AU - Bello, Juan

PY - 2018/1/1

Y1 - 2018/1/1

UR - http://www.scopus.com/inward/record.url?scp=85065981409&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065981409&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85065981409

T3 - Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018

SP - 106

EP - 112

BT - Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018

A2 - Benetos, Emmanouil

A2 - Gomez, Emilia

A2 - Hu, Xiao

A2 - Humphrey, Eric

PB - International Society for Music Information Retrieval

ER -