Feature adapted convolutional neural networks for downbeat tracking

Simon Durand, Juan P. Bello, Bertrand David, Gael Richard

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We define a novel system for the automatic estimation of downbeat positions from audio music signals. New rhythm and melodic features are introduced and feature adapted convolutional neural networks are used to take advantage of their specificity. Indeed, invariance to melody transposition, chroma data augmentation and length-specific rhythmic patterns prove to be useful to learn downbeat likelihood. After the data is segmented in tatums, complementary features related to melody, rhythm and harmony are extracted and the likelihood of a tatum being at a downbeat position is computed with the aforementioned neural networks. The downbeat sequence is then extracted with a flexible temporal hidden Markov model. We then show the efficiency and robustness of our approach with a comparative evaluation conducted on 9 datasets.
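
The chroma data augmentation and transposition invariance mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes a chroma matrix of shape (12, n_frames) and simulates a musical transposition by circularly shifting the pitch-class axis. The function names `transpose_chroma` and `augment_chroma` are hypothetical, and this is not necessarily the procedure the authors used.

```python
import numpy as np


def transpose_chroma(chroma: np.ndarray, semitones: int) -> np.ndarray:
    """Circularly shift a (12, n_frames) chroma matrix along the pitch-class axis.

    Assumption: row i holds the energy of pitch class i, so a shift of
    `semitones` rows corresponds to transposing the music by that interval.
    """
    if chroma.ndim != 2 or chroma.shape[0] != 12:
        raise ValueError("expected a chroma matrix of shape (12, n_frames)")
    return np.roll(chroma, shift=semitones, axis=0)


def augment_chroma(chroma: np.ndarray) -> list:
    """Return all 12 circular transpositions (shift 0 is the original)."""
    return [transpose_chroma(chroma, s) for s in range(12)]
```

Training a network on every element of `augment_chroma(chroma)` with the same downbeat labels is one simple way to encourage invariance to transposition, since the rhythmic position of a downbeat does not depend on the key.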

Original language: English (US)
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 296-300
Number of pages: 5
Volume: 2016-May
ISBN (Electronic): 9781479999880
DOI: https://doi.org/10.1109/ICASSP.2016.7471684
State: Published - May 18 2016
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: Mar 20 2016 to Mar 25 2016

Other

Other: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country: China
City: Shanghai
Period: 3/20/16 to 3/25/16

Fingerprint

Neural networks
Hidden Markov models
Invariance

Keywords

  • Convolutional Neural Networks
  • Downbeat Tracking
  • Music Information Retrieval
  • Music Signal Processing

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Durand, S., Bello, J. P., David, B., & Richard, G. (2016). Feature adapted convolutional neural networks for downbeat tracking. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings (Vol. 2016-May, pp. 296-300). [7471684] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2016.7471684

Feature adapted convolutional neural networks for downbeat tracking. / Durand, Simon; Bello, Juan P.; David, Bertrand; Richard, Gael.

2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings. Vol. 2016-May Institute of Electrical and Electronics Engineers Inc., 2016. p. 296-300 7471684.

Durand, S, Bello, JP, David, B & Richard, G 2016, Feature adapted convolutional neural networks for downbeat tracking. in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings. vol. 2016-May, 7471684, Institute of Electrical and Electronics Engineers Inc., pp. 296-300, 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, 3/20/16. https://doi.org/10.1109/ICASSP.2016.7471684
Durand S, Bello JP, David B, Richard G. Feature adapted convolutional neural networks for downbeat tracking. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings. Vol. 2016-May. Institute of Electrical and Electronics Engineers Inc. 2016. p. 296-300. 7471684 https://doi.org/10.1109/ICASSP.2016.7471684
Durand, Simon ; Bello, Juan P. ; David, Bertrand ; Richard, Gael. / Feature adapted convolutional neural networks for downbeat tracking. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings. Vol. 2016-May Institute of Electrical and Electronics Engineers Inc., 2016. pp. 296-300
@inproceedings{2949da133e5043e6b18f61ac6592e928,
title = "Feature adapted convolutional neural networks for downbeat tracking",
abstract = "We define a novel system for the automatic estimation of downbeat positions from audio music signals. New rhythm and melodic features are introduced and feature adapted convolutional neural networks are used to take advantage of their specificity. Indeed, invariance to melody transposition, chroma data augmentation and length-specific rhythmic patterns prove to be useful to learn downbeat likelihood. After the data is segmented in tatums, complementary features related to melody, rhythm and harmony are extracted and the likelihood of a tatum being at a downbeat position is computed with the aforementioned neural networks. The downbeat sequence is then extracted with a flexible temporal hidden Markov model. We then show the efficiency and robustness of our approach with a comparative evaluation conducted on 9 datasets.",
keywords = "Convolutional Neural Networks, Downbeat Tracking, Music Information Retrieval, Music Signal Processing",
author = "Simon Durand and Bello, {Juan P.} and Bertrand David and Gael Richard",
year = "2016",
month = "5",
day = "18",
doi = "10.1109/ICASSP.2016.7471684",
language = "English (US)",
volume = "2016-May",
pages = "296--300",
booktitle = "2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",
}

TY - GEN
T1 - Feature adapted convolutional neural networks for downbeat tracking
AU - Durand, Simon
AU - Bello, Juan P.
AU - David, Bertrand
AU - Richard, Gael
PY - 2016/5/18
Y1 - 2016/5/18
AB - We define a novel system for the automatic estimation of downbeat positions from audio music signals. New rhythm and melodic features are introduced and feature adapted convolutional neural networks are used to take advantage of their specificity. Indeed, invariance to melody transposition, chroma data augmentation and length-specific rhythmic patterns prove to be useful to learn downbeat likelihood. After the data is segmented in tatums, complementary features related to melody, rhythm and harmony are extracted and the likelihood of a tatum being at a downbeat position is computed with the aforementioned neural networks. The downbeat sequence is then extracted with a flexible temporal hidden Markov model. We then show the efficiency and robustness of our approach with a comparative evaluation conducted on 9 datasets.

KW - Convolutional Neural Networks
KW - Downbeat Tracking
KW - Music Information Retrieval
KW - Music Signal Processing
UR - http://www.scopus.com/inward/record.url?scp=84973367430&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973367430&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2016.7471684
DO - 10.1109/ICASSP.2016.7471684
M3 - Conference contribution
VL - 2016-May
SP - 296
EP - 300
BT - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
ER -