Deep salience representations for F0 estimation in polyphonic music

Rachel M. Bittner, Brian McFee, Justin Salamon, Peter Li, Juan Bello

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Estimating fundamental frequencies in polyphonic music remains a notoriously difficult task in Music Information Retrieval. While other tasks, such as beat tracking and chord recognition, have seen improvement with the application of deep learning models, little work has been done to apply deep learning methods to fundamental-frequency-related tasks, including multi-f0 and melody tracking, primarily due to the scarcity of labeled data. In this work, we describe a fully convolutional neural network for learning salience representations for estimating fundamental frequencies, trained using a large, semi-automatically generated f0 dataset. We demonstrate the effectiveness of our model for learning salience representations for both multi-f0 and melody tracking in polyphonic audio, and show that our models achieve state-of-the-art performance on several multi-f0 and melody datasets. We conclude with directions for future research.
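The abstract states only that the network outputs salience representations that are then used for multi-f0 and melody tracking. As a hedged illustration of how a time-frequency salience map can be turned into f0 estimates, the Python sketch below applies per-frame peak picking (multi-f0) and a per-frame argmax with a voicing threshold (melody). The frequency grid (`F_MIN`, `BINS_PER_OCTAVE`), the `threshold` value and the function names are assumptions chosen for illustration; they are not the authors' code or published settings.

```python
import math
from typing import List, Optional

# Hypothetical frequency grid for this sketch (an assumption, not from the paper):
# log-spaced bins, BINS_PER_OCTAVE per octave, starting at F_MIN Hz.
F_MIN = 32.7
BINS_PER_OCTAVE = 60

Salience = List[List[float]]  # salience[t][k]: frame t, frequency bin k, values in [0, 1]


def bin_to_hz(k: int) -> float:
    """Centre frequency (Hz) of frequency bin k on the assumed log grid."""
    return F_MIN * 2.0 ** (k / BINS_PER_OCTAVE)


def decode_multi_f0(salience: Salience, threshold: float = 0.3) -> List[List[float]]:
    """Multi-f0: in each frame, keep every local maximum with salience >= threshold."""
    estimates = []
    for frame in salience:
        peaks = []
        for k, s in enumerate(frame):
            left = frame[k - 1] if k > 0 else -math.inf
            right = frame[k + 1] if k + 1 < len(frame) else -math.inf
            # Strict on the left, non-strict on the right: a flat-topped
            # peak is reported once, at its lowest bin.
            if s >= threshold and s > left and s >= right:
                peaks.append(bin_to_hz(k))
        estimates.append(peaks)
    return estimates


def decode_melody(salience: Salience, threshold: float = 0.3) -> List[Optional[float]]:
    """Melody: the most salient bin per frame, or None (unvoiced) below threshold."""
    estimates: List[Optional[float]] = []
    for frame in salience:
        k = max(range(len(frame)), key=frame.__getitem__)  # first bin wins on ties
        estimates.append(bin_to_hz(k) if frame[k] >= threshold else None)
    return estimates


if __name__ == "__main__":
    # Toy salience map: 2 frames x 6 bins (made-up values).
    toy = [
        [0.0, 0.9, 0.1, 0.0, 0.6, 0.0],  # two pitched sources
        [0.1, 0.2, 0.1, 0.0, 0.1, 0.0],  # nothing above threshold
    ]
    print(decode_multi_f0(toy))  # frame 0: frequencies of bins 1 and 4; frame 1: []
    print(decode_melody(toy))    # frame 0: frequency of bin 1; frame 1: None
```

The two decoders differ only in how many pitches they allow per frame: multi-f0 tracking can report several simultaneous pitches, while melody tracking reports at most one and marks low-salience frames as unvoiced.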

Original language: English (US)
Title of host publication: Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017
Editors: Zhiyao Duan, Douglas Turnbull, Xiao Hu, Sally Jo Cunningham
Publisher: International Society for Music Information Retrieval
Pages: 63-70
Number of pages: 8
ISBN (Electronic): 9789811151798
State: Published - Jan 1 2017
Event: 18th International Society for Music Information Retrieval Conference, ISMIR 2017 - Suzhou, China
Duration: Oct 23 2017 - Oct 27 2017

Publication series

Name: Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017

Conference

Conference: 18th International Society for Music Information Retrieval Conference, ISMIR 2017
Country: China
City: Suzhou
Period: 10/23/17 - 10/27/17

Fingerprint

Information retrieval
Availability
Neural networks
Fundamental Frequency
Melody
Music
Polyphonic
Deep learning
Chord
Neural Networks
Music Information Retrieval
Learning Model
Performance Art

ASJC Scopus subject areas

  • Music
  • Information Systems

Cite this

Bittner, R. M., McFee, B., Salamon, J., Li, P., & Bello, J. (2017). Deep salience representations for F0 estimation in polyphonic music. In Z. Duan, D. Turnbull, X. Hu, & S. J. Cunningham (Eds.), Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017 (pp. 63-70). (Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017). International Society for Music Information Retrieval.

Deep salience representations for F0 estimation in polyphonic music. / Bittner, Rachel M.; McFee, Brian; Salamon, Justin; Li, Peter; Bello, Juan.

Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017. ed. / Zhiyao Duan; Douglas Turnbull; Xiao Hu; Sally Jo Cunningham. International Society for Music Information Retrieval, 2017. p. 63-70 (Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017).

Bittner, RM, McFee, B, Salamon, J, Li, P & Bello, J 2017, Deep salience representations for F0 estimation in polyphonic music. in Z Duan, D Turnbull, X Hu & SJ Cunningham (eds), Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017. Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017, International Society for Music Information Retrieval, pp. 63-70, 18th International Society for Music Information Retrieval Conference, ISMIR 2017, Suzhou, China, 10/23/17.
Bittner RM, McFee B, Salamon J, Li P, Bello J. Deep salience representations for F0 estimation in polyphonic music. In Duan Z, Turnbull D, Hu X, Cunningham SJ, editors, Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017. International Society for Music Information Retrieval. 2017. p. 63-70. (Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017).
Bittner, Rachel M. ; McFee, Brian ; Salamon, Justin ; Li, Peter ; Bello, Juan. / Deep salience representations for F0 estimation in polyphonic music. Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017. editor / Zhiyao Duan ; Douglas Turnbull ; Xiao Hu ; Sally Jo Cunningham. International Society for Music Information Retrieval, 2017. pp. 63-70 (Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017).
@inproceedings{71f210e17e004348af3169aabe518880,
title = "Deep salience representations for F0 estimation in polyphonic music",
abstract = "Estimating fundamental frequencies in polyphonic music remains a notoriously difficult task in Music Information Retrieval. While other tasks, such as beat tracking and chord recognition have seen improvement with the application of deep learning models, little work has been done to apply deep learning methods to fundamental frequency related tasks including multi-f0 and melody tracking, primarily due to the scarce availability of labeled data. In this work, we describe a fully convolutional neural network for learning salience representations for estimating fundamental frequencies, trained using a large, semi-automatically generated f0 dataset. We demonstrate the effectiveness of our model for learning salience representations for both multi-f0 and melody tracking in polyphonic audio, and show that our models achieve state-of-the-art performance on several multi-f0 and melody datasets. We conclude with directions for future research.",
author = "Bittner, {Rachel M.} and Brian McFee and Justin Salamon and Peter Li and Juan Bello",
year = "2017",
month = "1",
day = "1",
language = "English (US)",
series = "Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017",
publisher = "International Society for Music Information Retrieval",
pages = "63--70",
editor = "Zhiyao Duan and Douglas Turnbull and Xiao Hu and Cunningham, {Sally Jo}",
booktitle = "Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017",

}

TY  - GEN
T1  - Deep salience representations for F0 estimation in polyphonic music
AU  - Bittner, Rachel M.
AU  - McFee, Brian
AU  - Salamon, Justin
AU  - Li, Peter
AU  - Bello, Juan
PY  - 2017/1/1
Y1  - 2017/1/1
N2  - Estimating fundamental frequencies in polyphonic music remains a notoriously difficult task in Music Information Retrieval. While other tasks, such as beat tracking and chord recognition have seen improvement with the application of deep learning models, little work has been done to apply deep learning methods to fundamental frequency related tasks including multi-f0 and melody tracking, primarily due to the scarce availability of labeled data. In this work, we describe a fully convolutional neural network for learning salience representations for estimating fundamental frequencies, trained using a large, semi-automatically generated f0 dataset. We demonstrate the effectiveness of our model for learning salience representations for both multi-f0 and melody tracking in polyphonic audio, and show that our models achieve state-of-the-art performance on several multi-f0 and melody datasets. We conclude with directions for future research.
AB  - Estimating fundamental frequencies in polyphonic music remains a notoriously difficult task in Music Information Retrieval. While other tasks, such as beat tracking and chord recognition have seen improvement with the application of deep learning models, little work has been done to apply deep learning methods to fundamental frequency related tasks including multi-f0 and melody tracking, primarily due to the scarce availability of labeled data. In this work, we describe a fully convolutional neural network for learning salience representations for estimating fundamental frequencies, trained using a large, semi-automatically generated f0 dataset. We demonstrate the effectiveness of our model for learning salience representations for both multi-f0 and melody tracking in polyphonic audio, and show that our models achieve state-of-the-art performance on several multi-f0 and melody datasets. We conclude with directions for future research.
UR  - http://www.scopus.com/inward/record.url?scp=85069924285&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85069924285&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:85069924285
T3  - Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017
SP  - 63
EP  - 70
BT  - Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017
A2  - Duan, Zhiyao
A2  - Turnbull, Douglas
A2  - Hu, Xiao
A2  - Cunningham, Sally Jo
PB  - International Society for Music Information Retrieval
ER  -