An analysis/synthesis framework for automatic f0 annotation of multitrack datasets

Justin Salamon, Rachel M. Bittner, Jordi Bonada, Juan J. Bosch, Emilia Gómez, Juan Bello

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Generating continuous f0 annotations for tasks such as melody extraction and multiple f0 estimation typically involves running a monophonic pitch tracker on each track of a multitrack recording and manually correcting any estimation errors. This process is labor-intensive and time-consuming, and consequently existing annotated datasets are very limited in size. In this paper we propose a framework for automatically generating continuous f0 annotations without requiring manual refinement: the estimate of a pitch tracker is used to drive an analysis/synthesis pipeline which produces a synthesized version of the track. Any estimation errors are now reflected in the synthesized audio, meaning the tracker's output represents an accurate annotation. Analysis is performed using a wide-band harmonic sinusoidal modeling algorithm which estimates the frequency, amplitude, and phase of every harmonic, meaning the synthesized track closely resembles the original in terms of timbre and dynamics. Finally, the synthesized track is automatically mixed back into the multitrack. The framework can be used to annotate multitrack datasets for training learning-based algorithms. Furthermore, we show that algorithms evaluated on the automatically generated/annotated mixes produce results that are statistically indistinguishable from those they produce on the original, manually annotated mixes. We release a software library implementing the proposed framework, along with new datasets for melody, bass, and multiple f0 estimation.
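The core idea — rendering audio directly from the tracker's f0 estimate so that the annotation is correct by construction — can be sketched in a few lines. Note this is a minimal additive-synthesis illustration, not the paper's wide-band harmonic sinusoidal model: the function name, the fixed 1/h amplitude roll-off, and all parameter values are assumptions made for the sketch (the actual method also estimates per-harmonic amplitudes and phases from the original recording).

```python
import numpy as np

def synthesize_from_f0(f0_track, hop_size=256, sr=22050, n_harmonics=10):
    """Render a harmonic signal that follows a frame-wise f0 annotation.

    The frame-wise f0 values are linearly interpolated to sample rate,
    and phase is obtained by integrating (cumulatively summing) the
    instantaneous frequency, so the synthesized audio matches the
    annotation by construction.
    """
    n_samples = len(f0_track) * hop_size
    frame_times = np.arange(len(f0_track)) * hop_size / sr
    sample_times = np.arange(n_samples) / sr
    f0 = np.interp(sample_times, frame_times, f0_track)
    # Integrate instantaneous frequency to get phase.
    phase = 2 * np.pi * np.cumsum(f0) / sr
    out = np.zeros(n_samples)
    for h in range(1, n_harmonics + 1):
        # Assumed 1/h amplitude roll-off; the real pipeline measures
        # each harmonic's amplitude and phase from the source track.
        out += (1.0 / h) * np.sin(h * phase)
    # Silence unvoiced regions (f0 == 0 in the annotation).
    out[f0 <= 0] = 0.0
    return out / n_harmonics
```

In the full framework, the resynthesized stem would then replace the original stem in the multitrack mix, so any pitch-tracking error is baked into the audio rather than left as an annotation error.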

Original language: English (US)
Title of host publication: Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017
Editors: Zhiyao Duan, Douglas Turnbull, Xiao Hu, Sally Jo Cunningham
Publisher: International Society for Music Information Retrieval
Pages: 71-78
Number of pages: 8
ISBN (Electronic): 9789811151798
State: Published - Jan 1 2017
Event: 18th International Society for Music Information Retrieval Conference, ISMIR 2017 - Suzhou, China
Duration: Oct 23 2017 - Oct 27 2017

Publication series

Name: Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017

Conference

Conference: 18th International Society for Music Information Retrieval Conference, ISMIR 2017
Country: China
City: Suzhou
Period: 10/23/17 - 10/27/17


ASJC Scopus subject areas

  • Music
  • Information Systems

Cite this

Salamon, J., Bittner, R. M., Bonada, J., Bosch, J. J., Gómez, E., & Bello, J. (2017). An analysis/synthesis framework for automatic f0 annotation of multitrack datasets. In Z. Duan, D. Turnbull, X. Hu, & S. J. Cunningham (Eds.), Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017 (pp. 71-78). (Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017). International Society for Music Information Retrieval.


Scopus record: http://www.scopus.com/inward/record.url?scp=85054290573&partnerID=8YFLogxK (AN: SCOPUS:85054290573)