Open-Source Practices for Music Signal Processing Research: Recommendations for Transparent, Sustainable, and Reproducible Audio Research

Brian McFee, Jong Wook Kim, Mark Cartwright, Justin Salamon, Rachel M. Bittner, Juan Bello

Research output: Contribution to journal > Article

Abstract

In the early years of music information retrieval (MIR), research problems were often centered around conceptually simple tasks, and methods were evaluated on small, idealized data sets. A canonical example is genre recognition (i.e., which one of n genres describes this song?), which was often evaluated on the GTZAN data set (1,000 musical excerpts balanced across ten genres) [1]. As task definitions were simple, so too were signal analysis pipelines, which often derived from methods for speech processing and recognition and typically consisted of simple methods for feature extraction, statistical modeling, and evaluation. When describing a research system, the expected level of detail was superficial: it was sufficient to state, e.g., the number of mel-frequency cepstral coefficients used, the statistical model (e.g., a Gaussian mixture model), the choice of data set, and the evaluation criteria, without stating the underlying software dependencies or implementation details. Because of an increased abundance of methods, the proliferation of software toolkits, the explosion of machine learning, and a shift in focus toward more realistic problem settings, modern research systems are substantially more complex than their predecessors. Modern MIR researchers must pay careful attention to detail when processing metadata, implementing evaluation criteria, and disseminating results.

Original language: English (US)
Article number: 8588406
Pages (from-to): 128-137
Number of pages: 10
Journal: IEEE Signal Processing Magazine
Volume: 36
Issue number: 1
DOI: 10.1109/MSP.2018.2875349
State: Published - Jan 1 2019

Fingerprint

Music
Open Source
Signal Processing
Recommendations
Music Information Retrieval
Information retrieval
Speech processing
Evaluation
Signal analysis
Metadata
Speech recognition
Explosions
Software
Learning systems
Feature extraction
Statistical Modeling
Gaussian Mixture Model

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Applied Mathematics

Cite this

Open-Source Practices for Music Signal Processing Research: Recommendations for Transparent, Sustainable, and Reproducible Audio Research. / McFee, Brian; Kim, Jong Wook; Cartwright, Mark; Salamon, Justin; Bittner, Rachel M.; Bello, Juan. In: IEEE Signal Processing Magazine, Vol. 36, No. 1, 8588406, 01.01.2019, p. 128-137.

@article{05932f40158440999bd7434522d25b24,
title = "Open-Source Practices for Music Signal Processing Research: Recommendations for Transparent, Sustainable, and Reproducible Audio Research",
author = "Brian McFee and Kim, {Jong Wook} and Mark Cartwright and Justin Salamon and Bittner, {Rachel M.} and Juan Bello",
year = "2019",
month = "1",
day = "1",
doi = "10.1109/MSP.2018.2875349",
language = "English (US)",
volume = "36",
pages = "128--137",
journal = "IEEE Signal Processing Magazine",
issn = "1053-5888",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "1",

}

TY - JOUR

T1 - Open-Source Practices for Music Signal Processing Research

T2 - Recommendations for Transparent, Sustainable, and Reproducible Audio Research

AU - McFee, Brian

AU - Kim, Jong Wook

AU - Cartwright, Mark

AU - Salamon, Justin

AU - Bittner, Rachel M.

AU - Bello, Juan

PY - 2019/1/1

Y1 - 2019/1/1

UR - http://www.scopus.com/inward/record.url?scp=85059779386&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85059779386&partnerID=8YFLogxK

U2 - 10.1109/MSP.2018.2875349

DO - 10.1109/MSP.2018.2875349

M3 - Article

AN - SCOPUS:85059779386

VL - 36

SP - 128

EP - 137

JO - IEEE Signal Processing Magazine

JF - IEEE Signal Processing Magazine

SN - 1053-5888

IS - 1

M1 - 8588406

ER -