Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

Justin Salamon, Juan Pablo Bello

Research output: Contribution to journal (Article)

Abstract

The ability of deep convolutional neural networks (CNNs) to learn discriminative spectro-temporal patterns makes them well suited to environmental sound classification. However, the relative scarcity of labeled data has impeded the exploitation of this family of high-capacity models. This study has two primary contributions: first, we propose a deep CNN architecture for environmental sound classification. Second, we propose the use of audio data augmentation for overcoming the problem of data scarcity and explore the influence of different augmentations on the performance of the proposed CNN architecture. Combined with data augmentation, the proposed model produces state-of-the-art results for environmental sound classification. We show that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a 'shallow' dictionary learning model with augmentation. Finally, we examine the influence of each augmentation on the model's classification accuracy for each class, and observe that the accuracy for each class is influenced differently by each augmentation, suggesting that the performance of the model could be improved further by applying class-conditional data augmentation.
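The idea of enlarging a scarce labeled training set through audio data augmentation can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not list the specific augmentations, so mixing in background noise at a chosen signal-to-noise ratio is used here purely as an assumed example, and the function names `add_background_noise` and `augment_training_set` are hypothetical.

```python
import numpy as np


def add_background_noise(signal, noise, snr_db):
    """Mix `noise` into `signal` at the requested signal-to-noise ratio (dB).

    Assumes `noise` is not all zeros. Illustrative augmentation only.
    """
    # Loop or trim the noise so it matches the length of the signal.
    noise = np.resize(noise, signal.shape)
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so signal_power / scaled_noise_power == 10^(snr_db / 10).
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise


def augment_training_set(clips, labels, noise, snr_levels_db):
    """Return the original clips plus one noisy copy per clip and SNR level.

    Each augmented clip keeps the label of the clip it was derived from.
    """
    aug_clips, aug_labels = list(clips), list(labels)
    for clip, label in zip(clips, labels):
        for snr_db in snr_levels_db:
            aug_clips.append(add_background_noise(clip, noise, snr_db))
            aug_labels.append(label)
    return aug_clips, aug_labels
```

A class-conditional variant, as suggested at the end of the abstract, would choose which augmentations to apply based on `label` rather than applying the same set to every clip.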

Original language: English (US)
Article number: 7829341
Pages (from-to): 279-283
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 24
Issue number: 3
DOI: 10.1109/LSP.2017.2657381
State: Published - Mar 1 2017

Fingerprint

Data Augmentation
Augmentation
Acoustic waves
Neural Networks
Network Architecture
Model
Glossaries
Exploitation
Sound
Class

Keywords

  • Deep convolutional neural networks (CNNs)
  • deep learning
  • environmental sound classification
  • urban sound dataset

ASJC Scopus subject areas

  • Signal Processing
  • Applied Mathematics
  • Electrical and Electronic Engineering

Cite this

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification. / Salamon, Justin; Bello, Juan Pablo.

In: IEEE Signal Processing Letters, Vol. 24, No. 3, 7829341, 01.03.2017, p. 279-283.

@article{1decce9b2a4c4ae99d50319b195bf52d,
title = "Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification",
abstract = "The ability of deep convolutional neural networks (CNNs) to learn discriminative spectro-temporal patterns makes them well suited to environmental sound classification. However, the relative scarcity of labeled data has impeded the exploitation of this family of high-capacity models. This study has two primary contributions: first, we propose a deep CNN architecture for environmental sound classification. Second, we propose the use of audio data augmentation for overcoming the problem of data scarcity and explore the influence of different augmentations on the performance of the proposed CNN architecture. Combined with data augmentation, the proposed model produces state-of-the-art results for environmental sound classification. We show that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a 'shallow' dictionary learning model with augmentation. Finally, we examine the influence of each augmentation on the model's classification accuracy for each class, and observe that the accuracy for each class is influenced differently by each augmentation, suggesting that the performance of the model could be improved further by applying class-conditional data augmentation.",
keywords = "Deep convolutional neural networks (CNNs), deep learning, environmental sound classification, urban sound dataset",
author = "Salamon, Justin and Bello, Juan Pablo",
year = "2017",
month = "3",
day = "1",
doi = "10.1109/LSP.2017.2657381",
language = "English (US)",
volume = "24",
pages = "279--283",
journal = "IEEE Signal Processing Letters",
issn = "1070-9908",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "3"
}

TY - JOUR

T1 - Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

AU - Salamon, Justin

AU - Bello, Juan Pablo

PY - 2017/3/1

Y1 - 2017/3/1

N2 - The ability of deep convolutional neural networks (CNNs) to learn discriminative spectro-temporal patterns makes them well suited to environmental sound classification. However, the relative scarcity of labeled data has impeded the exploitation of this family of high-capacity models. This study has two primary contributions: first, we propose a deep CNN architecture for environmental sound classification. Second, we propose the use of audio data augmentation for overcoming the problem of data scarcity and explore the influence of different augmentations on the performance of the proposed CNN architecture. Combined with data augmentation, the proposed model produces state-of-the-art results for environmental sound classification. We show that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a 'shallow' dictionary learning model with augmentation. Finally, we examine the influence of each augmentation on the model's classification accuracy for each class, and observe that the accuracy for each class is influenced differently by each augmentation, suggesting that the performance of the model could be improved further by applying class-conditional data augmentation.

KW - Deep convolutional neural networks (CNNs)

KW - deep learning

KW - environmental sound classification

KW - urban sound dataset

UR - http://www.scopus.com/inward/record.url?scp=85015238568&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85015238568&partnerID=8YFLogxK

U2 - 10.1109/LSP.2017.2657381

DO - 10.1109/LSP.2017.2657381

M3 - Article

VL - 24

SP - 279

EP - 283

JO - IEEE Signal Processing Letters

JF - IEEE Signal Processing Letters

SN - 1070-9908

IS - 3

M1 - 7829341

ER -