Fusing shallow and deep learning for bioacoustic bird species classification

Justin Salamon, Juan Bello, Andrew Farnsworth, Steve Kelling

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

Abstract

Automated classification of organisms to species based on their vocalizations would contribute tremendously to abilities to monitor biodiversity, with a wide range of applications in the field of ecology. In particular, automated classification of migrating birds' flight calls could yield new biological insights and conservation applications for birds that vocalize during migration. In this paper we explore state-of-the-art classification techniques for large-vocabulary bird species classification from flight calls. In particular, we contrast a 'shallow learning' approach based on unsupervised dictionary learning with a deep convolutional neural network combined with data augmentation. We show that the two models perform comparably on a dataset of 5428 flight calls spanning 43 different species, with both significantly outperforming an MFCC baseline. Finally, we show that by combining the models using a simple late-fusion approach we can further improve the results, obtaining a state-of-the-art classification accuracy of 0.96.
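The late-fusion step mentioned in the abstract is conceptually simple: each model outputs a per-class probability distribution for a flight call, and the two distributions are combined before the final decision is taken. The sketch below illustrates one common variant, unweighted averaging of class probabilities followed by an argmax. It is not the authors' implementation; the model outputs are stubbed with random values purely for illustration.

# Minimal late-fusion sketch (illustrative only, not the authors' code).
# Assumes each classifier yields an (n_examples, n_classes) matrix of
# class probabilities; here those outputs are stubbed with random values.
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_classes = 8, 43  # 43 species, as in the paper's dataset

def stub_probabilities(n, k):
    """Stand-in for a trained model's per-class probability output."""
    p = rng.random((n, k))
    return p / p.sum(axis=1, keepdims=True)

shallow_probs = stub_probabilities(n_examples, n_classes)  # e.g. dictionary-learning model
deep_probs = stub_probabilities(n_examples, n_classes)     # e.g. CNN trained with augmentation

# Late fusion: average the per-class probabilities, then pick the most likely species.
fused_probs = (shallow_probs + deep_probs) / 2.0
predictions = fused_probs.argmax(axis=1)
print(predictions)

Weighted combinations are a natural extension, but even an unweighted average is consistent with the abstract's observation that a simple late fusion already improves on either model alone.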

Original language: English (US)
Title of host publication: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 141-145
Number of pages: 5
ISBN (Electronic): 9781509041176
DOI: 10.1109/ICASSP.2017.7952134
State: Published - Jun 16, 2017
Event: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - New Orleans, United States
Duration: Mar 5, 2017 - Mar 9, 2017

Other

Other: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017
Country: United States
City: New Orleans
Period: 3/5/17 - 3/9/17

Fingerprint

  • Bioacoustics
  • Birds
  • Biodiversity
  • Ecology
  • Glossaries
  • Conservation
  • Fusion reactions
  • Deep learning
  • Neural networks

Keywords

  • bioacoustics
  • Convolutional neural networks
  • data augmentation
  • deep learning
  • flight calls

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Salamon, J., Bello, J., Farnsworth, A., & Kelling, S. (2017). Fusing shallow and deep learning for bioacoustic bird species classification. In 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings (pp. 141-145). [7952134] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2017.7952134
