Per-Channel Energy Normalization

Why and How

Vincent Lostanlen, Justin Salamon, Mark Cartwright, Brian McFee, Andrew Farnsworth, Steve Kelling, Juan Bello

Research output: Contribution to journal (Article)

Abstract

In the context of automatic speech recognition and acoustic event detection, an adaptive procedure named per-channel energy normalization (PCEN) has recently been shown to outperform the pointwise logarithm of the mel-frequency spectrogram (logmelspec) as an acoustic frontend. This article investigates the adequacy of PCEN for spectrogram-based pattern recognition in far-field noisy recordings, from both theoretical and practical standpoints. First, we apply PCEN to various datasets of natural acoustic environments and find empirically that it Gaussianizes distributions of magnitudes while decorrelating frequency bands. Second, we describe the asymptotic regimes of each component in PCEN: temporal integration, gain control, and dynamic range compression. Third, we give practical advice for adapting PCEN parameters to the temporal properties of the noise to be mitigated, the signal to be enhanced, and the choice of time-frequency representation. Because it converts a large class of real-world soundscapes into additive white Gaussian noise (AWGN), PCEN is a computationally efficient frontend for robust detection and classification of acoustic events in heterogeneous environments.
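
The three components named in the abstract (temporal integration, gain control, and dynamic range compression) can be illustrated with a minimal sketch. This record does not state the PCEN formula, so the code below assumes the commonly cited formulation: a first-order low-pass filter M(t, f) = (1 - s) M(t - 1, f) + s E(t, f), followed by PCEN(t, f) = (E(t, f) / (eps + M(t, f))^alpha + delta)^r - delta^r. The function name `pcen`, the default parameter values, and the choice to initialise the filter with the first frame are illustrative assumptions, not values taken from the article. Plain Python lists are used to keep the example dependency-free.

```python
def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Apply a PCEN-style normalization to a spectrogram.

    E[t][f] is the non-negative energy in frame t and frequency band f.
    All default parameter values are illustrative assumptions.
    """
    if not E:
        return []
    # Assumption: start the smoother from the first frame of the input.
    M = list(E[0])
    out = []
    for frame in E:
        # Temporal integration: first-order IIR low-pass filter, one per band.
        M = [(1.0 - s) * m + s * e for m, e in zip(M, frame)]
        # Gain control: divide each band by its smoothed energy raised to alpha.
        G = [e / (eps + m) ** alpha for e, m in zip(frame, M)]
        # Dynamic range compression: offset power law with exponent r.
        out.append([(g + delta) ** r - delta ** r for g in G])
    return out


# Usage: a steady background in two bands, then a sudden onset in band 0.
background = [[1.0, 1.0] for _ in range(50)]
onset = [[10.0, 1.0]]
result = pcen(background + onset)
print(result[49])  # stationary frame: both bands share the same small value
print(result[50])  # onset frame: band 0 stands out, band 1 is unchanged
```

Because each band is divided by its own smoothed energy, a stationary background maps to roughly the same output whatever its absolute level, while a sudden onset stands out against it. This is consistent with the per-channel gain control described in the abstract.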

Original language: English (US)
Journal: IEEE Signal Processing Letters
DOIs: https://doi.org/10.1109/LSP.2018.2878620
State: Accepted/In press - Jan 1 2018

Fingerprint

Normalization
Acoustics
Energy
Spectrogram
Gain control
Speech recognition
Frequency bands
Pattern recognition
Adaptive Procedure
Event Detection
Automatic Speech Recognition
Heterogeneous Environment
Gaussian White Noise
Dynamic Range
Far Field
Logarithm
Pattern Recognition
Convert
Compression

Keywords

  • Acoustic noise
  • acoustic sensors
  • acoustic signal detection
  • signal classification
  • spectrogram

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Applied Mathematics

Cite this

Lostanlen, V., Salamon, J., Cartwright, M., McFee, B., Farnsworth, A., Kelling, S., & Bello, J. (Accepted/In press). Per-Channel Energy Normalization: Why and How. IEEE Signal Processing Letters. https://doi.org/10.1109/LSP.2018.2878620

Per-Channel Energy Normalization: Why and How. / Lostanlen, Vincent; Salamon, Justin; Cartwright, Mark; McFee, Brian; Farnsworth, Andrew; Kelling, Steve; Bello, Juan.

In: IEEE Signal Processing Letters, 01.01.2018.

Lostanlen, V, Salamon, J, Cartwright, M, McFee, B, Farnsworth, A, Kelling, S & Bello, J 2018, 'Per-Channel Energy Normalization: Why and How', IEEE Signal Processing Letters. https://doi.org/10.1109/LSP.2018.2878620
Lostanlen V, Salamon J, Cartwright M, McFee B, Farnsworth A, Kelling S et al. Per-Channel Energy Normalization: Why and How. IEEE Signal Processing Letters. 2018 Jan 1. https://doi.org/10.1109/LSP.2018.2878620
Lostanlen, Vincent; Salamon, Justin; Cartwright, Mark; McFee, Brian; Farnsworth, Andrew; Kelling, Steve; Bello, Juan. / Per-Channel Energy Normalization: Why and How. In: IEEE Signal Processing Letters. 2018.
@article{f77c5e6213eb49f7b69dad8045017f4f,
title = "Per-Channel Energy Normalization: Why and How",
abstract = "In the context of automatic speech recognition and acoustic event detection, an adaptive procedure named per-channel energy normalization (PCEN) has recently been shown to outperform the pointwise logarithm of the mel-frequency spectrogram (logmelspec) as an acoustic frontend. This article investigates the adequacy of PCEN for spectrogram-based pattern recognition in far-field noisy recordings, from both theoretical and practical standpoints. First, we apply PCEN to various datasets of natural acoustic environments and find empirically that it Gaussianizes distributions of magnitudes while decorrelating frequency bands. Second, we describe the asymptotic regimes of each component in PCEN: temporal integration, gain control, and dynamic range compression. Third, we give practical advice for adapting PCEN parameters to the temporal properties of the noise to be mitigated, the signal to be enhanced, and the choice of time-frequency representation. Because it converts a large class of real-world soundscapes into additive white Gaussian noise (AWGN), PCEN is a computationally efficient frontend for robust detection and classification of acoustic events in heterogeneous environments.",
keywords = "Acoustic noise, acoustic sensors, acoustic signal detection, signal classification, spectrogram",
author = "Vincent Lostanlen and Justin Salamon and Mark Cartwright and Brian McFee and Andrew Farnsworth and Steve Kelling and Juan Bello",
year = "2018",
month = "1",
day = "1",
doi = "10.1109/LSP.2018.2878620",
language = "English (US)",
journal = "IEEE Signal Processing Letters",
issn = "1070-9908",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Per-Channel Energy Normalization

T2 - Why and How

AU - Lostanlen, Vincent

AU - Salamon, Justin

AU - Cartwright, Mark

AU - McFee, Brian

AU - Farnsworth, Andrew

AU - Kelling, Steve

AU - Bello, Juan

PY - 2018/1/1

Y1 - 2018/1/1

N2 - In the context of automatic speech recognition and acoustic event detection, an adaptive procedure named per-channel energy normalization (PCEN) has recently been shown to outperform the pointwise logarithm of the mel-frequency spectrogram (logmelspec) as an acoustic frontend. This article investigates the adequacy of PCEN for spectrogram-based pattern recognition in far-field noisy recordings, from both theoretical and practical standpoints. First, we apply PCEN to various datasets of natural acoustic environments and find empirically that it Gaussianizes distributions of magnitudes while decorrelating frequency bands. Second, we describe the asymptotic regimes of each component in PCEN: temporal integration, gain control, and dynamic range compression. Third, we give practical advice for adapting PCEN parameters to the temporal properties of the noise to be mitigated, the signal to be enhanced, and the choice of time-frequency representation. Because it converts a large class of real-world soundscapes into additive white Gaussian noise (AWGN), PCEN is a computationally efficient frontend for robust detection and classification of acoustic events in heterogeneous environments.

KW - Acoustic noise

KW - acoustic sensors

KW - acoustic signal detection

KW - signal classification

KW - spectrogram

UR - http://www.scopus.com/inward/record.url?scp=85055677559&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055677559&partnerID=8YFLogxK

U2 - 10.1109/LSP.2018.2878620

DO - 10.1109/LSP.2018.2878620

M3 - Article

JO - IEEE Signal Processing Letters

JF - IEEE Signal Processing Letters

SN - 1070-9908

ER -