ALTIS: A new algorithm for adaptive long-term SNR estimation in multi-talker babble

Roozbeh Soleymani, Ivan Selesnick, David M. Landsberger

Research output: Contribution to journal › Article

Abstract

We introduce a real-time-capable algorithm that estimates the long-term signal-to-noise ratio (SNR) of speech in multi-talker babble noise. In real-time applications, long-term SNR is calculated over a sufficiently long moving frame of the noisy speech ending at the current time. The algorithm performs real-time long-term SNR estimation by averaging “speech-likeness” values of multiple consecutive short frames of the noisy speech, which collectively form a long frame of adaptive length. The algorithm is calibrated to be insensitive to short-term fluctuations and transient changes in speech or noise level. However, it responds quickly to non-transient changes in long-term SNR by adjusting the duration of the long frame over which the long-term SNR is measured. This ability is obtained by employing an event detector and an adaptive frame duration. The event detector identifies non-transient changes in the long-term SNR and optimizes the duration of the long frame accordingly. The algorithm was trained and tested on randomly generated speech samples corrupted with multi-talker babble. In addition to providing adaptive long-term SNR estimation in dynamic noisy conditions, the evaluation results show that the algorithm outperforms existing overall SNR estimation methods in multi-talker babble over a wide range of talker counts and SNRs. The relatively low computational cost and the ability to update the estimated long-term SNR several times per second make the algorithm suitable for real-time speech processing applications.
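The adaptive long-frame idea in the abstract can be illustrated with a small sketch. This is not the ALTIS implementation: the per-frame "speech-likeness" scores are assumed to be given as plain numbers, and the class name `AdaptiveLongFrameAverager`, its parameters, and the simple threshold-based event detector are all hypothetical stand-ins chosen for illustration. The sketch only shows how averaging over a long frame can ignore a brief spike yet shorten the frame when a sustained change appears.

```python
from collections import deque
from statistics import fmean


class AdaptiveLongFrameAverager:
    """Average per-short-frame scores over a long frame of adaptive length.

    Hypothetical illustration only; not the detector described in the paper.
    """

    def __init__(self, max_frames=200, recent_frames=10, jump_threshold=0.25):
        self.max_frames = max_frames          # upper bound on long-frame length
        self.recent_frames = recent_frames    # short frames treated as "recent"
        self.jump_threshold = jump_threshold  # assumed event-detection threshold
        self.scores = deque(maxlen=max_frames)

    def update(self, score):
        """Add one short-frame score and return the long-frame average."""
        self.scores.append(score)
        if len(self.scores) >= 2 * self.recent_frames:
            history = list(self.scores)
            recent = history[-self.recent_frames:]
            older = history[:-self.recent_frames]
            # Toy event detector: a sustained shift moves the mean of the
            # recent frames away from the mean of the older frames.
            if abs(fmean(recent) - fmean(older)) > self.jump_threshold:
                # Shorten the long frame so the estimate tracks the new level.
                self.scores = deque(recent, maxlen=self.max_frames)
        return fmean(self.scores)


def run(scores):
    """Feed a sequence of scores and return the estimator and last estimate."""
    estimator = AdaptiveLongFrameAverager()
    estimate = None
    for s in scores:
        estimate = estimator.update(s)
    return estimator, estimate
```

For example, `run([0.2] * 50 + [0.8] * 20)` ends with an estimate close to the new level, whereas a fixed 70-frame average would still sit near 0.37. A single outlier, as in `run([0.2] * 50 + [0.8] + [0.2] * 5)`, leaves the long frame at its full length.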

Original language: English (US)
Pages (from-to): 231-246
Number of pages: 16
Journal: Computer Speech and Language
Volume: 58
DOI: 10.1016/j.csl.2019.05.001
State: Published - Nov 1 2019

Fingerprint

Signal to noise ratio
Real-time
Detector
Speech processing
Moving Frame
Averaging
Speech
Computational Cost
Consecutive
Update
Optimise
Fluctuations
Evaluation
Estimate
Range of data
Costs

Keywords

  • Adaptive SNR
  • Long-term SNR
  • Multi-talker babble
  • Real-time SNR
  • Signal-to-noise ratio

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction

Cite this

ALTIS: A new algorithm for adaptive long-term SNR estimation in multi-talker babble. / Soleymani, Roozbeh; Selesnick, Ivan; Landsberger, David M.

In: Computer Speech and Language, Vol. 58, 01.11.2019, p. 231-246.


@article{94c8c2cec4ca47599c058124432d14d6,
title = "ALTIS: A new algorithm for adaptive long-term SNR estimation in multi-talker babble",
abstract = "We introduce a real-time capable algorithm which estimates the long-term signal to noise ratio (SNR) of the speech in multi-talker babble noise. In real-time applications, long-term SNR is calculated over a sufficiently long moving frame of the noisy speech ending at the current time. The algorithm performs the real-time long-term SNR estimation by averaging “speech-likeness” values of multiple consecutive short-frames of the noisy speech which collectively form a long-frame with an adaptive length. The algorithm is calibrated to be insensitive to short-term fluctuations and transient changes in speech or noise level. However, it quickly responds to non-transient changes in long-term SNR by adjusting the duration of the long-frame on which the long-term SNR is measured. This ability is obtained by employing an event detector and adaptive frame duration. The event detector identifies non-transient changes of the long-term SNR and optimizes the duration of the long-frame accordingly. The algorithm was trained and tested for randomly generated speech samples corrupted with multi-talker babble. In addition to its ability to provide an adaptive long-term SNR estimation in a dynamic noisy situation, the evaluation results show that the algorithm outperforms the existing overall SNR estimation methods in multi-talker babble over a wide range of number of talkers and SNRs. The relatively low computational cost and the ability to update the estimated long-term SNR several times per second make this algorithm capable of operating in real-time speech processing applications.",
keywords = "Adaptive SNR, Long-term SNR, Multi-talker babble, Real-time SNR, Signal-to-noise ratio",
author = "Soleymani, Roozbeh and Selesnick, Ivan and Landsberger, David M.",
year = "2019",
month = "11",
day = "1",
doi = "10.1016/j.csl.2019.05.001",
language = "English (US)",
volume = "58",
pages = "231--246",
journal = "Computer Speech and Language",
issn = "0885-2308",
publisher = "Academic Press Inc.",
}

TY  - JOUR
T1  - ALTIS
T2  - A new algorithm for adaptive long-term SNR estimation in multi-talker babble
AU  - Soleymani, Roozbeh
AU  - Selesnick, Ivan
AU  - Landsberger, David M.
PY  - 2019/11/1
Y1  - 2019/11/1
AB  - We introduce a real-time capable algorithm which estimates the long-term signal to noise ratio (SNR) of the speech in multi-talker babble noise. In real-time applications, long-term SNR is calculated over a sufficiently long moving frame of the noisy speech ending at the current time. The algorithm performs the real-time long-term SNR estimation by averaging “speech-likeness” values of multiple consecutive short-frames of the noisy speech which collectively form a long-frame with an adaptive length. The algorithm is calibrated to be insensitive to short-term fluctuations and transient changes in speech or noise level. However, it quickly responds to non-transient changes in long-term SNR by adjusting the duration of the long-frame on which the long-term SNR is measured. This ability is obtained by employing an event detector and adaptive frame duration. The event detector identifies non-transient changes of the long-term SNR and optimizes the duration of the long-frame accordingly. The algorithm was trained and tested for randomly generated speech samples corrupted with multi-talker babble. In addition to its ability to provide an adaptive long-term SNR estimation in a dynamic noisy situation, the evaluation results show that the algorithm outperforms the existing overall SNR estimation methods in multi-talker babble over a wide range of number of talkers and SNRs. The relatively low computational cost and the ability to update the estimated long-term SNR several times per second make this algorithm capable of operating in real-time speech processing applications.
KW  - Adaptive SNR
KW  - Long-term SNR
KW  - Multi-talker babble
KW  - Real-time SNR
KW  - Signal-to-noise ratio
UR  - http://www.scopus.com/inward/record.url?scp=85065893448&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85065893448&partnerID=8YFLogxK
U2  - 10.1016/j.csl.2019.05.001
DO  - 10.1016/j.csl.2019.05.001
M3  - Article
AN  - SCOPUS:85065893448
VL  - 58
SP  - 231
EP  - 246
JO  - Computer Speech and Language
JF  - Computer Speech and Language
SN  - 0885-2308
ER  - 