Improving deep neural network acoustic modeling for audio corpus indexing under The IARPA Babel program

Xiaodong Cui, Brian Kingsbury, Jia Cui, Bhuvana Ramabhadran, Andrew Rosenberg, Mohammad Sadegh Rasooli, Owen Rambow, Nizar Habash, Vaibhava Goel

Research output: Contribution to journal, Conference article

Abstract

This paper is focused on several techniques that improve deep neural network (DNN) acoustic modeling for audio corpus indexing in the context of the IARPA Babel program. Specifically, fundamental frequency variation (FFV) and channel-aware (CA) features and data augmentation based on stochastic feature mapping (SFM) are investigated not only for improved automatic speech recognition (ASR) performance but also for their impact on the final spoken term detection on the pre-indexed audio corpus. Experimental results on development languages of Babel option period one show that the improved DNN acoustic models can reduce word error rates in ASR and also help the keyword search performance compared to already competitive DNN baseline systems.

Original language: English (US)
Pages (from-to): 2103-2107
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - Jan 1 2014
Event: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore
Duration: Sep 14 2014 to Sep 18 2014

Keywords

  • Channel-aware
  • Data augmentation
  • Deep neural networks
  • Fundamental frequency variation
  • Stochastic feature mapping

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

Improving deep neural network acoustic modeling for audio corpus indexing under The IARPA Babel program. / Cui, Xiaodong; Kingsbury, Brian; Cui, Jia; Ramabhadran, Bhuvana; Rosenberg, Andrew; Rasooli, Mohammad Sadegh; Rambow, Owen; Habash, Nizar; Goel, Vaibhava.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 01.01.2014, p. 2103-2107.

@article{44f5c944c7b247fe807af5a7dd45e6bd,
title = "Improving deep neural network acoustic modeling for audio corpus indexing under The IARPA Babel program",
abstract = "This paper is focused on several techniques that improve deep neural network (DNN) acoustic modeling for audio corpus indexing in the context of the IARPA Babel program. Specifically, fundamental frequency variation (FFV) and channel-aware (CA) features and data augmentation based on stochastic feature mapping (SFM) are investigated not only for improved automatic speech recognition (ASR) performance but also for their impact on the final spoken term detection on the pre-indexed audio corpus. Experimental results on development languages of Babel option period one show that the improved DNN acoustic models can reduce word error rates in ASR and also help the keyword search performance compared to already competitive DNN baseline systems.",
keywords = "Channel-aware, Data augmentation, Deep neural networks, Fundamental frequency variation, Stochastic feature mapping",
author = "Xiaodong Cui and Brian Kingsbury and Jia Cui and Bhuvana Ramabhadran and Andrew Rosenberg and Rasooli, {Mohammad Sadegh} and Owen Rambow and Nizar Habash and Vaibhava Goel",
year = "2014",
month = "1",
day = "1",
language = "English (US)",
pages = "2103--2107",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY - JOUR

T1 - Improving deep neural network acoustic modeling for audio corpus indexing under The IARPA Babel program

AU - Cui, Xiaodong

AU - Kingsbury, Brian

AU - Cui, Jia

AU - Ramabhadran, Bhuvana

AU - Rosenberg, Andrew

AU - Rasooli, Mohammad Sadegh

AU - Rambow, Owen

AU - Habash, Nizar

AU - Goel, Vaibhava

PY - 2014/1/1

Y1 - 2014/1/1

N2 - This paper is focused on several techniques that improve deep neural network (DNN) acoustic modeling for audio corpus indexing in the context of the IARPA Babel program. Specifically, fundamental frequency variation (FFV) and channel-aware (CA) features and data augmentation based on stochastic feature mapping (SFM) are investigated not only for improved automatic speech recognition (ASR) performance but also for their impact on the final spoken term detection on the pre-indexed audio corpus. Experimental results on development languages of Babel option period one show that the improved DNN acoustic models can reduce word error rates in ASR and also help the keyword search performance compared to already competitive DNN baseline systems.

KW - Channel-aware

KW - Data augmentation

KW - Deep neural networks

KW - Fundamental frequency variation

KW - Stochastic feature mapping

UR - http://www.scopus.com/inward/record.url?scp=84910070307&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84910070307&partnerID=8YFLogxK

M3 - Conference article

SP - 2103

EP - 2107

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -