A generalized construction of integrated speech recognition transducers

Cyril Allauzen, Mehryar Mohri, Michael Riley, Brian Roark

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Abstract

We showed in previous work that weighted finite-state transducers provide a common representation for many components of a speech recognition system, and we described general algorithms for combining these representations into a single optimized, compact transducer that integrates all of the components and maps directly from HMM states to words. This approach works well for certain well-controlled input transducers, but with more general transducers it raises problems with the efficiency of composition and with the applicability of determinization and weight pushing. We generalize our prior construction of the integrated speech recognition transducer to work with an arbitrary number of component transducers and, by providing more general solutions to these problems, largely relax the constraints previously imposed on the type of input transducers. This generalization allowed us to handle cases where our prior optimization did not apply. Our experiments on the AT&T HMIHY 0300 task and an AT&T VoiceTone task demonstrate the efficiency of the generalized optimization technique. We report recognition speed-ups by a factor of 1.6 on the HMIHY 0300 task, 1.8 on a VoiceTone task with a word-based language model, and 1.7 with a class-based model.
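To make the abstract's vocabulary concrete, here is a toy sketch, not the authors' construction or any real library's API. It composes an arbitrary list of epsilon-free weighted transducers over the tropical semiring (min, +) by folding a pairwise composition, then applies weight pushing toward the start state using shortest distances to the final states. The names `Fst`, `compose`, `compose_all` and `push_weights` are hypothetical; real recognition cascades also need epsilon handling and determinization, which are omitted, and the pushing step assumes there are no negative-weight cycles.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Dict, List, Tuple

# An arc is (input label, output label, weight, destination state).
Arc = Tuple[str, str, float, int]


@dataclass
class Fst:
    """Toy epsilon-free weighted transducer over the tropical semiring (min, +)."""
    start: int
    finals: Dict[int, float]                       # final state -> final weight
    arcs: Dict[int, List[Arc]] = field(default_factory=dict)
    start_weight: float = 0.0                      # weight added before the first arc


def compose(a: Fst, b: Fst) -> Fst:
    """Pair states of a and b; an arc of a matches an arc of b when a's output equals b's input."""
    ids = {(a.start, b.start): 0}
    stack = [(a.start, b.start)]
    out = Fst(start=0, finals={}, start_weight=a.start_weight + b.start_weight)
    while stack:
        pa, pb = stack.pop()
        src = ids[(pa, pb)]
        out.arcs.setdefault(src, [])
        if pa in a.finals and pb in b.finals:
            out.finals[src] = a.finals[pa] + b.finals[pb]   # tropical "times" is +
        for ia, oa, wa, na in a.arcs.get(pa, []):
            for ib, ob, wb, nb in b.arcs.get(pb, []):
                if oa != ib:
                    continue
                if (na, nb) not in ids:
                    ids[(na, nb)] = len(ids)
                    stack.append((na, nb))
                out.arcs[src].append((ia, ob, wa + wb, ids[(na, nb)]))
    return out


def compose_all(fsts: List[Fst]) -> Fst:
    """Left-to-right composition of any number of component transducers."""
    return reduce(compose, fsts)


def push_weights(f: Fst) -> Fst:
    """Move weight toward the start state; total weight of every complete path is unchanged."""
    inf = float("inf")
    states = {f.start} | set(f.finals) | set(f.arcs)
    for arcs in f.arcs.values():
        states |= {arc[3] for arc in arcs}
    # d[q] = cheapest cost from q to acceptance (arc weights plus final weight),
    # computed by Bellman-Ford style relaxation; assumes no negative-weight cycles.
    d = {q: f.finals.get(q, inf) for q in states}
    changed = True
    while changed:
        changed = False
        for q, arcs in f.arcs.items():
            for _, _, w, dest in arcs:
                if w + d[dest] < d[q]:
                    d[q] = w + d[dest]
                    changed = True
    pushed = Fst(start=f.start, finals={}, start_weight=f.start_weight + d[f.start])
    for q, arcs in f.arcs.items():
        # Arcs into states that cannot reach a final state are dropped.
        pushed.arcs[q] = [(i, o, w + d[dest] - d[q], dest)
                          for i, o, w, dest in arcs if d[dest] < inf]
    pushed.finals = {q: w - d[q] for q, w in f.finals.items()}
    return pushed
```

Folding with `reduce` composes the components left to right, so the same code handles two components or ten. After pushing, the cheapest way to continue from every state that can still reach a final state costs 0, and the residual weight sits in `start_weight`.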

Original language: English (US)
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 1
State: Published - 2004
Event: Proceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing - Montreal, Que, Canada
Duration: May 17, 2004 to May 21, 2004

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Acoustics and Ultrasonics

Cite this

Allauzen, C., Mohri, M., Riley, M., & Roark, B. (2004). A generalized construction of integrated speech recognition transducers. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 1).

@inproceedings{a4fa5aebaa774304a2b6225fec7cb356,
title = "A generalized construction of integrated speech recognition transducers",
author = "Cyril Allauzen and Mehryar Mohri and Michael Riley and Brian Roark",
year = "2004",
language = "English (US)",
volume = "1",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

}

TY - GEN

T1 - A generalized construction of integrated speech recognition transducers

AU - Allauzen, Cyril

AU - Mohri, Mehryar

AU - Riley, Michael

AU - Roark, Brian

PY - 2004

Y1 - 2004

UR - http://www.scopus.com/inward/record.url?scp=4544339437&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4544339437&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:4544339437

VL - 1

BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

ER -