An optimal pre-determinization algorithm for weighted transducers

Cyril Allauzen, Mehryar Mohri

Research output: Contribution to journal › Article

Abstract

We present a general algorithm, pre-determinization, that makes an arbitrary weighted transducer over the tropical semiring or an arbitrary unambiguous weighted transducer over a cancellative commutative semiring determinizable by inserting in it transitions labeled with special symbols. After determinization, the special symbols can be removed or replaced with ε-transitions. The resulting transducer can be significantly more efficient to use. We report empirical results showing that our algorithm leads to a substantial speed-up in large-vocabulary speech recognition. Our pre-determinization algorithm makes use of an efficient algorithm for testing a general twins property, a sufficient condition for the determinizability of all weighted transducers over the tropical semiring and unambiguous weighted transducers over cancellative commutative semirings. Based on the transitions marked by this test of the twins property, our pre-determinization algorithm inserts new transitions just when needed to guarantee that the resulting transducer has the twins property and thus is determinizable. It also uses a single-source shortest-paths algorithm over the min-max semiring for carefully selecting the positions for insertion of new transitions to benefit from the subsequent application of determinization. These positions are proved to be optimal in a sense that we describe.
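
To make the min-max semiring step concrete, the following is a minimal sketch of a single-source shortest-paths computation in which path weights are combined with max and alternative paths with min. It is a generic Dijkstra-style illustration, not the authors' algorithm: the function name `minmax_distances`, the adjacency-list representation, and the example graph are all assumptions introduced here.

```python
import heapq
import math


def minmax_distances(graph, source):
    """Single-source distances over the min-max semiring (hypothetical helper).

    graph maps each state to a list of (next_state, weight) pairs.
    The weight of a path is the max of its transition weights, and the
    distance to a state is the min of that value over all paths to it.
    """
    # The empty path has weight -inf, the identity element for max.
    dist = {source: -math.inf}
    heap = [(-math.inf, source)]
    while heap:
        d, state = heapq.heappop(heap)
        if d > dist.get(state, math.inf):
            continue  # stale queue entry, a better distance is already known
        for nxt, weight in graph.get(state, []):
            candidate = max(d, weight)  # semiring "times" is max
            if candidate < dist.get(nxt, math.inf):  # semiring "plus" is min
                dist[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return dist


# Hypothetical four-state example.
example = {0: [(1, 5), (2, 2)], 1: [(3, 1)], 2: [(1, 3), (3, 7)], 3: []}
distances = minmax_distances(example, 0)
```

In this example the direct transition from state 0 to state 1 has weight 5, but the path through state 2 has maximum weight 3, so the distance to state 1 is 3. A greedy priority-queue traversal is valid here because extending a path with max can never lower its weight.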

Original language: English (US)
Pages (from-to): 3-18
Number of pages: 16
Journal: Theoretical Computer Science
Volume: 328
Issue number: 1-2
State: Published - Nov 29 2004

Keywords

  • Determinization
  • Finite automata
  • Finite-state transducers
  • Twins property
  • Weighted finite-state transducers

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
