Speech segmentation and spoken document processing

Mari Ostendorf, Benoit Favre, Ralph Grishman, Dilek Hakkani-Tür, Mary Harper, Dustin Hillard, Julia Hirschberg, Heng Ji, Jeremy G. Kahn, Yang Liu, Sameer Maskey, Evgeny Matusov, Hermann Ney, Andrew Rosenberg, Elizabeth Shriberg, Wen Wang, Chuck Wooters

Research output: Contribution to journal (Article)

Abstract

Speech segmentation operates at many levels, and segmentation at each of them is useful for improving automatic speech recognition (ASR) technology. Sentence segmentation has also progressed by combining lexical information from a word recognizer with spectral and prosodic cues. In addition, sentence segmentation is relevant for speech understanding applications, especially parsing and information extraction (IE), as well as machine translation, summarization, and question answering at the application level. Segmentation algorithms draw on audio diarization and structural segmentation. The goal of audio diarization is to segment an audio recording into acoustically homogeneous regions, given only features extracted from the audio signal. A related task is speaker diarization, which involves computing a generalized log-likelihood ratio at candidate boundaries. Structural segmentation covers boundary event detection and whole-constituent modeling; both are applicable to speech recognition because they exploit the alignment between words.
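The abstract mentions computing a generalized log-likelihood ratio at candidate boundaries but does not spell out the computation. The following is a minimal sketch under stated assumptions, not the article's method: it models a one-dimensional feature stream with a single Gaussian per segment, whereas practical systems typically use multivariate acoustic features. The function names `log_glr` and `best_boundary` and the variance floor are hypothetical, introduced only for illustration.

```python
import math


def _ml_variance(xs, floor=1e-8):
    """Maximum-likelihood variance of a 1-D sample, floored to avoid log(0)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return max(var, floor)


def log_glr(frames, t):
    """Log generalized likelihood ratio for a boundary placed before index t.

    Compares "two Gaussians, one per side" against "one Gaussian for the
    whole window". With ML estimates the mean terms cancel, leaving
    0.5 * (N log var - N1 log var1 - N2 log var2). Larger values give
    stronger evidence of a change at t.
    (Assumption: 1-D Gaussian segment models, chosen for brevity.)
    """
    if not 2 <= t <= len(frames) - 2:
        raise ValueError("each side of the boundary needs at least 2 frames")
    left, right = frames[:t], frames[t:]
    n, n1, n2 = len(frames), len(left), len(right)
    return 0.5 * (
        n * math.log(_ml_variance(frames))
        - n1 * math.log(_ml_variance(left))
        - n2 * math.log(_ml_variance(right))
    )


def best_boundary(frames):
    """Return the candidate index with the highest log GLR."""
    candidates = range(2, len(frames) - 1)
    return max(candidates, key=lambda t: log_glr(frames, t))


if __name__ == "__main__":
    # Toy stream: five frames near 0 followed by five frames near 5.
    stream = [0.0, 0.1, -0.1, 0.05, -0.05, 5.0, 5.1, 4.9, 5.05, 4.95]
    print(best_boundary(stream))
```

In a fuller system the score at each candidate would usually be compared against a threshold or a model-complexity penalty before a boundary is accepted; that decision rule is left out of this sketch.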

Original language: English (US)
Pages (from-to): 59-69
Number of pages: 11
Journal: IEEE Signal Processing Magazine
Volume: 25
Issue number: 3
DOI: https://doi.org/10.1109/MSP.2008.918023
State: Published - 2008

Fingerprint

Speech recognition
Segmentation
Audio recordings
Processing
Log-likelihood Ratio
Machine Translation
Automatic Speech Recognition
Question Answering
Information Extraction
Summarization
Parsing
Speech
Alignment
Computing
Modeling

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Applied Mathematics

Cite this

Ostendorf, M., Favre, B., Grishman, R., Hakkani-Tür, D., Harper, M., Hillard, D., ... Wooters, C. (2008). Speech segmentation and spoken document processing. IEEE Signal Processing Magazine, 25(3), 59-69. https://doi.org/10.1109/MSP.2008.918023

@article{71daf77ce22c41fc84a2c2ce5b837d07,
title = "Speech segmentation and spoken document processing",
author = "Mari Ostendorf and Benoit Favre and Ralph Grishman and Dilek Hakkani-T{\"u}r and Mary Harper and Dustin Hillard and Julia Hirschberg and Heng Ji and Kahn, {Jeremy G.} and Yang Liu and Sameer Maskey and Evgeny Matusov and Hermann Ney and Andrew Rosenberg and Elizabeth Shriberg and Wen Wang and Chuck Wooters",
year = "2008",
doi = "10.1109/MSP.2008.918023",
language = "English (US)",
volume = "25",
pages = "59--69",
journal = "IEEE Signal Processing Magazine",
issn = "1053-5888",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "3",

}

TY - JOUR
T1 - Speech segmentation and spoken document processing
AU - Ostendorf, Mari
AU - Favre, Benoit
AU - Grishman, Ralph
AU - Hakkani-Tür, Dilek
AU - Harper, Mary
AU - Hillard, Dustin
AU - Hirschberg, Julia
AU - Ji, Heng
AU - Kahn, Jeremy G.
AU - Liu, Yang
AU - Maskey, Sameer
AU - Matusov, Evgeny
AU - Ney, Hermann
AU - Rosenberg, Andrew
AU - Shriberg, Elizabeth
AU - Wang, Wen
AU - Wooters, Chuck
PY - 2008
Y1 - 2008
UR - http://www.scopus.com/inward/record.url?scp=85032751513&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032751513&partnerID=8YFLogxK
U2 - 10.1109/MSP.2008.918023
DO - 10.1109/MSP.2008.918023
M3 - Article
AN - SCOPUS:85032751513
VL - 25
SP - 59
EP - 69
JO - IEEE Signal Processing Magazine
JF - IEEE Signal Processing Magazine
SN - 1053-5888
IS - 3
ER -