Speech fine structure contains critical temporal cues to support speech segmentation

Xiangbin Teng, Gregory B. Cogan, David Poeppel

Research output: Contribution to journal › Article

Abstract

Segmenting the continuous speech stream into units for further perceptual and linguistic analyses is fundamental to speech recognition. The speech amplitude envelope (SE) has long been considered a fundamental temporal cue for segmenting speech. Does the temporal fine structure (TFS), a significant part of speech signals often considered to contain primarily spectral information, contribute to speech segmentation? Using magnetoencephalography, we show that the TFS entrains cortical responses between 3 and 6 Hz and demonstrate, using mutual information analysis, that (i) the temporal information in the TFS can be reconstructed from a measure of frame-to-frame spectral change and correlates with the SE and (ii) that spectral resolution is key to the extraction of such temporal information. Furthermore, we show behavioural evidence that, when the SE is temporally distorted, the TFS provides cues for speech segmentation and aids speech recognition significantly. Our findings show that it is insufficient to investigate solely the SE to understand temporal speech segmentation, as the SE and the TFS derived from a band-filtering method convey comparable, if not inseparable, temporal information. We argue for a more synthetic view of speech segmentation – the auditory system groups speech signals coherently in both temporal and spectral domains.

Original language: English (US)
Article number: 116152
Journal: NeuroImage
Volume: 202
DOI: 10.1016/j.neuroimage.2019.116152
State: Published - Nov 15 2019

Keywords

  • Cortical entrainment
  • Spectral correlation
  • Spectro-temporal
  • Speech segmentation

ASJC Scopus subject areas

  • Neurology
  • Cognitive Neuroscience

Cite this

Speech fine structure contains critical temporal cues to support speech segmentation. / Teng, Xiangbin; Cogan, Gregory B.; Poeppel, David.

In: NeuroImage, Vol. 202, 116152, 15.11.2019.

@article{ce022176845c4e6c9a49da410262309c,
title = "Speech fine structure contains critical temporal cues to support speech segmentation",
abstract = "Segmenting the continuous speech stream into units for further perceptual and linguistic analyses is fundamental to speech recognition. The speech amplitude envelope (SE) has long been considered a fundamental temporal cue for segmenting speech. Does the temporal fine structure (TFS), a significant part of speech signals often considered to contain primarily spectral information, contribute to speech segmentation? Using magnetoencephalography, we show that the TFS entrains cortical responses between 3 and 6 Hz and demonstrate, using mutual information analysis, that (i) the temporal information in the TFS can be reconstructed from a measure of frame-to-frame spectral change and correlates with the SE and (ii) that spectral resolution is key to the extraction of such temporal information. Furthermore, we show behavioural evidence that, when the SE is temporally distorted, the TFS provides cues for speech segmentation and aids speech recognition significantly. Our findings show that it is insufficient to investigate solely the SE to understand temporal speech segmentation, as the SE and the TFS derived from a band-filtering method convey comparable, if not inseparable, temporal information. We argue for a more synthetic view of speech segmentation – the auditory system groups speech signals coherently in both temporal and spectral domains.",
keywords = "Cortical entrainment, Spectral correlation, Spectro-temporal, Speech segmentation",
author = "Teng, Xiangbin and Cogan, {Gregory B.} and Poeppel, David",
year = "2019",
month = "11",
day = "15",
doi = "10.1016/j.neuroimage.2019.116152",
language = "English (US)",
volume = "202",
journal = "NeuroImage",
issn = "1053-8119",
publisher = "Academic Press Inc.",
}

TY - JOUR
T1 - Speech fine structure contains critical temporal cues to support speech segmentation
AU - Teng, Xiangbin
AU - Cogan, Gregory B.
AU - Poeppel, David
PY - 2019/11/15
Y1 - 2019/11/15

AB - Segmenting the continuous speech stream into units for further perceptual and linguistic analyses is fundamental to speech recognition. The speech amplitude envelope (SE) has long been considered a fundamental temporal cue for segmenting speech. Does the temporal fine structure (TFS), a significant part of speech signals often considered to contain primarily spectral information, contribute to speech segmentation? Using magnetoencephalography, we show that the TFS entrains cortical responses between 3 and 6 Hz and demonstrate, using mutual information analysis, that (i) the temporal information in the TFS can be reconstructed from a measure of frame-to-frame spectral change and correlates with the SE and (ii) that spectral resolution is key to the extraction of such temporal information. Furthermore, we show behavioural evidence that, when the SE is temporally distorted, the TFS provides cues for speech segmentation and aids speech recognition significantly. Our findings show that it is insufficient to investigate solely the SE to understand temporal speech segmentation, as the SE and the TFS derived from a band-filtering method convey comparable, if not inseparable, temporal information. We argue for a more synthetic view of speech segmentation – the auditory system groups speech signals coherently in both temporal and spectral domains.

KW - Cortical entrainment
KW - Spectral correlation
KW - Spectro-temporal
KW - Speech segmentation
UR - http://www.scopus.com/inward/record.url?scp=85072030573&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072030573&partnerID=8YFLogxK
U2 - 10.1016/j.neuroimage.2019.116152
DO - 10.1016/j.neuroimage.2019.116152
M3 - Article
C2 - 31484039
AN - SCOPUS:85072030573
VL - 202
JO - NeuroImage
JF - NeuroImage
SN - 1053-8119
M1 - 116152
ER -