Multi-time resolution analysis of speech

Evidence from psychophysics

Maria Chait, Steven Greenberg, Takayuki Arai, Jonathan Z. Simon, David Poeppel

Research output: Contribution to journal › Article

Abstract

How speech signals are analyzed and represented remains a foundational challenge both for cognitive science and neuroscience. A growing body of research, employing various behavioral and neurobiological experimental techniques, now points to the perceptual relevance of both phoneme-sized (10-40 Hz modulation frequency) and syllable-sized (2-10 Hz modulation frequency) units in speech processing. However, it is not clear how information associated with such different time scales interacts in a manner relevant for speech perception. We report behavioral experiments on speech intelligibility employing a stimulus that allows us to investigate how distinct temporal modulations in speech are treated separately and whether they are combined. We created sentences in which the slow (~4 Hz; S_low) and rapid (~33 Hz; S_high) modulations - corresponding to ~250 ms and ~30 ms, the average duration of syllables and certain phonetic properties, respectively - were selectively extracted. Although S_low and S_high have low intelligibility when presented separately, dichotic presentation of S_high with S_low results in supra-additive performance, suggesting a synergistic relationship between low- and high-modulation frequencies. A second experiment desynchronized presentation of the S_low + S_high signals. Desynchronizing signals relative to one another had no impact on intelligibility when delays were less than ~45 ms. Longer delays resulted in a steep intelligibility decline, providing further evidence of integration or binding of information within restricted temporal windows. Our data suggest that human speech perception uses multi-time resolution processing. Signals are concurrently analyzed on at least two separate time scales, the intermediate representations of these analyses are integrated, and the resulting bound percept has significant consequences for speech intelligibility - a view compatible with recent insights from neuroscience implicating multi-timescale auditory processing.
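
To make the stimulus manipulation concrete, the sketch below shows one plausible way to isolate syllable-scale and phoneme-scale envelope modulations and to delay one stream relative to the other, as in the desynchronization experiment. This is a minimal Python illustration under stated assumptions (Hilbert envelope, Butterworth band-pass, a 16 kHz sampling rate, and the 2-10 Hz and 10-40 Hz band edges given in the abstract), not the authors' actual stimulus-generation pipeline; all function and variable names are hypothetical.

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def modulation_band(x, fs, lo_hz, hi_hz, order=4):
    """Band-pass the amplitude envelope of x to the [lo_hz, hi_hz] Hz modulation range."""
    env = np.abs(hilbert(x))                           # Hilbert amplitude envelope
    sos = butter(order, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, env)                       # zero-phase band-pass filter

fs = 16_000                                            # assumed sampling rate (Hz)
speech = np.random.randn(3 * fs)                       # stand-in for a recorded sentence

# A ~4 Hz modulation has a ~250 ms period (syllable scale);
# ~33 Hz corresponds to ~30 ms (certain phonetic properties).
s_low = modulation_band(speech, fs, 2.0, 10.0)         # syllable-scale band (~4 Hz)
s_high = modulation_band(speech, fs, 10.0, 40.0)       # phoneme-scale band (~33 Hz)

# Experiment 2 analog: delay S_high relative to S_low. Intelligibility was
# reportedly unaffected below ~45 ms and declined steeply for longer delays.
delay_smp = int(round(0.045 * fs))                     # 45 ms in samples
s_high_delayed = np.concatenate([np.zeros(delay_smp), s_high])[: s_high.size]

# Dichotic presentation: one stream per ear (left = S_low, right = delayed S_high).
dichotic = np.stack([s_low, s_high_delayed], axis=0)

Note that this fragment only extracts and time-shifts the modulation signals themselves; producing audible stimuli would additionally require imposing the filtered modulations on appropriate carriers.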

Original language: English (US)
Article number: 214
Journal: Frontiers in Neuroscience
Volume: 9
Issue number: MAY
DOIs: 10.3389/fnins.2015.00214
State: Published - 2015

Keywords

  • Auditory processing
  • Modulation spectrum
  • Phoneme
  • Speech perception
  • Speech segmentation
  • Syllable
  • Temporal processing

ASJC Scopus subject areas

  • Neuroscience (all)

Cite this

Multi-time resolution analysis of speech: Evidence from psychophysics. / Chait, Maria; Greenberg, Steven; Arai, Takayuki; Simon, Jonathan Z.; Poeppel, David.

In: Frontiers in Neuroscience, Vol. 9, No. MAY, 214, 2015.

Research output: Contribution to journal › Article

Chait, Maria; Greenberg, Steven; Arai, Takayuki; Simon, Jonathan Z.; Poeppel, David. / Multi-time resolution analysis of speech: Evidence from psychophysics. In: Frontiers in Neuroscience. 2015; Vol. 9, No. MAY.
@article{0a3649ba9c824ad6bb2f60b17e4d453f,
title = "Multi-time resolution analysis of speech: Evidence from psychophysics",
abstract = "How speech signals are analyzed and represented remains a foundational challenge both for cognitive science and neuroscience. A growing body of research, employing various behavioral and neurobiological experimental techniques, now points to the perceptual relevance of both phoneme-sized (10-40 Hz modulation frequency) and syllable-sized (2-10 Hz modulation frequency) units in speech processing. However, it is not clear how information associated with such different time scales interacts in a manner relevant for speech perception. We report behavioral experiments on speech intelligibility employing a stimulus that allows us to investigate how distinct temporal modulations in speech are treated separately and whether they are combined. We created sentences in which the slow (~4 Hz; S_low) and rapid (~33 Hz; S_high) modulations - corresponding to ~250 ms and ~30 ms, the average duration of syllables and certain phonetic properties, respectively - were selectively extracted. Although S_low and S_high have low intelligibility when presented separately, dichotic presentation of S_high with S_low results in supra-additive performance, suggesting a synergistic relationship between low- and high-modulation frequencies. A second experiment desynchronized presentation of the S_low + S_high signals. Desynchronizing signals relative to one another had no impact on intelligibility when delays were less than ~45 ms. Longer delays resulted in a steep intelligibility decline, providing further evidence of integration or binding of information within restricted temporal windows. Our data suggest that human speech perception uses multi-time resolution processing. Signals are concurrently analyzed on at least two separate time scales, the intermediate representations of these analyses are integrated, and the resulting bound percept has significant consequences for speech intelligibility - a view compatible with recent insights from neuroscience implicating multi-timescale auditory processing.",
keywords = "Auditory processing, Modulation spectrum, Phoneme, Speech perception, Speech segmentation, Syllable, Temporal processing",
author = "Chait, Maria and Greenberg, Steven and Arai, Takayuki and Simon, {Jonathan Z.} and Poeppel, David",
year = "2015",
doi = "10.3389/fnins.2015.00214",
language = "English (US)",
volume = "9",
journal = "Frontiers in Neuroscience",
issn = "1662-4548",
publisher = "Frontiers Research Foundation",
number = "MAY",
}

TY - JOUR

T1 - Multi-time resolution analysis of speech

T2 - Evidence from psychophysics

AU - Chait, Maria

AU - Greenberg, Steven

AU - Arai, Takayuki

AU - Simon, Jonathan Z.

AU - Poeppel, David

PY - 2015

Y1 - 2015

N2 - How speech signals are analyzed and represented remains a foundational challenge both for cognitive science and neuroscience. A growing body of research, employing various behavioral and neurobiological experimental techniques, now points to the perceptual relevance of both phoneme-sized (10-40 Hz modulation frequency) and syllable-sized (2-10 Hz modulation frequency) units in speech processing. However, it is not clear how information associated with such different time scales interacts in a manner relevant for speech perception. We report behavioral experiments on speech intelligibility employing a stimulus that allows us to investigate how distinct temporal modulations in speech are treated separately and whether they are combined. We created sentences in which the slow (~4 Hz; S_low) and rapid (~33 Hz; S_high) modulations - corresponding to ~250 ms and ~30 ms, the average duration of syllables and certain phonetic properties, respectively - were selectively extracted. Although S_low and S_high have low intelligibility when presented separately, dichotic presentation of S_high with S_low results in supra-additive performance, suggesting a synergistic relationship between low- and high-modulation frequencies. A second experiment desynchronized presentation of the S_low + S_high signals. Desynchronizing signals relative to one another had no impact on intelligibility when delays were less than ~45 ms. Longer delays resulted in a steep intelligibility decline, providing further evidence of integration or binding of information within restricted temporal windows. Our data suggest that human speech perception uses multi-time resolution processing. Signals are concurrently analyzed on at least two separate time scales, the intermediate representations of these analyses are integrated, and the resulting bound percept has significant consequences for speech intelligibility - a view compatible with recent insights from neuroscience implicating multi-timescale auditory processing.

KW - Auditory processing

KW - Modulation spectrum

KW - Phoneme

KW - Speech perception

KW - Speech segmentation

KW - Syllable

KW - Temporal processing

UR - http://www.scopus.com/inward/record.url?scp=84930677314&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84930677314&partnerID=8YFLogxK

U2 - 10.3389/fnins.2015.00214

DO - 10.3389/fnins.2015.00214

M3 - Article

VL - 9

JO - Frontiers in Neuroscience

JF - Frontiers in Neuroscience

SN - 1662-4548

IS - MAY

M1 - 214

ER -