Causal inference of asynchronous audiovisual speech

John F. Magnotti, Wei Ji Ma, Michael S. Beauchamp

Research output: Contribution to journal › Article

Abstract

During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post-hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
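The abstract describes a Bayesian causal-inference model in which an observer uses the measured audiovisual asynchrony and the reliability of the cues to judge whether voice and face share a common cause. A minimal sketch of that style of computation is below; the Gaussian parameterization and all parameter values are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    # Gaussian density, used for the observer's measurement likelihoods.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def p_common_cause(x, sigma_meas=0.1, sigma_sep=0.3, prior_common=0.5):
    """Posterior probability that voice and face share a common cause,
    given a measured audiovisual asynchrony x (in seconds).

    Assumption: under a common cause the true asynchrony is ~0, so x
    reflects only measurement noise; under separate causes the true
    asynchrony is itself broadly distributed, widening the predicted
    spread of x. All sigmas and the prior are illustrative values.
    """
    like_common = gauss_pdf(x, 0.0, sigma_meas)
    like_separate = gauss_pdf(x, 0.0, np.sqrt(sigma_meas**2 + sigma_sep**2))
    num = like_common * prior_common
    return num / (num + like_separate * (1.0 - prior_common))

# The modeled observer reports "synchronous" when the posterior exceeds 0.5;
# the posterior falls as the measured asynchrony grows.
for dt in (0.0, 0.15, 0.45):
    print(dt, round(float(p_common_cause(dt)), 3))
```

Because the posterior depends on the measurement noise `sigma_meas`, degrading visual cue reliability (a larger `sigma_meas`) flattens the synchrony-judgment curve, which is the kind of stimulus-level interpretation of parameters the abstract highlights.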

Original language: English (US)
Article number: 798
Journal: Frontiers in Psychology
Volume: 4
Issue number: NOV
DOI: 10.3389/fpsyg.2013.00798
State: Published - 2013


Keywords

  • Bayesian observer
  • Causal inference
  • Multisensory integration
  • Speech perception
  • Synchrony judgments

ASJC Scopus subject areas

  • Psychology (all)

Cite this

Causal inference of asynchronous audiovisual speech. / Magnotti, John F.; Ma, Wei Ji; Beauchamp, Michael S.

In: Frontiers in Psychology, Vol. 4, No. NOV, Article 798, 2013.


Magnotti, John F.; Ma, Wei Ji; Beauchamp, Michael S. / Causal inference of asynchronous audiovisual speech. In: Frontiers in Psychology. 2013; Vol. 4, No. NOV.
@article{b0658c6f4c8e4113989138fcc491f24b,
title = "Causal inference of asynchronous audiovisual speech",
abstract = "During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post-hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.",
keywords = "Bayesian observer, Causal inference, Multisensory integration, Speech perception, Synchrony judgments",
author = "Magnotti, {John F.} and Ma, {Wei Ji} and Beauchamp, {Michael S.}",
year = "2013",
doi = "10.3389/fpsyg.2013.00798",
language = "English (US)",
volume = "4",
journal = "Frontiers in Psychology",
issn = "1664-1078",
publisher = "Frontiers Media S.A.",
number = "NOV",

}

TY - JOUR

T1 - Causal inference of asynchronous audiovisual speech

AU - Magnotti, John F.

AU - Ma, Wei Ji

AU - Beauchamp, Michael S.

PY - 2013

Y1 - 2013

AB - During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post-hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.

KW - Bayesian observer

KW - Causal inference

KW - Multisensory integration

KW - Speech perception

KW - Synchrony judgments

UR - http://www.scopus.com/inward/record.url?scp=84889650789&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84889650789&partnerID=8YFLogxK

U2 - 10.3389/fpsyg.2013.00798

DO - 10.3389/fpsyg.2013.00798

M3 - Article

AN - SCOPUS:84889650789

VL - 4

JO - Frontiers in Psychology

JF - Frontiers in Psychology

SN - 1664-1078

IS - NOV

M1 - Article 798

ER -