A comparison of classifiers for detecting emotion from speech

Izhak Shafran, Mehryar Mohri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Accurate detection of emotion from speech has clear benefits for the design of more natural human-machine speech interfaces and for the extraction of useful information from large quantities of speech data. The task consists of assigning an emotion category from a fixed set, e.g., anger, fear, or satisfaction, to a speech utterance. In recent work, several classifiers have been proposed for automatic detection of a speaker's emotion using the spoken words as input. These classifiers were designed independently and tested on separate corpora, making it difficult to compare their performance. This paper presents three classifiers: two popular classifiers from the literature that model word content via n-gram sequences, one based on an interpolated language model and the other on a mutual information-based feature-selection approach, and a discriminant kernel-based technique that we recently adopted. We have implemented all three classification algorithms and evaluated their performance on a corpus collected from a spoken-dialog system that was widely deployed across the US. The results show that our kernel-based classifier achieves an accuracy of 80.6%, outperforming both the interpolated language model classifier (70.1%) and the classifier using mutual information-based feature selection (78.8%).
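
To make the first approach concrete, here is a minimal sketch of an interpolated language-model classifier of the general kind the abstract names: one n-gram model per emotion category, with bigram and unigram estimates linearly interpolated, and an utterance assigned to the category whose model gives it the highest log-likelihood. This is not the authors' implementation; the class name, the interpolation weight, the add-one smoothing, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter

class InterpolatedBigramLM:
    """Bigram language model with linear bigram/unigram interpolation."""

    def __init__(self, lam=0.7):
        self.lam = lam              # weight on the bigram estimate (assumed value)
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.total = 0

    def train(self, utterances):
        for words in utterances:
            padded = ["<s>"] + words
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
            self.total += len(padded)

    def log_prob(self, words):
        padded = ["<s>"] + words
        lp = 0.0
        for prev, cur in zip(padded, padded[1:]):
            # Add-one smoothing on the unigram term keeps the mixture
            # nonzero even for words unseen in training.
            p_uni = (self.unigrams[cur] + 1) / (self.total + len(self.unigrams) + 1)
            p_bi = (self.bigrams[(prev, cur)] / self.unigrams[prev]
                    if self.unigrams[prev] else 0.0)
            lp += math.log(self.lam * p_bi + (1 - self.lam) * p_uni)
        return lp

def classify(models, words):
    # Assign the emotion whose language model scores the utterance highest.
    return max(models, key=lambda label: models[label].log_prob(words))

# Toy usage with invented utterances; the paper's corpus came from a
# widely deployed spoken-dialog system.
train = {
    "anger": [["this", "is", "ridiculous"],
              ["let", "me", "talk", "to", "a", "person"]],
    "satisfaction": [["thank", "you", "that", "helped"],
                     ["great", "thanks"]],
}
models = {label: InterpolatedBigramLM() for label in train}
for label, utterances in train.items():
    models[label].train(utterances)

print(classify(models, ["thank", "you"]))  # -> "satisfaction"
```

In the paper's setting the input words would come from the dialog system's speech recognizer; a fixed interpolation weight and add-one smoothing stand in here for whatever smoothing the original models used.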

Original language: English (US)
Title of host publication: 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing / Multimedia Signal Processing
Publisher: Institute of Electrical and Electronics Engineers Inc.
Volume: I
ISBN (Print): 0780388747, 9780780388741
DOI: 10.1109/ICASSP.2005.1415120
State: Published - 2005
Event: 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Philadelphia, PA, United States
Duration: Mar 18, 2005 - Mar 23, 2005

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Acoustics and Ultrasonics

Cite this

Shafran, I., & Mohri, M. (2005). A comparison of classifiers for detecting emotion from speech. In 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing / Multimedia Signal Processing (Vol. I) [1415120]. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2005.1415120
