Voice signatures

Izhak Shafran, Michael Riley, Mehryar Mohri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Most current spoken-dialog systems extract only sequences of words from a speaker's voice. This largely ignores other useful information that can be inferred from speech, such as gender, age, dialect, or emotion. These characteristics of a speaker's voice, or voice signatures, whether static or dynamic, can be useful for speech mining applications or for the design of a natural spoken-dialog system. This paper explores the problem of automatically and accurately extracting voice signatures from a speaker's voice. We investigate two approaches for extracting speaker traits: the first focuses on general acoustic and prosodic features, the second on the choice of words used by the speaker. In the first approach, we show that standard speech/non-speech HMMs, conditioned on speaker traits and evaluated on cepstral and pitch features, achieve accuracies well above chance for all examined traits. The second approach, using support vector machines with rational kernels applied to speech recognition lattices, attains an accuracy of about 81% in the task of binary classification of emotion. Our results are based on a corpus of speech data collected from a deployed customer-care application (HMIHY 0300). While still preliminary, our results are significant and show that voice signatures are of practical interest in real-world applications.
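To make the abstract's first approach concrete, the sketch below is not the authors' implementation; it is a minimal illustration of trait classification by per-trait HMM likelihoods, assuming the hmmlearn library, frame-level cepstral-plus-pitch feature vectors, and illustrative names (train_trait_models, classify_trait) and state counts that do not come from the paper.

# Hypothetical sketch: one HMM per trait value (e.g. 'male'/'female'), trained on
# cepstral + pitch feature frames, with classification by maximum log-likelihood.
# hmmlearn, the function names, and the number of states are assumptions.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_trait_models(utterances_by_trait, n_states=5):
    """utterances_by_trait: {trait_value: [array of shape (num_frames, feat_dim), ...]}"""
    models = {}
    for trait_value, utterances in utterances_by_trait.items():
        X = np.vstack(utterances)                 # all frames stacked together
        lengths = [len(u) for u in utterances]    # per-utterance frame counts
        hmm = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
        hmm.fit(X, lengths)
        models[trait_value] = hmm
    return models

def classify_trait(models, frames):
    """Return the trait value whose HMM gives the utterance the highest log-likelihood."""
    return max(models, key=lambda t: models[t].score(frames))

The second approach in the paper applies SVMs with rational kernels to full speech recognition lattices. A much-simplified stand-in, assuming only 1-best transcripts and a plain n-gram count representation with scikit-learn rather than lattice-based rational kernels, might look like this:

# Hypothetical simplification of the second approach: n-gram counts from 1-best
# transcripts fed to a linear SVM; the paper instead uses rational kernels over lattices.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def train_emotion_classifier(transcripts, labels):
    """transcripts: recognized word strings; labels: e.g. 'negative' vs 'other' emotion."""
    clf = make_pipeline(CountVectorizer(ngram_range=(1, 3)), LinearSVC())
    clf.fit(transcripts, labels)
    return clf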

Original language: English (US)
Title of host publication: 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 31-36
Number of pages: 6
ISBN (Print): 0780379802, 9780780379800
DOIs: 10.1109/ASRU.2003.1318399
State: Published - 2003
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003 - St. Thomas, United States
Duration: Nov 30, 2003 - Dec 4, 2003

Other

Other: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
Country: United States
City: St. Thomas
Period: 11/30/03 - 12/4/03

Fingerprint

  • Speech recognition
  • Support vector machines
  • Acoustics

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Shafran, I., Riley, M., & Mohri, M. (2003). Voice signatures. In 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003 (pp. 31-36). [1318399] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU.2003.1318399
