Audio Feature Extraction and Analysis for Scene Segmentation and Classification

Zhu Liu, Yao Wang, Tsuhan Chen

Research output: Contribution to journal, Article

Abstract

Understanding of the scene content of a video sequence is very important for content-based indexing and retrieval of multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the nonspeech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features are proposed for characterizing semantic contents of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we also can identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.
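The two analysis steps named in the abstract (comparing intracluster and intercluster scatter, and flagging scene breaks from changes between adjacent clip feature vectors) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the use of scatter-matrix traces as the separability measure, the Euclidean distance, and the fixed threshold are all hypothetical choices, and the toy two-dimensional vectors stand in for the paper's actual audio features, which the abstract does not list.

```python
from math import sqrt


def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_traces(clips_by_class):
    """Return (intracluster trace, intercluster trace) for labelled clips.

    Assumption: separability is summarised by the traces of the scatter
    matrices, each class weighted by its share of the clips.
    """
    all_rows = [r for rows in clips_by_class.values() for r in rows]
    total = len(all_rows)
    global_mean = mean_vector(all_rows)
    within = 0.0
    between = 0.0
    for rows in clips_by_class.values():
        mu = mean_vector(rows)
        prior = len(rows) / total
        # Trace of the class covariance: mean squared distance to the class mean.
        spread = sum(sum((x - m) ** 2 for x, m in zip(r, mu)) for r in rows)
        within += prior * spread / len(rows)
        # Squared distance of the class mean from the global mean.
        between += prior * sum((m - g) ** 2 for m, g in zip(mu, global_mean))
    return within, between


def scene_breaks(features, threshold):
    """Indices i where clip i differs from clip i-1 by more than threshold.

    Assumption: Euclidean distance and a fixed, hand-picked threshold.
    """
    breaks = []
    for i in range(1, len(features)):
        dist = sqrt(sum((a - b) ** 2
                        for a, b in zip(features[i], features[i - 1])))
        if dist > threshold:
            breaks.append(i)
    return breaks


if __name__ == "__main__":
    toy = {"news": [[0.0, 0.0], [0.0, 2.0]],
           "sports": [[4.0, 0.0], [4.0, 2.0]]}
    within, between = scatter_traces(toy)
    print("intercluster / intracluster:", between / within)
    print("breaks:", scene_breaks([[0, 0], [0.1, 0], [5, 5], [5, 5.1]], 1.0))
```

A larger intercluster-to-intracluster ratio indicates that the classes are better separated under the chosen features; evaluating that ratio for different feature subsets is one simple way to rank which features are effective.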

Original language: English (US)
Pages (from-to): 61-79
Number of pages: 19
Journal: Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology
Volume: 20
Issue number: 1-2
State: Published - 1998

Fingerprint

Feature Extraction
Segmentation
Semantics
Feature Space
Speech recognition
Image analysis
Classifiers
Scattering
Game
Scene Analysis
Multimedia Databases
Neural networks
Video Analysis
Clustering Analysis
Neural Nets
Scattering Matrix
Separability
Feature Vector

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Information Systems
  • Signal Processing

Cite this

Audio Feature Extraction and Analysis for Scene Segmentation and Classification. / Liu, Zhu; Wang, Yao; Chen, Tsuhan.

In: Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, Vol. 20, No. 1-2, 1998, p. 61-79.


@article{40388900a35846b092d867ab75a7da03,
title = "Audio Feature Extraction and Analysis for Scene Segmentation and Classification",
abstract = "Understanding of the scene content of a video sequence is very important for content-based indexing and retrieval of multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the nonspeech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features are proposed for characterizing semantic contents of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we also can identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.",
author = "Zhu Liu and Yao Wang and Tsuhan Chen",
year = "1998",
language = "English (US)",
volume = "20",
pages = "61--79",
journal = "Journal of Signal Processing Systems",
issn = "1939-8018",
publisher = "Springer New York",
number = "1-2"
}

TY  - JOUR
T1  - Audio Feature Extraction and Analysis for Scene Segmentation and Classification
AU  - Liu, Zhu
AU  - Wang, Yao
AU  - Chen, Tsuhan
PY  - 1998
Y1  - 1998
N2  - Understanding of the scene content of a video sequence is very important for content-based indexing and retrieval of multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the nonspeech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features are proposed for characterizing semantic contents of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we also can identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.
AB  - Understanding of the scene content of a video sequence is very important for content-based indexing and retrieval of multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the nonspeech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features are proposed for characterizing semantic contents of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we also can identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.
UR  - http://www.scopus.com/inward/record.url?scp=0032181880&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0032181880&partnerID=8YFLogxK
M3  - Article
VL  - 20
SP  - 61
EP  - 79
JO  - Journal of Signal Processing Systems
JF  - Journal of Signal Processing Systems
SN  - 1939-8018
IS  - 1-2
ER  - 