From handcrafted to learned representations for human action recognition: A survey

Fan Zhu, Ling Shao, Jin Xie, Yi Fang

Research output: Contribution to journal › Article

Abstract

Human action recognition is an important branch of the study of both human perception and computer vision systems. Along with the development of artificial intelligence, deep learning techniques have gained a remarkable reputation in image categorization tasks (e.g., object detection and classification). However, since human actions are normally presented as sequences of image frames, analyzing human action data with deep learning techniques requires significantly more computational power than analyzing still images. This challenge has been the bottleneck for migrating learning-based image representation techniques to action sequences, so old-fashioned handcrafted human action representations are still widely used for human action recognition tasks. On the other hand, since handcrafted representations are usually ad hoc and overfit to specific data, they cannot be generalized to deal with various realistic scenarios. Consequently, resorting to deep learning action representations for human action recognition tasks is ultimately a natural option. In this work, we provide a detailed overview of recent advancements in human action representations. As the first survey that covers both handcrafted and learning-based action representations, we explicitly discuss the strengths and limitations of existing techniques of both kinds. The ultimate goal of this survey is to provide a comprehensive analysis and comparison of learning-based and handcrafted action representations, so as to inspire action recognition researchers to study both kinds of representation techniques.

Original language: English (US)
Pages (from-to): 42-52
Number of pages: 11
Journal: Image and Vision Computing
Volume: 55
DOI: 10.1016/j.imavis.2016.06.007
State: Published - Nov 1 2016

Fingerprint

Computer vision
Artificial intelligence
Deep learning
Object detection

Keywords

  • Convolutional neural network
  • Deep learning
  • Dictionary learning
  • Handcrafted features
  • Human action recognition

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition

Cite this

From handcrafted to learned representations for human action recognition: A survey. / Zhu, Fan; Shao, Ling; Xie, Jin; Fang, Yi.

In: Image and Vision Computing, Vol. 55, 01.11.2016, p. 42-52.

@article{de44567e83c444659eb5a1f7a800d69b,
title = "From handcrafted to learned representations for human action recognition: A survey",
abstract = "Human action recognition is an important branch of the study of both human perception and computer vision systems. Along with the development of artificial intelligence, deep learning techniques have gained a remarkable reputation in image categorization tasks (e.g., object detection and classification). However, since human actions are normally presented as sequences of image frames, analyzing human action data with deep learning techniques requires significantly more computational power than analyzing still images. This challenge has been the bottleneck for migrating learning-based image representation techniques to action sequences, so old-fashioned handcrafted human action representations are still widely used for human action recognition tasks. On the other hand, since handcrafted representations are usually ad hoc and overfit to specific data, they cannot be generalized to deal with various realistic scenarios. Consequently, resorting to deep learning action representations for human action recognition tasks is ultimately a natural option. In this work, we provide a detailed overview of recent advancements in human action representations. As the first survey that covers both handcrafted and learning-based action representations, we explicitly discuss the strengths and limitations of existing techniques of both kinds. The ultimate goal of this survey is to provide a comprehensive analysis and comparison of learning-based and handcrafted action representations, so as to inspire action recognition researchers to study both kinds of representation techniques.",
keywords = "Convolutional neural network, Deep learning, Dictionary learning, Handcrafted features, Human action recognition",
author = "Fan Zhu and Ling Shao and Jin Xie and Yi Fang",
year = "2016",
month = "11",
day = "1",
doi = "10.1016/j.imavis.2016.06.007",
language = "English (US)",
volume = "55",
pages = "42--52",
journal = "Image and Vision Computing",
issn = "0262-8856",
publisher = "Elsevier Limited",

}

TY - JOUR

T1 - From handcrafted to learned representations for human action recognition

T2 - A survey

AU - Zhu, Fan

AU - Shao, Ling

AU - Xie, Jin

AU - Fang, Yi

PY - 2016/11/1

Y1 - 2016/11/1

AB - Human action recognition is an important branch of the study of both human perception and computer vision systems. Along with the development of artificial intelligence, deep learning techniques have gained a remarkable reputation in image categorization tasks (e.g., object detection and classification). However, since human actions are normally presented as sequences of image frames, analyzing human action data with deep learning techniques requires significantly more computational power than analyzing still images. This challenge has been the bottleneck for migrating learning-based image representation techniques to action sequences, so old-fashioned handcrafted human action representations are still widely used for human action recognition tasks. On the other hand, since handcrafted representations are usually ad hoc and overfit to specific data, they cannot be generalized to deal with various realistic scenarios. Consequently, resorting to deep learning action representations for human action recognition tasks is ultimately a natural option. In this work, we provide a detailed overview of recent advancements in human action representations. As the first survey that covers both handcrafted and learning-based action representations, we explicitly discuss the strengths and limitations of existing techniques of both kinds. The ultimate goal of this survey is to provide a comprehensive analysis and comparison of learning-based and handcrafted action representations, so as to inspire action recognition researchers to study both kinds of representation techniques.

KW - Convolutional neural network

KW - Deep learning

KW - Dictionary learning

KW - Handcrafted features

KW - Human action recognition

UR - http://www.scopus.com/inward/record.url?scp=84979582625&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84979582625&partnerID=8YFLogxK

U2 - 10.1016/j.imavis.2016.06.007

DO - 10.1016/j.imavis.2016.06.007

M3 - Article

AN - SCOPUS:84979582625

VL - 55

SP - 42

EP - 52

JO - Image and Vision Computing

JF - Image and Vision Computing

SN - 0262-8856

ER -