MoDeep: A deep learning framework using motion features for human pose estimation

Arjun Jain, Jonathan Tompson, Yann LeCun, Christoph Bregler

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this work, we propose a novel and efficient method for articulated human pose estimation in videos using a convolutional network architecture that incorporates both color and motion features. We propose a new human body pose dataset, FLIC-motion, that extends the FLIC dataset [1] with additional motion features (the dataset can be downloaded from http://cs.nyu.edu/~ajain/accv2014/). We apply our architecture to this dataset and report significantly better performance than current state-of-the-art pose detection systems.

Original language: English (US)
Title of host publication: Computer Vision - ACCV 2014 - 12th Asian Conference on Computer Vision, Revised Selected Papers
Publisher: Springer Verlag
Pages: 302-315
Number of pages: 14
Volume: 9004
ISBN (Print): 9783319168074
DOIs: https://doi.org/10.1007/978-3-319-16808-1_21
State: Published - 2015
Event: 12th Asian Conference on Computer Vision, ACCV 2014 - Singapore, Singapore
Duration: Nov 1 2014 to Nov 5 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9004
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 12th Asian Conference on Computer Vision, ACCV 2014
Country: Singapore
City: Singapore
Period: 11/1/14 to 11/5/14

Fingerprint

Pose Estimation
Network architecture
Color
Motion
Learning
Human
Framework
Deep learning

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Jain, A., Tompson, J., LeCun, Y., & Bregler, C. (2015). MoDeep: A deep learning framework using motion features for human pose estimation. In Computer Vision - ACCV 2014 - 12th Asian Conference on Computer Vision, Revised Selected Papers (Vol. 9004, pp. 302-315). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9004). Springer Verlag. https://doi.org/10.1007/978-3-319-16808-1_21

@inproceedings{31ecd16adbd84fc9bdb7577bc260bfcc,
title = "MoDeep: A deep learning framework using motion features for human pose estimation",
abstract = "In this work, we propose a novel and efficient method for articulated human pose estimation in videos using a convolutional network architecture that incorporates both color and motion features. We propose a new human body pose dataset, FLIC-motion, that extends the FLIC dataset [1] with additional motion features (the dataset can be downloaded from http://cs.nyu.edu/~ajain/accv2014/). We apply our architecture to this dataset and report significantly better performance than current state-of-the-art pose detection systems.",
author = "Arjun Jain and Jonathan Tompson and Yann LeCun and Christoph Bregler",
year = "2015",
doi = "10.1007/978-3-319-16808-1_21",
language = "English (US)",
isbn = "9783319168074",
volume = "9004",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "302--315",
booktitle = "Computer Vision - ACCV 2014 - 12th Asian Conference on Computer Vision, Revised Selected Papers",

}