Convolutional learning of spatio-temporal features

Graham W. Taylor, Robert Fergus, Yann LeCun, Christoph Bregler

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We address the problem of learning good features for understanding video data. We introduce a model that learns latent representations of image sequences from pairs of successive images. The convolutional architecture of our model allows it to scale to realistic image sizes whilst using a compact parametrization. In experiments on the NORB dataset, we show our model extracts latent "flow fields" which correspond to the transformation between the pair of input frames. We also use our model to extract low-level motion features in a multi-stage architecture for action recognition, demonstrating competitive performance on both the KTH and Hollywood2 datasets.

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 140-153
Number of pages: 14
Volume: 6316 LNCS
Edition: PART 6
DOIs: https://doi.org/10.1007/978-3-642-15567-3_11
State: Published - 2010
Event: 11th European Conference on Computer Vision, ECCV 2010 - Heraklion, Crete, Greece
Duration: Sep 5 2010 → Sep 11 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 6
Volume: 6316 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 11th European Conference on Computer Vision, ECCV 2010
Country: Greece
City: Heraklion, Crete
Period: 9/5/10 → 9/11/10

Fingerprint

Action Recognition
Image Sequence
Parametrization
Model
Flow Field
Flow fields
Motion
Learning
Experiment
Experiments
Architecture

Keywords

  • activity recognition
  • convolutional nets
  • optical flow
  • restricted Boltzmann machines
  • unsupervised learning
  • video analysis

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Taylor, G. W., Fergus, R., LeCun, Y., & Bregler, C. (2010). Convolutional learning of spatio-temporal features. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (PART 6 ed., Vol. 6316 LNCS, pp. 140-153). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6316 LNCS, No. PART 6). https://doi.org/10.1007/978-3-642-15567-3_11

Convolutional learning of spatio-temporal features. / Taylor, Graham W.; Fergus, Robert; LeCun, Yann; Bregler, Christoph.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 6316 LNCS PART 6. ed. 2010. p. 140-153 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6316 LNCS, No. PART 6).


Taylor, GW, Fergus, R, LeCun, Y & Bregler, C 2010, Convolutional learning of spatio-temporal features. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). PART 6 edn, vol. 6316 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), no. PART 6, vol. 6316 LNCS, pp. 140-153, 11th European Conference on Computer Vision, ECCV 2010, Heraklion, Crete, Greece, 9/5/10. https://doi.org/10.1007/978-3-642-15567-3_11
Taylor GW, Fergus R, LeCun Y, Bregler C. Convolutional learning of spatio-temporal features. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). PART 6 ed. Vol. 6316 LNCS. 2010. p. 140-153. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 6). https://doi.org/10.1007/978-3-642-15567-3_11
Taylor, Graham W. ; Fergus, Robert ; LeCun, Yann ; Bregler, Christoph. / Convolutional learning of spatio-temporal features. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 6316 LNCS PART 6. ed. 2010. pp. 140-153 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 6).
@inproceedings{11ebf57b44ad4f0083666da078246377,
title = "Convolutional learning of spatio-temporal features",
abstract = "We address the problem of learning good features for understanding video data. We introduce a model that learns latent representations of image sequences from pairs of successive images. The convolutional architecture of our model allows it to scale to realistic image sizes whilst using a compact parametrization. In experiments on the NORB dataset, we show our model extracts latent {"}flow fields{"} which correspond to the transformation between the pair of input frames. We also use our model to extract low-level motion features in a multi-stage architecture for action recognition, demonstrating competitive performance on both the KTH and Hollywood2 datasets.",
keywords = "activity recognition, convolutional nets, optical flow, restricted Boltzmann machines, unsupervised learning, video analysis",
author = "Taylor, {Graham W.} and Robert Fergus and Yann LeCun and Christoph Bregler",
year = "2010",
doi = "10.1007/978-3-642-15567-3_11",
language = "English (US)",
isbn = "3642155669",
volume = "6316 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
number = "PART 6",
pages = "140--153",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
edition = "PART 6",
}

TY  - GEN
T1  - Convolutional learning of spatio-temporal features
AU  - Taylor, Graham W.
AU  - Fergus, Robert
AU  - LeCun, Yann
AU  - Bregler, Christoph
PY  - 2010
Y1  - 2010
N2  - We address the problem of learning good features for understanding video data. We introduce a model that learns latent representations of image sequences from pairs of successive images. The convolutional architecture of our model allows it to scale to realistic image sizes whilst using a compact parametrization. In experiments on the NORB dataset, we show our model extracts latent "flow fields" which correspond to the transformation between the pair of input frames. We also use our model to extract low-level motion features in a multi-stage architecture for action recognition, demonstrating competitive performance on both the KTH and Hollywood2 datasets.
AB  - We address the problem of learning good features for understanding video data. We introduce a model that learns latent representations of image sequences from pairs of successive images. The convolutional architecture of our model allows it to scale to realistic image sizes whilst using a compact parametrization. In experiments on the NORB dataset, we show our model extracts latent "flow fields" which correspond to the transformation between the pair of input frames. We also use our model to extract low-level motion features in a multi-stage architecture for action recognition, demonstrating competitive performance on both the KTH and Hollywood2 datasets.
KW  - activity recognition
KW  - convolutional nets
KW  - optical flow
KW  - restricted Boltzmann machines
KW  - unsupervised learning
KW  - video analysis
UR  - http://www.scopus.com/inward/record.url?scp=78149336740&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=78149336740&partnerID=8YFLogxK
U2  - 10.1007/978-3-642-15567-3_11
DO  - 10.1007/978-3-642-15567-3_11
M3  - Conference contribution
SN  - 3642155669
SN  - 9783642155666
VL  - 6316 LNCS
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 140
EP  - 153
BT  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER  - 