A theoretical analysis of feature pooling in visual recognition

Y. Lan Boureau, Jean Ponce, Yann LeCun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many modern visual recognition algorithms incorporate a step of spatial 'pooling', where the outputs of several nearby feature detectors are combined into a local or global 'bag of features', in a way that preserves task-related information while removing irrelevant details. Pooling is used to achieve invariance to image transformations, more compact representations, and better robustness to noise and clutter. Several papers have shown that the details of the pooling operation can greatly influence the performance, but studies have so far been purely empirical. In this paper, we show that the reasons underlying the performance of various pooling methods are obscured by several confounding factors, such as the link between the sample cardinality in a spatial pool and the resolution at which low-level features have been extracted. We provide a detailed theoretical analysis of max pooling and average pooling, and give extensive empirical comparisons for object recognition tasks.
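For readers unfamiliar with the two operations named in the abstract, the following is a minimal sketch of max pooling and average pooling over non-overlapping square windows of a 2D feature map. It is not taken from the paper: the function names (`pool`, `average`), the window layout, and the toy `responses` grid are all illustrative assumptions.

```python
from typing import Callable, List, Sequence

FeatureMap = List[List[float]]


def pool(feature_map: FeatureMap, size: int,
         reduce: Callable[[Sequence[float]], float]) -> FeatureMap:
    """Summarize each non-overlapping size x size window with `reduce`.

    Hypothetical helper for illustration; windows that would run past
    the edge of the map are skipped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the responses of the nearby detectors in this window.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(reduce(window))
        pooled.append(pooled_row)
    return pooled


def average(values: Sequence[float]) -> float:
    """Arithmetic mean of the responses in one pool."""
    return sum(values) / len(values)


# Toy 4x4 map of feature-detector responses (made-up values).
responses = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 1.0],
]

max_pooled = pool(responses, 2, max)      # strongest response per window
avg_pooled = pool(responses, 2, average)  # mean response per window

print(max_pooled)
print(avg_pooled)
```

In this toy grid, a single strong response in an otherwise silent window survives max pooling at full strength but is diluted by averaging, and the dilution grows with the number of samples in the pool. That dependence on pool size is the kind of sample-cardinality effect the abstract mentions.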

Original language: English (US)
Title of host publication: ICML 2010 - Proceedings, 27th International Conference on Machine Learning
Pages: 111-118
Number of pages: 8
State: Published - 2010
Event: 27th International Conference on Machine Learning, ICML 2010 - Haifa, Israel
Duration: Jun 21 2010 to Jun 25 2010

Fingerprint

Object recognition
Invariance
Detectors
Performance

ASJC Scopus subject areas

  • Artificial Intelligence
  • Education

Cite this

Boureau, Y. L., Ponce, J., & LeCun, Y. (2010). A theoretical analysis of feature pooling in visual recognition. In ICML 2010 - Proceedings, 27th International Conference on Machine Learning (pp. 111-118)

@inproceedings{430e3dd751bf41ceaa93e6e970fe10fa,
title = "A theoretical analysis of feature pooling in visual recognition",
abstract = "Many modern visual recognition algorithms incorporate a step of spatial 'pooling', where the outputs of several nearby feature detectors are combined into a local or global 'bag of features', in a way that preserves task-related information while removing irrelevant details. Pooling is used to achieve invariance to image transformations, more compact representations, and better robustness to noise and clutter. Several papers have shown that the details of the pooling operation can greatly influence the performance, but studies have so far been purely empirical. In this paper, we show that the reasons underlying the performance of various pooling methods are obscured by several confounding factors, such as the link between the sample cardinality in a spatial pool and the resolution at which low-level features have been extracted. We provide a detailed theoretical analysis of max pooling and average pooling, and give extensive empirical comparisons for object recognition tasks.",
author = "Boureau, {Y. Lan} and Jean Ponce and Yann LeCun",
year = "2010",
language = "English (US)",
isbn = "9781605589077",
pages = "111--118",
booktitle = "ICML 2010 - Proceedings, 27th International Conference on Machine Learning",

}

TY - GEN

T1 - A theoretical analysis of feature pooling in visual recognition

AU - Boureau, Y. Lan

AU - Ponce, Jean

AU - LeCun, Yann

PY - 2010

Y1 - 2010

N2 - Many modern visual recognition algorithms incorporate a step of spatial 'pooling', where the outputs of several nearby feature detectors are combined into a local or global 'bag of features', in a way that preserves task-related information while removing irrelevant details. Pooling is used to achieve invariance to image transformations, more compact representations, and better robustness to noise and clutter. Several papers have shown that the details of the pooling operation can greatly influence the performance, but studies have so far been purely empirical. In this paper, we show that the reasons underlying the performance of various pooling methods are obscured by several confounding factors, such as the link between the sample cardinality in a spatial pool and the resolution at which low-level features have been extracted. We provide a detailed theoretical analysis of max pooling and average pooling, and give extensive empirical comparisons for object recognition tasks.

UR - http://www.scopus.com/inward/record.url?scp=77956502203&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77956502203&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9781605589077

SP - 111

EP - 118

BT - ICML 2010 - Proceedings, 27th International Conference on Machine Learning

ER -