Ask the locals: Multi-way local pooling for image recognition

Y. Lan Boureau, Nicolas Le Roux, Francis Bach, Jean Ponce, Yann LeCun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Invariant representations in object recognition systems are generally obtained by pooling feature vectors over spatially local neighborhoods. But pooling is not local in the feature vector space, so that widely dissimilar features may be pooled together if they are in nearby locations. Recent approaches rely on sophisticated encoding methods and more specialized codebooks (or dictionaries), e.g., learned on subsets of descriptors which are close in feature space, to circumvent this problem. In this work, we argue that a common trait found in much recent work in image recognition or retrieval is that it leverages locality in feature space on top of purely spatial locality. We propose to apply this idea in its simplest form to an object recognition system based on the spatial pyramid framework, to increase the performance of small dictionaries with very little added engineering. State-of-the-art results on several object recognition benchmarks show the promise of this approach.
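The idea in the abstract, pooling only features that are close both in the image and in feature space, can be sketched as follows. This is a minimal illustration and not the authors' implementation. The function name `multiway_local_pool`, the use of nearest-centroid assignment for the feature-space bins, max pooling, a single-level grid instead of a full spatial pyramid, and nonnegative codes are all assumptions made for the example.

```python
import numpy as np

def multiway_local_pool(codes, positions, descriptors, centroids, grid=2):
    """Max-pool codes within each (spatial cell, feature-space cluster) pair.

    codes:       (N, K) nonnegative encoded features (assumed, e.g. sparse codes)
    positions:   (N, 2) x, y locations, assumed normalized to [0, 1]
    descriptors: (N, D) raw local descriptors
    centroids:   (P, D) feature-space cluster centers (hypothetical, e.g. k-means)
    """
    n, k = codes.shape
    p = centroids.shape[0]
    # Spatial bin: index of the grid cell containing each descriptor.
    cell_xy = np.minimum((positions * grid).astype(int), grid - 1)
    cell = cell_xy[:, 1] * grid + cell_xy[:, 0]
    # Feature-space bin: index of the nearest centroid (squared Euclidean).
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    cluster = d2.argmin(axis=1)
    # Zero initialization is only valid because codes are assumed nonnegative.
    pooled = np.zeros((grid * grid, p, k))
    for i in range(n):
        pooled[cell[i], cluster[i]] = np.maximum(pooled[cell[i], cluster[i]], codes[i])
    return pooled.reshape(-1)

# Toy data: the first two descriptors share a spatial cell but are far apart
# in feature space, so they land in different pooling bins.
codes = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
positions = np.array([[0.1, 0.1], [0.2, 0.2], [0.9, 0.9]])
descriptors = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 0.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
out = multiway_local_pool(codes, positions, descriptors, centroids, grid=2)
```

With purely spatial pooling, the first two toy descriptors would be merged into a single pooled vector for their cell; here they stay in separate bins, at the cost of an output that is `P` times longer.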

Original language: English (US)
Title of host publication: 2011 International Conference on Computer Vision, ICCV 2011
Pages: 2651-2658
Number of pages: 8
DOIs: https://doi.org/10.1109/ICCV.2011.6126555
State: Published - 2011
Event: 2011 IEEE International Conference on Computer Vision, ICCV 2011 - Barcelona, Spain
Duration: Nov 6 2011 to Nov 13 2011

Other

Other: 2011 IEEE International Conference on Computer Vision, ICCV 2011
Country: Spain
City: Barcelona
Period: 11/6/11 to 11/13/11

Fingerprint

Image recognition
Object recognition
Glossaries
Image retrieval
Vector spaces

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition

Cite this

Boureau, Y. L., Le Roux, N., Bach, F., Ponce, J., & LeCun, Y. (2011). Ask the locals: Multi-way local pooling for image recognition. In 2011 International Conference on Computer Vision, ICCV 2011 (pp. 2651-2658). [6126555] https://doi.org/10.1109/ICCV.2011.6126555

@inproceedings{af3545f2dee8449ba3b15583095dcd34,
title = "Ask the locals: Multi-way local pooling for image recognition",
abstract = "Invariant representations in object recognition systems are generally obtained by pooling feature vectors over spatially local neighborhoods. But pooling is not local in the feature vector space, so that widely dissimilar features may be pooled together if they are in nearby locations. Recent approaches rely on sophisticated encoding methods and more specialized codebooks (or dictionaries), e.g., learned on subsets of descriptors which are close in feature space, to circumvent this problem. In this work, we argue that a common trait found in much recent work in image recognition or retrieval is that it leverages locality in feature space on top of purely spatial locality. We propose to apply this idea in its simplest form to an object recognition system based on the spatial pyramid framework, to increase the performance of small dictionaries with very little added engineering. State-of-the-art results on several object recognition benchmarks show the promise of this approach.",
author = "Boureau, {Y. Lan} and {Le Roux}, Nicolas and Francis Bach and Jean Ponce and Yann LeCun",
year = "2011",
doi = "10.1109/ICCV.2011.6126555",
language = "English (US)",
isbn = "9781457711015",
pages = "2651--2658",
booktitle = "2011 International Conference on Computer Vision, ICCV 2011",

}

TY - GEN

T1 - Ask the locals

T2 - Multi-way local pooling for image recognition

AU - Boureau, Y. Lan

AU - Le Roux, Nicolas

AU - Bach, Francis

AU - Ponce, Jean

AU - LeCun, Yann

PY - 2011

Y1 - 2011

AB - Invariant representations in object recognition systems are generally obtained by pooling feature vectors over spatially local neighborhoods. But pooling is not local in the feature vector space, so that widely dissimilar features may be pooled together if they are in nearby locations. Recent approaches rely on sophisticated encoding methods and more specialized codebooks (or dictionaries), e.g., learned on subsets of descriptors which are close in feature space, to circumvent this problem. In this work, we argue that a common trait found in much recent work in image recognition or retrieval is that it leverages locality in feature space on top of purely spatial locality. We propose to apply this idea in its simplest form to an object recognition system based on the spatial pyramid framework, to increase the performance of small dictionaries with very little added engineering. State-of-the-art results on several object recognition benchmarks show the promise of this approach.

UR - http://www.scopus.com/inward/record.url?scp=84856649187&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84856649187&partnerID=8YFLogxK

U2 - 10.1109/ICCV.2011.6126555

DO - 10.1109/ICCV.2011.6126555

M3 - Conference contribution

AN - SCOPUS:84856649187

SN - 9781457711015

SP - 2651

EP - 2658

BT - 2011 International Conference on Computer Vision, ICCV 2011

ER -