Learning mid-level features for recognition

Y. Lan Boureau, Francis Bach, Yann LeCun, Jean Ponce

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise transformation of the descriptors into a representation better adapted to the task, and (2) a pooling step, which summarizes the coded features over larger neighborhoods. Several combinations of coding and pooling schemes have been proposed in the literature. The goal of this paper is threefold. We seek to establish the relative importance of each step of mid-level feature extraction through a comprehensive cross evaluation of several types of coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (by taking the average, or the maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. We show how to improve the best performing coding scheme by learning a supervised discriminative dictionary for sparse coding. We provide theoretical and empirical insight into the remarkable performance of max pooling. By teasing apart components shared by modern mid-level feature extractors, our approach aims to facilitate the design of better recognition architectures.
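
The two-step pipeline described in the abstract (coding, then pooling) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy codebook, the toy descriptors, and the `beta` softness parameter are all assumptions made for illustration, and sparse coding is left out for brevity.

```python
import math

def hard_vq(x, codebook):
    """Hard vector quantization: one-hot code for the nearest codeword."""
    dists = [math.dist(x, c) for c in codebook]
    nearest = dists.index(min(dists))
    return [1.0 if i == nearest else 0.0 for i in range(len(codebook))]

def soft_vq(x, codebook, beta=1.0):
    """Soft vector quantization: softmax over negative squared distances.

    beta is a hypothetical softness parameter chosen for this sketch.
    """
    scores = [-beta * math.dist(x, c) ** 2 for c in codebook]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def average_pool(codes):
    """Average pooling: mean of each code component over the neighborhood."""
    return [sum(col) / len(codes) for col in zip(*codes)]

def max_pool(codes):
    """Max pooling: maximum of each code component over the neighborhood."""
    return [max(col) for col in zip(*codes)]

# Toy data (assumed): a two-word codebook and three 2-D descriptors
# standing in for the low-level descriptors of one neighborhood.
codebook = [(0.0, 0.0), (1.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.0), (0.2, 0.1)]

hard_codes = [hard_vq(x, codebook) for x in descriptors]
soft_codes = [soft_vq(x, codebook) for x in descriptors]

avg_feature = average_pool(hard_codes)  # fraction of descriptors per codeword
max_feature = max_pool(hard_codes)      # whether each codeword fired at all
```

With hard codes, average pooling gives a normalized histogram of codeword assignments, while max pooling only records whether each codeword was selected anywhere in the neighborhood.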

Original language: English (US)
Title of host publication: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010
Pages: 2559-2566
Number of pages: 8
DOIs: https://doi.org/10.1109/CVPR.2010.5539963
State: Published - 2010
Event: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010 - San Francisco, CA, United States
Duration: Jun 13 2010 → Jun 18 2010

Other

Other: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010
Country: United States
City: San Francisco, CA
Period: 6/13/10 → 6/18/10

Fingerprint

Gabor filters
Vector quantization
Object recognition
Glossaries

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition

Cite this

Boureau, Y. L., Bach, F., LeCun, Y., & Ponce, J. (2010). Learning mid-level features for recognition. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010 (pp. 2559-2566). [5539963] https://doi.org/10.1109/CVPR.2010.5539963

Learning mid-level features for recognition. / Boureau, Y. Lan; Bach, Francis; LeCun, Yann; Ponce, Jean.

2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010. 2010. p. 2559-2566 5539963.


Boureau, YL, Bach, F, LeCun, Y & Ponce, J 2010, Learning mid-level features for recognition. in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010., 5539963, pp. 2559-2566, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, United States, 6/13/10. https://doi.org/10.1109/CVPR.2010.5539963
Boureau YL, Bach F, LeCun Y, Ponce J. Learning mid-level features for recognition. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010. 2010. p. 2559-2566. 5539963 https://doi.org/10.1109/CVPR.2010.5539963
Boureau, Y. Lan ; Bach, Francis ; LeCun, Yann ; Ponce, Jean. / Learning mid-level features for recognition. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010. 2010. pp. 2559-2566
@inproceedings{eb597c4f138d4894aab8e01ba1059b0c,
title = "Learning mid-level features for recognition",
abstract = "Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise transformation of the descriptors into a representation better adapted to the task, and (2) a pooling step, which summarizes the coded features over larger neighborhoods. Several combinations of coding and pooling schemes have been proposed in the literature. The goal of this paper is threefold. We seek to establish the relative importance of each step of mid-level feature extraction through a comprehensive cross evaluation of several types of coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (by taking the average, or the maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. We show how to improve the best performing coding scheme by learning a supervised discriminative dictionary for sparse coding. We provide theoretical and empirical insight into the remarkable performance of max pooling. By teasing apart components shared by modern mid-level feature extractors, our approach aims to facilitate the design of better recognition architectures.",
author = "Boureau, {Y. Lan} and Francis Bach and Yann LeCun and Jean Ponce",
year = "2010",
doi = "10.1109/CVPR.2010.5539963",
language = "English (US)",
isbn = "9781424469840",
pages = "2559--2566",
booktitle = "2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010",

}

TY - GEN

T1 - Learning mid-level features for recognition

AU - Boureau, Y. Lan

AU - Bach, Francis

AU - LeCun, Yann

AU - Ponce, Jean

PY - 2010

Y1 - 2010

N2 - Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise transformation of the descriptors into a representation better adapted to the task, and (2) a pooling step, which summarizes the coded features over larger neighborhoods. Several combinations of coding and pooling schemes have been proposed in the literature. The goal of this paper is threefold. We seek to establish the relative importance of each step of mid-level feature extraction through a comprehensive cross evaluation of several types of coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (by taking the average, or the maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. We show how to improve the best performing coding scheme by learning a supervised discriminative dictionary for sparse coding. We provide theoretical and empirical insight into the remarkable performance of max pooling. By teasing apart components shared by modern mid-level feature extractors, our approach aims to facilitate the design of better recognition architectures.

AB - Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise transformation of the descriptors into a representation better adapted to the task, and (2) a pooling step, which summarizes the coded features over larger neighborhoods. Several combinations of coding and pooling schemes have been proposed in the literature. The goal of this paper is threefold. We seek to establish the relative importance of each step of mid-level feature extraction through a comprehensive cross evaluation of several types of coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (by taking the average, or the maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. We show how to improve the best performing coding scheme by learning a supervised discriminative dictionary for sparse coding. We provide theoretical and empirical insight into the remarkable performance of max pooling. By teasing apart components shared by modern mid-level feature extractors, our approach aims to facilitate the design of better recognition architectures.

UR - http://www.scopus.com/inward/record.url?scp=77955993281&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955993281&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2010.5539963

DO - 10.1109/CVPR.2010.5539963

M3 - Conference contribution

SN - 9781424469840

SP - 2559

EP - 2566

BT - 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010

ER -