Learning methods for generic object recognition with invariance to pose and lighting

Yann LeCun, Fu Jie Huang, Léon Bottou

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

We assess the applicability of several popular learning methods for the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 azimuths, 9 elevations, and 6 lighting conditions was collected (for a total of 194,400 individual images). The objects were 10 instances of 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. Five instances of each category were used for training, and the other five for testing. Low-resolution grayscale images of the objects with various amounts of variability and surrounding clutter were used for training and testing. Nearest Neighbor methods, Support Vector Machines, and Convolutional Networks, operating on raw pixels or on PCA-derived features, were tested. Test error rates for unseen object instances placed on uniform backgrounds were around 13% for SVM and 7% for Convolutional Nets. On a segmentation/recognition task with highly cluttered images, SVM proved impractical, while Convolutional Nets yielded 16.7% error. A real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second.
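The image count in the abstract follows directly from the capture factors, and the evaluation holds out whole object instances rather than individual images. The Python sketch below enumerates that index space to check the arithmetic (50 × 36 × 9 × 6 × 2 = 194,400, counting both images of each stereo pair) and to show an instance-level train/test split. The tuple layout and the choice of instances 0 to 4 for training are hypothetical. The abstract does not say which five instances of each category were used for training.

```python
from itertools import product

# Capture factors as stated in the abstract.
CATEGORIES = ["four-legged animal", "human figure", "airplane", "truck", "car"]
INSTANCES_PER_CATEGORY = 10
AZIMUTHS = 36
ELEVATIONS = 9
LIGHTINGS = 6
STEREO_VIEWS = 2  # each capture is a left/right stereo pair


def image_index():
    """Yield one (category, instance, azimuth, elevation, lighting, view) tuple per image.

    Hypothetical layout, used only to count images and illustrate the split.
    """
    return product(
        range(len(CATEGORIES)),
        range(INSTANCES_PER_CATEGORY),
        range(AZIMUTHS),
        range(ELEVATIONS),
        range(LIGHTINGS),
        range(STEREO_VIEWS),
    )


def is_training(instance: int) -> bool:
    # Hypothetical assignment: instances 0-4 train, 5-9 test (five of each category per side).
    return instance < INSTANCES_PER_CATEGORY // 2


total = 0
train = 0
train_objects = set()
test_objects = set()
for cat, inst, *_ in image_index():
    total += 1
    if is_training(inst):
        train += 1
        train_objects.add((cat, inst))
    else:
        test_objects.add((cat, inst))

print(total)  # 194400
print(train)  # 97200
# No physical object appears on both sides of the split.
print(train_objects.isdisjoint(test_objects))  # True
```

Splitting by instance rather than by image is what makes the reported test errors apply to unseen objects: every view and lighting condition of a test toy is excluded from training.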

Original language: English (US)
Title of host publication: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 2
State: Published - 2004
Event: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2004 - Washington, DC, United States
Duration: Jun 27 2004 to Jul 2 2004

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Vision and Pattern Recognition
  • Software
  • Control and Systems Engineering

Cite this

LeCun, Y., Huang, F. J., & Bottou, L. (2004). Learning methods for generic object recognition with invariance to pose and lighting. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2).

Learning methods for generic object recognition with invariance to pose and lighting. / LeCun, Yann; Huang, Fu Jie; Bottou, Léon.

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Vol. 2 2004.

LeCun, Y, Huang, FJ & Bottou, L 2004, Learning methods for generic object recognition with invariance to pose and lighting. in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. vol. 2, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2004, Washington, DC, United States, 6/27/04.
LeCun Y, Huang FJ, Bottou L. Learning methods for generic object recognition with invariance to pose and lighting. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Vol. 2. 2004.
LeCun, Yann ; Huang, Fu Jie ; Bottou, Léon. / Learning methods for generic object recognition with invariance to pose and lighting. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Vol. 2 2004.
@inproceedings{4d68f7f108974791beef021c242e49f6,
title = "Learning methods for generic object recognition with invariance to pose and lighting",
abstract = "We assess the applicability of several popular learning methods for the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 azimuths, 9 elevations, and 6 lighting conditions was collected (for a total of 194,400 individual images). The objects were 10 instances of 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. Five instances of each category were used for training, and the other five for testing. Low-resolution grayscale images of the objects with various amounts of variability and surrounding clutter were used for training and testing. Nearest Neighbor methods, Support Vector Machines, and Convolutional Networks, operating on raw pixels or on PCA-derived features, were tested. Test error rates for unseen object instances placed on uniform backgrounds were around 13{\%} for SVM and 7{\%} for Convolutional Nets. On a segmentation/recognition task with highly cluttered images, SVM proved impractical, while Convolutional Nets yielded 16.7{\%} error. A real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second.",
author = "Yann LeCun and Huang, {Fu Jie} and L{\'e}on Bottou",
year = "2004",
language = "English (US)",
volume = "2",
booktitle = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

}

TY - GEN

T1 - Learning methods for generic object recognition with invariance to pose and lighting

AU - LeCun, Yann

AU - Huang, Fu Jie

AU - Bottou, Léon

PY - 2004

Y1 - 2004

AB - We assess the applicability of several popular learning methods for the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 azimuths, 9 elevations, and 6 lighting conditions was collected (for a total of 194,400 individual images). The objects were 10 instances of 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. Five instances of each category were used for training, and the other five for testing. Low-resolution grayscale images of the objects with various amounts of variability and surrounding clutter were used for training and testing. Nearest Neighbor methods, Support Vector Machines, and Convolutional Networks, operating on raw pixels or on PCA-derived features, were tested. Test error rates for unseen object instances placed on uniform backgrounds were around 13% for SVM and 7% for Convolutional Nets. On a segmentation/recognition task with highly cluttered images, SVM proved impractical, while Convolutional Nets yielded 16.7% error. A real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second.

UR - http://www.scopus.com/inward/record.url?scp=5044231640&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=5044231640&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:5044231640

VL - 2

BT - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

ER -