Energy-based models in document recognition and computer vision

Yann LeCun, Sumit Chopra, Marc Aurelio Ranzato, Fu Jie Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Machine Learning and Pattern Recognition communities are facing two challenges: solving the normalization problem, and solving the deep learning problem. The normalization problem is related to the difficulty of training probabilistic models over large spaces while keeping them properly normalized. In recent years, the ML and Natural Language communities have devoted considerable efforts to circumventing this problem by developing "unnormalized" learning models for tasks in which the output is highly structured (e.g., English sentences). This class of models was in fact originally developed during the 1990s in the handwriting recognition community, and includes Graph Transformer Networks, Conditional Random Fields, Hidden Markov SVMs, and Maximum Margin Markov Networks. We describe these models within the unifying framework of "Energy-Based Models" (EBM). The deep learning problem is related to the issue of training all the levels of a recognition system (e.g., segmentation, feature extraction, recognition, etc.) in an integrated fashion. We first consider "traditional" methods for deep learning, such as convolutional networks and back-propagation, and show that, although they produce very low error rates for handwriting and object recognition, they require many training samples. We show that using unsupervised learning to initialize the layers of a deep network dramatically reduces the required number of training samples, particularly for such tasks as the recognition of everyday objects at the category level.
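The contrast between normalized probabilistic training and the "unnormalized" energy-based view can be made concrete with a small sketch. It assumes a hypothetical chain-structured toy problem; the unary/pairwise tables, the function names, and the choice of a perceptron-style loss are illustrative and are not taken from the paper. Inference is energy minimization (here by min-sum dynamic programming), a loss that only compares energies needs nothing beyond that minimization, while the negative log-likelihood of a Gibbs distribution also needs the partition function Z, a sum over all K**L label sequences.

```python
import itertools
import math

# Hypothetical toy structured-output problem (not from the paper): label a
# sequence of L positions with one of K classes. The input x is folded into
# the unary table, so E(x, y) = sum_t unary[t][y_t] + sum_t pair[y_{t-1}][y_t].
# Lower energy means a more compatible (x, y) pair.


def energy(unary, pair, labels):
    """Energy of one complete label sequence: unary plus pairwise terms."""
    e = sum(unary[t][y] for t, y in enumerate(labels))
    e += sum(pair[a][b] for a, b in zip(labels, labels[1:]))
    return e


def infer_viterbi(unary, pair):
    """Inference as energy minimization: min-sum dynamic programming over
    the chain returns argmin_y E(x, y) and its energy in O(L * K**2)."""
    n_steps, n_labels = len(unary), len(unary[0])
    cost = list(unary[0])  # best energy of a prefix ending in each label
    back = []  # back[t-1][b] = best previous label given label b at step t
    for t in range(1, n_steps):
        new_cost, ptr = [], []
        for b in range(n_labels):
            best_a = min(range(n_labels), key=lambda a: cost[a] + pair[a][b])
            new_cost.append(cost[best_a] + pair[best_a][b] + unary[t][b])
            ptr.append(best_a)
        cost = new_cost
        back.append(ptr)
    last = min(range(n_labels), key=lambda b: cost[b])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1], cost[last]


def perceptron_loss(unary, pair, target):
    """Unnormalized energy-based loss: E(x, target) - min_y E(x, y).
    It needs only the inference step above, never a partition function."""
    _, best_energy = infer_viterbi(unary, pair)
    return energy(unary, pair, target) - best_energy


def nll_brute_force(unary, pair, target):
    """Negative log-likelihood under a Gibbs distribution
    P(y | x) = exp(-E(x, y)) / Z, with Z summed over all K**L sequences."""
    n_steps, n_labels = len(unary), len(unary[0])
    energies = [
        energy(unary, pair, y)
        for y in itertools.product(range(n_labels), repeat=n_steps)
    ]
    m = min(energies)  # shift for a numerically stable log-sum-exp
    log_z = -m + math.log(sum(math.exp(-(e - m)) for e in energies))
    return energy(unary, pair, target) + log_z


# Illustrative numbers: L = 4 positions, K = 3 labels, so Z has 3**4 = 81 terms.
unary = [[0.0, 2.0, 1.0], [1.5, 0.2, 2.0], [0.3, 1.0, 0.1], [2.0, 0.5, 0.4]]
pair = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
target = [0, 1, 2, 2]

best_path, best_e = infer_viterbi(unary, pair)
print("argmin_y E(x, y):", best_path, "energy:", best_e)
print("perceptron loss:", perceptron_loss(unary, pair, target))
print("NLL (needs Z):  ", nll_brute_force(unary, pair, target))
```

For a chain like this one, Z could also be computed exactly with the forward algorithm; the brute-force enumeration is there only to make the K**L normalization terms visible. The loss that only compares energies avoids Z entirely, which is the property the abstract attributes to the unnormalized models.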

Original language: English (US)
Title of host publication: Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007
Pages: 337-341
Number of pages: 5
Volume: 1
DOIs: https://doi.org/10.1109/ICDAR.2007.4378728
State: Published - 2007
Event: 9th International Conference on Document Analysis and Recognition, ICDAR 2007 - Curitiba, Brazil
Duration: Sep 23 2007 to Sep 26 2007

Other

Other: 9th International Conference on Document Analysis and Recognition, ICDAR 2007
Country: Brazil
City: Curitiba
Period: 9/23/07 to 9/26/07

Fingerprint

Computer vision
Unsupervised learning
Object recognition
Backpropagation
Pattern recognition
Learning systems
Feature extraction
Deep learning

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

LeCun, Y., Chopra, S., Ranzato, M. A., & Huang, F. J. (2007). Energy-based models in document recognition and computer vision. In Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007 (Vol. 1, pp. 337-341). [4378728] https://doi.org/10.1109/ICDAR.2007.4378728

Energy-based models in document recognition and computer vision. / LeCun, Yann; Chopra, Sumit; Ranzato, Marc Aurelio; Huang, Fu Jie.

Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007. Vol. 1 2007. p. 337-341 4378728.

LeCun, Y, Chopra, S, Ranzato, MA & Huang, FJ 2007, Energy-based models in document recognition and computer vision. in Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007. vol. 1, 4378728, pp. 337-341, 9th International Conference on Document Analysis and Recognition, ICDAR 2007, Curitiba, Brazil, 9/23/07. https://doi.org/10.1109/ICDAR.2007.4378728
LeCun Y, Chopra S, Ranzato MA, Huang FJ. Energy-based models in document recognition and computer vision. In Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007. Vol. 1. 2007. p. 337-341. 4378728 https://doi.org/10.1109/ICDAR.2007.4378728
LeCun, Yann ; Chopra, Sumit ; Ranzato, Marc Aurelio ; Huang, Fu Jie. / Energy-based models in document recognition and computer vision. Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007. Vol. 1 2007. pp. 337-341
@inproceedings{5133985918954eebbe240587b68b292f,
title = "Energy-based models in document recognition and computer vision",
author = "Yann LeCun and Sumit Chopra and Ranzato, {Marc Aurelio} and Huang, {Fu Jie}",
year = "2007",
doi = "10.1109/ICDAR.2007.4378728",
language = "English (US)",
isbn = "0769528228",
volume = "1",
pages = "337--341",
booktitle = "Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007",

}

TY - GEN

T1 - Energy-based models in document recognition and computer vision

AU - LeCun, Yann

AU - Chopra, Sumit

AU - Ranzato, Marc Aurelio

AU - Huang, Fu Jie

PY - 2007

Y1 - 2007

AB - The Machine Learning and Pattern Recognition communities are facing two challenges: solving the normalization problem, and solving the deep learning problem. The normalization problem is related to the difficulty of training probabilistic models over large spaces while keeping them properly normalized. In recent years, the ML and Natural Language communities have devoted considerable efforts to circumventing this problem by developing "unnormalized" learning models for tasks in which the output is highly structured (e.g. English sentences). This class of models was in fact originally developed during the 90's in the handwriting recognition community, and includes Graph Transformer Networks, Conditional Random Fields, Hidden Markov SVMs, and Maximum Margin Markov Networks. We describe these models within the unifying framework of "Energy-Based Models" (EBM). The Deep Learning Problem is related to the issue of training all the levels of a recognition system (e.g. segmentation, feature extraction, recognition, etc) in an integrated fashion. We first consider "traditional" methods for deep learning, such as convolutional networks and back-propagation, and show that, although they produce very low error rates for handwriting and object recognition, they require many training samples. We show that using unsupervised learning to initialize the layers of a deep network dramatically reduces the required number of training samples, particularly for such tasks as the recognition of everyday objects at the category level.

UR - http://www.scopus.com/inward/record.url?scp=51249093914&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=51249093914&partnerID=8YFLogxK

U2 - 10.1109/ICDAR.2007.4378728

DO - 10.1109/ICDAR.2007.4378728

M3 - Conference contribution

SN - 0769528228

SN - 9780769528229

VL - 1

SP - 337

EP - 341

BT - Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007

ER -