A sparse and locally shift invariant feature extractor applied to document images

Marc Aurelio Ranzato, Yann LeCun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe an unsupervised learning algorithm for extracting sparse and locally shift-invariant features. We also devise a principled procedure for learning hierarchies of invariant features. Each feature detector is composed of a set of trainable convolutional filters followed by a max-pooling layer over non-overlapping windows, and a point-wise sigmoid non-linearity. A second stage of more invariant features is fed with patches provided by the first stage feature extractor, and is trained in the same way. The method is used to pre-train the first four layers of a deep convolutional network which achieves state-of-the-art performance on the MNIST dataset of handwritten digits. The final testing error rate is equal to 0.42%. Preliminary experiments on compression of bitonal document images show very promising results in terms of compression ratio and reconstruction error.
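
The abstract describes each feature-extraction stage as a set of trainable convolutional filters followed by max-pooling over non-overlapping windows and a point-wise sigmoid, with a second, more invariant stage stacked on the output of the first. As a rough illustration only, the sketch below assembles that stage structure in PyTorch; the filter counts, kernel sizes, and pooling windows are assumed for illustration, and the paper's unsupervised sparse pre-training procedure is not reproduced here.

    # Illustrative-only sketch (PyTorch): one "stage" = trainable convolution,
    # max-pooling over non-overlapping windows, point-wise sigmoid; two stages
    # are stacked so the second sees the more invariant output of the first.
    # Filter counts, kernel sizes, and pooling windows are assumed values,
    # not the hyper-parameters reported in the paper.
    import torch
    import torch.nn as nn

    class ShiftInvariantStage(nn.Module):
        """Convolution -> non-overlapping max-pooling -> point-wise sigmoid."""
        def __init__(self, in_channels, out_channels, kernel_size, pool_size):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, out_channels, kernel_size)
            # stride == window size makes the pooling windows non-overlapping
            self.pool = nn.MaxPool2d(pool_size, stride=pool_size)
            self.act = nn.Sigmoid()

        def forward(self, x):
            return self.act(self.pool(self.conv(x)))

    # Two stacked stages, as in the hierarchy described above.
    extractor = nn.Sequential(
        ShiftInvariantStage(in_channels=1, out_channels=50, kernel_size=7, pool_size=2),
        ShiftInvariantStage(in_channels=50, out_channels=128, kernel_size=5, pool_size=2),
    )

    patches = torch.randn(8, 1, 32, 32)    # a batch of 32x32 single-channel patches
    features = extractor(patches)
    print(features.shape)                  # torch.Size([8, 128, 4, 4]) with these sizes

The design point carried over from the abstract is that max-pooling over non-overlapping windows is what provides the local shift invariance within each pooling region.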

Original language: English (US)
Title of host publication: Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007
Pages: 1213-1217
Number of pages: 5
Volume: 2
DOI: 10.1109/ICDAR.2007.4377108
State: Published - 2007
Event: 9th International Conference on Document Analysis and Recognition, ICDAR 2007 - Curitiba, Brazil
Duration: Sep 23, 2007 - Sep 26, 2007

Other

Other: 9th International Conference on Document Analysis and Recognition, ICDAR 2007
Country: Brazil
City: Curitiba
Period: 9/23/07 - 9/26/07

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

Ranzato, M. A., & LeCun, Y. (2007). A sparse and locally shift invariant feature extractor applied to document images. In Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007 (Vol. 2, pp. 1213-1217). [4377108] https://doi.org/10.1109/ICDAR.2007.4377108

@inproceedings{6c928e448ea3407ab6e0d7e9cfe6409e,
    title     = "A sparse and locally shift invariant feature extractor applied to document images",
    abstract  = "We describe an unsupervised learning algorithm for extracting sparse and locally shift-invariant features. We also devise a principled procedure for learning hierarchies of invariant features. Each feature detector is composed of a set of trainable convolutional filters followed by a max-pooling layer over non-overlapping windows, and a point-wise sigmoid non-linearity. A second stage of more invariant features is fed with patches provided by the first stage feature extractor, and is trained in the same way. The method is used to pre-train the first four layers of a deep convolutional network which achieves state-of-the-art performance on the MNIST dataset of handwritten digits. The final testing error rate is equal to 0.42{\%}. Preliminary experiments on compression of bitonal document images show very promising results in terms of compression ratio and reconstruction error.",
    author    = "Ranzato, {Marc Aurelio} and Yann LeCun",
    year      = "2007",
    doi       = "10.1109/ICDAR.2007.4377108",
    language  = "English (US)",
    isbn      = "0769528228",
    volume    = "2",
    pages     = "1213--1217",
    booktitle = "Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007",
}

TY  - GEN
T1  - A sparse and locally shift invariant feature extractor applied to document images
AU  - Ranzato, Marc Aurelio
AU  - LeCun, Yann
PY  - 2007
Y1  - 2007
N2  - We describe an unsupervised learning algorithm for extracting sparse and locally shift-invariant features. We also devise a principled procedure for learning hierarchies of invariant features. Each feature detector is composed of a set of trainable convolutional filters followed by a max-pooling layer over non-overlapping windows, and a point-wise sigmoid non-linearity. A second stage of more invariant features is fed with patches provided by the first stage feature extractor, and is trained in the same way. The method is used to pre-train the first four layers of a deep convolutional network which achieves state-of-the-art performance on the MNIST dataset of handwritten digits. The final testing error rate is equal to 0.42%. Preliminary experiments on compression of bitonal document images show very promising results in terms of compression ratio and reconstruction error.
AB  - We describe an unsupervised learning algorithm for extracting sparse and locally shift-invariant features. We also devise a principled procedure for learning hierarchies of invariant features. Each feature detector is composed of a set of trainable convolutional filters followed by a max-pooling layer over non-overlapping windows, and a point-wise sigmoid non-linearity. A second stage of more invariant features is fed with patches provided by the first stage feature extractor, and is trained in the same way. The method is used to pre-train the first four layers of a deep convolutional network which achieves state-of-the-art performance on the MNIST dataset of handwritten digits. The final testing error rate is equal to 0.42%. Preliminary experiments on compression of bitonal document images show very promising results in terms of compression ratio and reconstruction error.
UR  - http://www.scopus.com/inward/record.url?scp=51149113745&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=51149113745&partnerID=8YFLogxK
U2  - 10.1109/ICDAR.2007.4377108
DO  - 10.1109/ICDAR.2007.4377108
M3  - Conference contribution
AN  - SCOPUS:51149113745
SN  - 0769528228
SN  - 9780769528229
VL  - 2
SP  - 1213
EP  - 1217
BT  - Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007
ER  -