Large-scale FPGA-based convolutional networks

Clément Farabet, Yann LeCun, Koray Kavukcuoglu, Berin Martini, Polina Akselrod, Selcuk Talay, Eugenio Culurciello

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Micro-robots, unmanned aerial vehicles, imaging sensor networks, wireless phones, and other embedded vision systems all require low-cost and high-speed implementations of synthetic vision systems capable of recognizing and categorizing objects in a scene. Many successful object recognition systems use dense features extracted on regularly spaced patches over the input image. The majority of feature extraction systems have a common structure composed of a filter bank (generally based on oriented edge detectors or 2D Gabor functions), a nonlinear operation (quantization, winner-take-all, sparsification, normalization, and/or pointwise saturation), and finally a pooling operation (max, average, or histogramming). For example, the scale-invariant feature transform (SIFT) operator (Lowe, 2004) applies oriented edge filters to a small patch and determines the dominant orientation through a winner-take-all operation. Finally, the resulting sparse vectors are added (pooled) over a larger patch to form a local orientation histogram. Some recognition systems use a single stage of feature extractors (Lazebnik, Schmid, and Ponce, 2006; Dalal and Triggs, 2005; Berg, Berg, and Malik, 2005; Pinto, Cox, and DiCarlo, 2008). Other models, such as HMAX-type models (Serre, Wolf, and Poggio, 2005; Mutch and Lowe, 2006) and convolutional networks, use two or more layers of successive feature extractors. Different training algorithms have been used for learning the parameters of convolutional networks. In LeCun et al. (1998b) and Huang and LeCun (2006), pure supervised learning is used to update the parameters. However, recent works have focused on training with an auxiliary task (Ahmed et al., 2008) or using unsupervised objectives (Ranzato et al., 2007b; Kavukcuoglu et al., 2009; Jarrett et al., 2009; Lee et al., 2009).
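To make the generic stage described above concrete, the following is a minimal sketch in NumPy/SciPy, not the chapter's FPGA implementation; the helper names (gabor_filter, feature_stage) and all parameter values are illustrative assumptions. It builds a small bank of oriented 2D Gabor filters, applies a pointwise saturation, and max-pools the responses.

    # A minimal sketch of the generic feature-extraction stage described
    # above: filter bank -> pointwise nonlinearity -> spatial pooling.
    # Hypothetical helper names; not the chapter's FPGA implementation.
    import numpy as np
    from scipy.signal import convolve2d

    def gabor_filter(size=9, theta=0.0, wavelength=4.0, sigma=2.0):
        # One oriented 2D Gabor filter, a common filter-bank choice.
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinate
        envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
        return envelope * np.cos(2 * np.pi * xr / wavelength)

    def feature_stage(image, n_orientations=4, pool=4):
        # Filter bank -> tanh saturation -> non-overlapping max pooling.
        maps = []
        for k in range(n_orientations):
            g = gabor_filter(theta=k * np.pi / n_orientations)
            r = np.tanh(convolve2d(image, g, mode='valid'))
            h = r.shape[0] - r.shape[0] % pool
            w = r.shape[1] - r.shape[1] % pool
            r = r[:h, :w].reshape(h // pool, pool, w // pool, pool)
            maps.append(r.max(axis=(1, 3)))  # max over each pool x pool block
        return np.stack(maps)

    features = feature_stage(np.random.rand(64, 64))
    print(features.shape)  # (4, 14, 14): one pooled map per orientation

Swapping the tanh for quantization or normalization, and the max for an average or a histogram, yields the other variants enumerated above; the three-part structure of the stage is unchanged.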
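The SIFT-like path can be sketched the same way (again a hypothetical illustration under the same assumptions, not Lowe's actual descriptor): a per-pixel winner-take-all across oriented-edge responses, followed by adding (pooling) the resulting sparse votes over larger patches into a local orientation histogram.

    # Hypothetical sketch of winner-take-all + histogram pooling;
    # not Lowe's SIFT implementation.
    import numpy as np

    def orientation_histogram(responses, pool=8):
        # responses: (n_orientations, H, W) oriented filter-bank outputs.
        n, H, W = responses.shape
        winners = responses.argmax(axis=0)    # winner-take-all per pixel
        magnitude = responses.max(axis=0)
        H, W = H - H % pool, W - W % pool
        hist = np.zeros((n, H // pool, W // pool))
        for i in range(H // pool):            # add (pool) votes per patch
            for j in range(W // pool):
                w = winners[i*pool:(i+1)*pool, j*pool:(j+1)*pool]
                m = magnitude[i*pool:(i+1)*pool, j*pool:(j+1)*pool]
                for k in range(n):
                    hist[k, i, j] = m[w == k].sum()
        return hist

    hist = orientation_histogram(np.random.rand(8, 64, 64))
    print(hist.shape)  # (8, 8, 8): an 8-bin histogram per 8x8 patch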

Original language: English (US)
Title of host publication: Scaling up Machine Learning: Parallel and Distributed Approaches
Publisher: Cambridge University Press
Pages: 399-419
Number of pages: 21
ISBN (Print): 9781139042918, 9780521192248
DOI: 10.1017/CBO9781139042918.020
State: Published - Jan 1 2011

Fingerprint

  • Field programmable gate arrays (FPGA)
  • Filter banks
  • Supervised learning
  • Object recognition
  • Unmanned aerial vehicles (UAV)
  • Feature extraction
  • Wireless sensor networks
  • Robots
  • Detectors
  • Imaging techniques
  • Costs

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Farabet, C., LeCun, Y., Kavukcuoglu, K., Martini, B., Akselrod, P., Talay, S., & Culurciello, E. (2011). Large-scale FPGA-based convolutional networks. In Scaling up Machine Learning: Parallel and Distributed Approaches (pp. 399-419). Cambridge University Press. https://doi.org/10.1017/CBO9781139042918.020
