An FPGA-based stream processor for embedded real-time vision with convolutional networks

Clément Farabet, Cyril Poulet, Yann LeCun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many recent visual recognition systems can be seen as being composed of multiple layers of convolutional filter banks, interspersed with various types of non-linearities. This includes Convolutional Networks, HMAX-type architectures, as well as systems based on dense SIFT features or Histogram of Gradients. This paper describes a highly compact and low-power embedded system that can run such vision systems at very high speed. A custom board built around a Xilinx Virtex-4 FPGA was built and tested. It measures 70 x 80 mm, and the complete system (FPGA, camera, memory chips, flash) consumes 15 watts at peak and is capable of more than 4 × 10^9 multiply-accumulate operations per second in real vision applications. This enables real-time implementations of object detection, object recognition, and vision-based navigation algorithms in small-size robots, micro-UAVs, and hand-held devices. Real-time face detection is demonstrated, with speeds of 10 frames per second at VGA resolution.
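The figures quoted in the abstract can be combined into a rough compute budget. The sketch below is a back-of-the-envelope calculation only: it treats the throughput figure of 4 × 10^9 multiply-accumulates per second as a lower bound, takes VGA to mean 640 x 480 pixels (the abstract gives no numeric resolution), and assumes, purely for illustration, that the 10 frames per second face detector uses the full quoted throughput, which the abstract does not state. The function names are hypothetical.

```python
# Back-of-the-envelope budget from the figures in the abstract.
# Assumptions (not stated in the source): VGA = 640 x 480 pixels, and the
# face detector at 10 frames/s uses the full quoted throughput.

MACS_PER_SECOND = 4e9       # "more than 4 x 10^9" MAC/s, used as a lower bound
PEAK_POWER_WATTS = 15.0     # complete system, peak consumption
FRAMES_PER_SECOND = 10.0    # face detection demo
VGA_WIDTH, VGA_HEIGHT = 640, 480


def macs_per_watt(macs_per_second, watts):
    """Throughput per watt (MAC/s/W) at peak power."""
    return macs_per_second / watts


def macs_per_frame(macs_per_second, fps):
    """Multiply-accumulates available for each frame at a given frame rate."""
    return macs_per_second / fps


def macs_per_pixel(macs_per_second, fps, width, height):
    """Multiply-accumulates available per input pixel per frame."""
    return macs_per_frame(macs_per_second, fps) / (width * height)


if __name__ == "__main__":
    print("MAC/s per watt :", macs_per_watt(MACS_PER_SECOND, PEAK_POWER_WATTS))
    print("MACs per frame :", macs_per_frame(MACS_PER_SECOND, FRAMES_PER_SECOND))
    print("MACs per pixel :", macs_per_pixel(
        MACS_PER_SECOND, FRAMES_PER_SECOND, VGA_WIDTH, VGA_HEIGHT))
```

Under these assumptions the budget works out to roughly 0.27 × 10^9 MAC/s per watt and about 1,300 multiply-accumulates per VGA pixel per frame; both numbers are illustrative bounds rather than measurements reported by the authors.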

Original language: English (US)
Title of host publication: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009
Pages: 878-885
Number of pages: 8
DOIs: https://doi.org/10.1109/ICCVW.2009.5457611
State: Published - 2009
Event: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009 - Kyoto, Japan
Duration: Sep 27 2009 → Oct 4 2009

Other

Other: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009
Country: Japan
City: Kyoto
Period: 9/27/09 → 10/4/09

Fingerprint

Field programmable gate arrays (FPGA)
Filter banks
Object recognition
End effectors
Unmanned aerial vehicles (UAV)
Face recognition
Embedded systems
Navigation
Cameras
Robots
Data storage equipment
Object detection

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering

Cite this

Farabet, C., Poulet, C., & LeCun, Y. (2009). An FPGA-based stream processor for embedded real-time vision with convolutional networks. In 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009 (pp. 878-885). [5457611] https://doi.org/10.1109/ICCVW.2009.5457611

An FPGA-based stream processor for embedded real-time vision with convolutional networks. / Farabet, Clément; Poulet, Cyril; LeCun, Yann.

2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009. 2009. p. 878-885 5457611.

Farabet, C, Poulet, C & LeCun, Y 2009, An FPGA-based stream processor for embedded real-time vision with convolutional networks. in 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009., 5457611, pp. 878-885, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009, Kyoto, Japan, 9/27/09. https://doi.org/10.1109/ICCVW.2009.5457611
Farabet C, Poulet C, LeCun Y. An FPGA-based stream processor for embedded real-time vision with convolutional networks. In 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009. 2009. p. 878-885. 5457611 https://doi.org/10.1109/ICCVW.2009.5457611
Farabet, Clément ; Poulet, Cyril ; LeCun, Yann. / An FPGA-based stream processor for embedded real-time vision with convolutional networks. 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009. 2009. pp. 878-885
@inproceedings{9c2748d54185457ab224eeda3e8b99d4,
title = "An FPGA-based stream processor for embedded real-time vision with convolutional networks",
abstract = "Many recent visual recognition systems can be seen as being composed of multiple layers of convolutional filter banks, interspersed with various types of non-linearities. This includes Convolutional Networks, HMAX-type architectures, as well as systems based on dense SIFT features or Histogram of Gradients. This paper describes a highly compact and low-power embedded system that can run such vision systems at very high speed. A custom board built around a Xilinx Virtex-4 FPGA was built and tested. It measures 70 x 80 mm, and the complete system (FPGA, camera, memory chips, flash) consumes 15 watts at peak and is capable of more than $4 \times 10^{9}$ multiply-accumulate operations per second in real vision applications. This enables real-time implementations of object detection, object recognition, and vision-based navigation algorithms in small-size robots, micro-UAVs, and hand-held devices. Real-time face detection is demonstrated, with speeds of 10 frames per second at VGA resolution.",
author = "Cl{\'e}ment Farabet and Cyril Poulet and Yann LeCun",
year = "2009",
doi = "10.1109/ICCVW.2009.5457611",
language = "English (US)",
isbn = "9781424444427",
pages = "878--885",
booktitle = "2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009",

}

TY - GEN

T1 - An FPGA-based stream processor for embedded real-time vision with convolutional networks

AU - Farabet, Clément

AU - Poulet, Cyril

AU - LeCun, Yann

PY - 2009

Y1 - 2009

N2 - Many recent visual recognition systems can be seen as being composed of multiple layers of convolutional filter banks, interspersed with various types of non-linearities. This includes Convolutional Networks, HMAX-type architectures, as well as systems based on dense SIFT features or Histogram of Gradients. This paper describes a highly compact and low-power embedded system that can run such vision systems at very high speed. A custom board built around a Xilinx Virtex-4 FPGA was built and tested. It measures 70 x 80 mm, and the complete system (FPGA, camera, memory chips, flash) consumes 15 watts at peak and is capable of more than 4 × 10^9 multiply-accumulate operations per second in real vision applications. This enables real-time implementations of object detection, object recognition, and vision-based navigation algorithms in small-size robots, micro-UAVs, and hand-held devices. Real-time face detection is demonstrated, with speeds of 10 frames per second at VGA resolution.

AB - Many recent visual recognition systems can be seen as being composed of multiple layers of convolutional filter banks, interspersed with various types of non-linearities. This includes Convolutional Networks, HMAX-type architectures, as well as systems based on dense SIFT features or Histogram of Gradients. This paper describes a highly compact and low-power embedded system that can run such vision systems at very high speed. A custom board built around a Xilinx Virtex-4 FPGA was built and tested. It measures 70 x 80 mm, and the complete system (FPGA, camera, memory chips, flash) consumes 15 watts at peak and is capable of more than 4 × 10^9 multiply-accumulate operations per second in real vision applications. This enables real-time implementations of object detection, object recognition, and vision-based navigation algorithms in small-size robots, micro-UAVs, and hand-held devices. Real-time face detection is demonstrated, with speeds of 10 frames per second at VGA resolution.

UR - http://www.scopus.com/inward/record.url?scp=77953224252&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77953224252&partnerID=8YFLogxK

U2 - 10.1109/ICCVW.2009.5457611

DO - 10.1109/ICCVW.2009.5457611

M3 - Conference contribution

SN - 9781424444427

SP - 878

EP - 885

BT - 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009

ER -