Stereo matching by training a convolutional neural network to compare image patches

Jure Žbontar, Yann LeCun

Research output: Contribution to journal › Article

Abstract

We present a method for extracting depth information from a rectified image pair. Our approach focuses on the first stage of many stereo algorithms: the matching cost computation. We approach the problem by learning a similarity measure on small image patches using a convolutional neural network. Training is carried out in a supervised manner by constructing a binary classification data set with examples of similar and dissimilar pairs of patches. We examine two network architectures for this task: one tuned for speed, the other for accuracy. The output of the convolutional neural network is used to initialize the stereo matching cost. A series of post-processing steps follow: cross-based cost aggregation, semiglobal matching, a left-right consistency check, subpixel enhancement, a median filter, and a bilateral filter. We evaluate our method on the KITTI 2012, KITTI 2015, and Middlebury stereo data sets and show that it outperforms other approaches on all three data sets.
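The supervised data set described in the abstract can be illustrated with a minimal sketch. It assumes that a "similar" pair is a left patch and the right patch at the ground-truth disparity, and that a "dissimilar" pair shifts the right patch a few pixels away from the true match. The function names, the patch size, and the offset range (`neg_low`, `neg_high`) are hypothetical choices for illustration; the abstract does not specify them.

```python
import random


def extract_patch(image, cx, cy, half):
    """Return the (2*half+1)-square patch centred at (cx, cy), or None if it leaves the image."""
    height, width = len(image), len(image[0])
    if cx - half < 0 or cx + half >= width or cy - half < 0 or cy + half >= height:
        return None
    return [row[cx - half:cx + half + 1] for row in image[cy - half:cy + half + 1]]


def make_pairs(left, right, disparity, half=1, neg_low=2, neg_high=4, rng=None):
    """Build (left_patch, right_patch, label) examples; label 1 = similar, 0 = dissimilar."""
    rng = rng or random.Random(0)
    examples = []
    for y, row in enumerate(disparity):
        for x, d in enumerate(row):
            if d is None:  # no ground-truth disparity at this pixel
                continue
            left_patch = extract_patch(left, x, y, half)
            # Assumed convention: the point at column x in the left image
            # appears at column x - d in the right image.
            positive = extract_patch(right, x - d, y, half)
            # Illustrative negative: move away from the true match, on either side.
            offset = rng.choice([-1, 1]) * rng.randint(neg_low, neg_high)
            negative = extract_patch(right, x - d + offset, y, half)
            if left_patch is None or positive is None or negative is None:
                continue
            examples.append((left_patch, positive, 1))
            examples.append((left_patch, negative, 0))
    return examples


# Synthetic rectified pair: the right image is the left image shifted by 2 pixels.
WIDTH, HEIGHT, TRUE_DISPARITY = 16, 5, 2
left = [[10 * x + y for x in range(WIDTH)] for y in range(HEIGHT)]
right = [[10 * (x + TRUE_DISPARITY) + y for x in range(WIDTH)] for y in range(HEIGHT)]
disparity = [[TRUE_DISPARITY] * WIDTH for _ in range(HEIGHT)]

examples = make_pairs(left, right, disparity)
print(len(examples), "labelled patch pairs")
```

In this toy setting the similar pairs are pixel-identical, which real stereo images are not; a network trained on real pairs has to learn a similarity measure that tolerates noise and appearance changes between the two views.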

Original language: English (US)
Journal: Journal of Machine Learning Research
Volume: 17
State: Published - Apr 1 2016

Keywords

  • Convolutional neural networks
  • Matching cost
  • Similarity learning
  • Stereo
  • Supervised learning

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

Cite this

Stereo matching by training a convolutional neural network to compare image patches. / Žbontar, Jure; LeCun, Yann.

In: Journal of Machine Learning Research, Vol. 17, 01.04.2016.

@article{f8413c5fdb3f4b528c23091acc8dd762,
title = "Stereo matching by training a convolutional neural network to compare image patches",
abstract = "We present a method for extracting depth information from a rectified image pair. Our approach focuses on the first stage of many stereo algorithms: the matching cost computation. We approach the problem by learning a similarity measure on small image patches using a convolutional neural network. Training is carried out in a supervised manner by constructing a binary classification data set with examples of similar and dissimilar pairs of patches. We examine two network architectures for this task: one tuned for speed, the other for accuracy. The output of the convolutional neural network is used to initialize the stereo matching cost. A series of post-processing steps follow: cross-based cost aggregation, semiglobal matching, a left-right consistency check, subpixel enhancement, a median filter, and a bilateral filter. We evaluate our method on the KITTI 2012, KITTI 2015, and Middlebury stereo data sets and show that it outperforms other approaches on all three data sets.",
keywords = "Convolutional neural networks, Matching cost, Similarity learning, Stereo, Supervised learning",
author = "Jure Žbontar and Yann LeCun",
year = "2016",
month = "4",
day = "1",
language = "English (US)",
volume = "17",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",
}

TY - JOUR
T1 - Stereo matching by training a convolutional neural network to compare image patches
AU - Žbontar, Jure
AU - LeCun, Yann
PY - 2016/4/1
Y1 - 2016/4/1
AB - We present a method for extracting depth information from a rectified image pair. Our approach focuses on the first stage of many stereo algorithms: the matching cost computation. We approach the problem by learning a similarity measure on small image patches using a convolutional neural network. Training is carried out in a supervised manner by constructing a binary classification data set with examples of similar and dissimilar pairs of patches. We examine two network architectures for this task: one tuned for speed, the other for accuracy. The output of the convolutional neural network is used to initialize the stereo matching cost. A series of post-processing steps follow: cross-based cost aggregation, semiglobal matching, a left-right consistency check, subpixel enhancement, a median filter, and a bilateral filter. We evaluate our method on the KITTI 2012, KITTI 2015, and Middlebury stereo data sets and show that it outperforms other approaches on all three data sets.
KW - Convolutional neural networks
KW - Matching cost
KW - Similarity learning
KW - Stereo
KW - Supervised learning
UR - http://www.scopus.com/inward/record.url?scp=84979924151&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84979924151&partnerID=8YFLogxK
M3 - Article
VL - 17
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
SN - 1532-4435
ER -