Depth map prediction from a single image using a multi-scale deep network

David Eigen, Christian Puhrsch, Rob Fergus

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Predicting depth is an essential component in understanding the 3D geometry of a scene. While for stereo images local correspondence suffices for estimation, finding depth relations from a single image is less straightforward, requiring integration of both global and local information from various cues. Moreover, the task is inherently ambiguous, with a large source of uncertainty coming from the overall scale. In this paper, we present a new method that addresses this task by employing two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. We also apply a scale-invariant error to help measure depth relations rather than scale. By leveraging the raw datasets as large sources of training data, our method achieves state-of-the-art results on both NYU Depth and KITTI, and matches detailed depth boundaries without the need for superpixelation.
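The scale-invariant error mentioned in the abstract measures discrepancies between pairs of log-depths while discounting any global scale offset. A minimal sketch of that error in numpy is below; the exact per-term weighting (the `lam` parameter) and training details are the paper's, and the array shapes here are illustrative assumptions:

```python
import numpy as np

def scale_invariant_error(pred, target, lam=1.0):
    """Scale-invariant mean squared error in log space.

    D(y, y*) = (1/n) * sum_i d_i^2  -  (lam/n^2) * (sum_i d_i)^2,
    where d_i = log y_i - log y*_i. With lam = 1, adding a constant
    to every log-prediction (i.e., scaling all depths by the same
    factor) leaves the error unchanged.
    """
    d = np.log(pred) - np.log(target)
    n = d.size
    return (d ** 2).sum() / n - lam * d.sum() ** 2 / n ** 2

# A prediction that is correct up to a single global scale factor
# incurs (numerically) zero error when lam = 1.
target = np.array([1.0, 2.0, 4.0])
pred = 3.0 * target
err = scale_invariant_error(pred, target)
```

Because the second term subtracts the squared mean of the log differences, the expression reduces to the variance of `d`, which is what makes it insensitive to a uniform depth rescaling.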

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems
Publisher: Neural information processing systems foundation
Pages: 2366-2374
Number of pages: 9
Volume: 3
Edition: January
State: Published - 2014
Event: 28th Annual Conference on Neural Information Processing Systems 2014, NIPS 2014 - Montreal, Canada
Duration: Dec 8 2014 - Dec 13 2014

Other

28th Annual Conference on Neural Information Processing Systems 2014, NIPS 2014
Country: Canada
City: Montreal
Period: 12/8/14 - 12/13/14


ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Eigen, D., Puhrsch, C., & Fergus, R. (2014). Depth map prediction from a single image using a multi-scale deep network. In Advances in Neural Information Processing Systems (January ed., Vol. 3, pp. 2366-2374). Neural information processing systems foundation.
