Learning pairwise neural network encoder for depth image-based 3D model retrieval

Jing Zhu, Fan Zhu, Edward Wong, Yi Fang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the emergence of RGB-D cameras (e.g., Kinect), the sensing capability of artificial intelligence systems has increased dramatically, and as a consequence, a wide range of depth image-based human-machine interaction applications have been proposed. In the design industry, a 3D model contains abundant information that is required for manufacturing. Since depth images can be conveniently acquired, a retrieval system that returns 3D models based on depth image inputs can assist or improve the traditional product design process. In this work, we address the depth image-based 3D model retrieval problem. By extending the neural network to a neural network pair with identical output layers for objects of the same category, unified domain-invariant representations can be learned from the low-level mismatched depth image features and 3D model features. A unique advantage of the framework is that correspondence information between depth images and 3D models is not required, so it can easily be generalized to large-scale databases. To evaluate the effectiveness of our approach, depth images (with Kinect-type noise) from the NYU Depth V2 dataset are used as queries to retrieve 3D models of the same categories in the SHREC 2014 dataset. Experimental results suggest that our approach outperforms state-of-the-art methods, as well as the paradigm that directly uses the original representations of depth images and 3D models for retrieval.
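The following is a minimal, hypothetical sketch (not the authors' code) of the general idea the abstract describes: a pair of encoders, one per domain, trained toward a shared category-level target so that mismatched depth-image and 3D-model features land in a common space without instance-level correspondence. The PyTorch framing, the `Encoder` architecture, the feature dimensions, the MSE loss, and the `class_codes` embedding are all illustrative assumptions rather than the paper's exact formulation.

```python
# Hypothetical sketch of a pairwise encoder for cross-domain retrieval.
# Both encoders regress to the same per-category target code, so no
# depth-image / 3D-model correspondence is required during training.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Small MLP mapping low-level features to a shared code space."""
    def __init__(self, in_dim, code_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, code_dim),
        )

    def forward(self, x):
        return self.net(x)

# Assumed feature dimensions, chosen only for illustration.
depth_encoder = Encoder(in_dim=512)   # encoder for depth-image features
model_encoder = Encoder(in_dim=1024)  # encoder for 3D model features

num_classes, code_dim = 20, 128
# One shared target code per category (the "identical output" for a class).
class_codes = nn.Embedding(num_classes, code_dim)

params = (list(depth_encoder.parameters())
          + list(model_encoder.parameters())
          + list(class_codes.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)
mse = nn.MSELoss()

def train_step(depth_feats, depth_labels, model_feats, model_labels):
    """Pull samples from both domains toward their category's shared code."""
    optimizer.zero_grad()
    loss = (mse(depth_encoder(depth_feats), class_codes(depth_labels))
            + mse(model_encoder(model_feats), class_codes(model_labels)))
    loss.backward()
    optimizer.step()
    return loss.item()

# Retrieval (sketch): encode a query depth image and rank 3D models by the
# Euclidean distance between their codes in the learned shared space.
```

Because supervision is attached to categories rather than to paired samples, the two domains can be trained from independently labeled collections, which is what allows the approach to scale to large databases without correspondence annotations.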

Original language: English (US)
Title of host publication: MM 2015 - Proceedings of the 2015 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc
Pages: 1227-1230
Number of pages: 4
ISBN (Electronic): 9781450334594
DOIs: https://doi.org/10.1145/2733373.2806323
State: Published - Oct 13 2015
Event: 23rd ACM International Conference on Multimedia, MM 2015 - Brisbane, Australia
Duration: Oct 26 2015 - Oct 30 2015

Other

Other: 23rd ACM International Conference on Multimedia, MM 2015
Country: Australia
City: Brisbane
Period: 10/26/15 - 10/30/15

Fingerprint

Neural networks
Product design
Artificial intelligence
Cameras
Industry

Keywords

  • Cross-Domain
  • Depth Image
  • Neural Network
  • Retrieval

ASJC Scopus subject areas

  • Media Technology
  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Zhu, J., Zhu, F., Wong, E., & Fang, Y. (2015). Learning pairwise neural network encoder for depth image-based 3D model retrieval. In MM 2015 - Proceedings of the 2015 ACM Multimedia Conference (pp. 1227-1230). Association for Computing Machinery, Inc. https://doi.org/10.1145/2733373.2806323
