Siamese CNN-BiLSTM architecture for 3D shape representation learning

Guoxian Dai, Jin Xie, Yi Fang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Learning a 3D shape representation from a collection of its rendered 2D images has been extensively studied. However, existing view-based techniques have not yet fully exploited the information shared among all the projected views. In this paper, we propose a siamese CNN-BiLSTM network for 3D shape representation learning, employing a recurrent neural network to efficiently capture features across different views. The proposed method minimizes a discriminative loss function to learn a deep nonlinear transformation that maps 3D shapes from the original space into a nonlinear feature space. In the transformed space, the distance between 3D shapes with the same label is minimized, while the distance between shapes with different labels is pushed beyond a large margin. Specifically, each 3D shape is first projected into a group of 2D images from different views. A convolutional neural network (CNN) then extracts features from each view image, and a bidirectional long short-term memory (BiLSTM) aggregates the information across views. Finally, the whole CNN-BiLSTM network is constructed into a siamese structure trained with a contrastive loss function. The proposed method is evaluated on two benchmarks, ModelNet40 and SHREC 2014, demonstrating its superiority over state-of-the-art methods.
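The training signal described above is a contrastive objective. A standard form consistent with the abstract's description (the paper's exact formulation may differ in details such as squaring or weighting) is

    L(X_1, X_2, y) = y\,d^2 + (1 - y)\,\max(0,\, m - d)^2, \qquad d = \lVert f(X_1) - f(X_2) \rVert_2,

where f is the CNN-BiLSTM mapping, y = 1 for a same-label pair and y = 0 otherwise, and m is the margin: same-class pairs are pulled together while different-class pairs are pushed at least m apart.

The sketch below illustrates the pipeline the abstract outlines: a per-view CNN, BiLSTM aggregation across views, and a siamese contrastive loss. It assumes PyTorch; the layer sizes, backbone, and mean-pooling over views are illustrative choices, not the authors' exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNBiLSTMEncoder(nn.Module):
    """Maps a sequence of rendered views of one 3D shape to a descriptor."""
    def __init__(self, feat_dim=256, hidden_dim=128):
        super().__init__()
        # Per-view feature extractor; a small stack stands in for the
        # paper's CNN backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 16, feat_dim), nn.ReLU(),
        )
        # BiLSTM aggregates per-view features across the view sequence.
        self.bilstm = nn.LSTM(feat_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, views):                  # views: (B, V, 1, H, W)
        b, v = views.shape[:2]
        f = self.cnn(views.flatten(0, 1))      # (B*V, feat_dim)
        f = f.view(b, v, -1)                   # (B, V, feat_dim)
        out, _ = self.bilstm(f)                # (B, V, 2*hidden_dim)
        return out.mean(dim=1)                 # pool over views

def contrastive_loss(z1, z2, same_label, margin=1.0):
    # same_label: float tensor, 1.0 for same-class pairs, 0.0 otherwise.
    # Same-class pairs are pulled together; different-class pairs are
    # pushed until they are at least `margin` apart.
    d = F.pairwise_distance(z1, z2)
    return (same_label * d.pow(2)
            + (1.0 - same_label) * F.relu(margin - d).pow(2)).mean()

Because the network is siamese, both shapes of a training pair pass through one shared encoder instance, e.g. loss = contrastive_loss(enc(views_a), enc(views_b), y).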

Original language: English (US)
Title of host publication: Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018
Editors: Jerome Lang
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 670-676
Number of pages: 7
Volume: 2018-July
ISBN (Electronic): 9780999241127
State: Published - Jan 1, 2018
Event: 27th International Joint Conference on Artificial Intelligence, IJCAI 2018 - Stockholm, Sweden
Duration: Jul 13, 2018 - Jul 19, 2018

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Dai, G., Xie, J., & Fang, Y. (2018). Siamese CNN-BiLSTM architecture for 3D shape representation learning. In J. Lang (Ed.), Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018 (Vol. 2018-July, pp. 670-676). International Joint Conferences on Artificial Intelligence.
