Pairwise Attention Encoding for Point Cloud Feature Learning

Yunxiao Shi, Haoyu Fang, Jing Zhu, Yi Fang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Compared to hand-crafted descriptors, learning a 3D point signature has attracted increasing attention in the research community as a way to better address challenging issues such as deformation and structural variation in 3D objects. PointNet is a pioneering work that learns 3D point signatures directly by consuming a raw point cloud as input and applying convolution to each point. Ground-breaking as it is, PointNet has limited capability to capture local structure when learning visual features from each individual point. Recent variants of PointNet improve the quality of 3D point signature learning by taking neighbourhood information into account, but typically do so through hard-coded mechanisms (e.g. manually setting 'k' for k-nearest-neighbour search, the radius 'r' for ball query, etc.). In this paper, we develop a novel point signature learning approach that considers the pairwise interaction between every two individual points, moving beyond hard-coded neighbourhood exploitation and further improving the quality of 3D point signature learning by encouraging the model to be aware of both neighbourhood information and global context. Specifically, we first introduce a novel pairwise reference tensor (PRT) in the original input point space to represent the influence that every two individual points have on each other. Then, by passing the pairwise reference tensor through a multi-layer perceptron (MLP), we obtain a high-dimensional attention tensor that encodes pairwise relationships in a high-dimensional space and acts as an attention mechanism. Next, we fuse the learned point features with these attention weights to obtain global visual features. Our proposed method demonstrates superior performance on various 3D visual recognition tasks (e.g. object classification, part segmentation and scene semantic segmentation).
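The pipeline the abstract describes (pairwise reference tensor → shared MLP → attention weights → feature fusion) can be sketched in a few lines. The abstract does not specify the exact form of the PRT or the fusion, so the sketch below makes illustrative assumptions: the PRT is taken to be pairwise coordinate differences, the MLP produces a single attention logit per point pair, and the weights are random (untrained). It is a minimal illustration of the idea, not the authors' implementation.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron with ReLU, applied along the last axis.
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

def pairwise_attention_features(points, feats, rng):
    """points: (N, 3) raw coordinates; feats: (N, C) per-point features."""
    # Pairwise reference tensor (assumed here: coordinate differences);
    # prt[i, j] relates point i to point j in the input space.
    prt = points[:, None, :] - points[None, :, :]            # (N, N, 3)

    # Lift the PRT with a shared MLP into attention logits
    # (random weights stand in for learned parameters).
    w1 = rng.standard_normal((3, 16)); b1 = np.zeros(16)
    w2 = rng.standard_normal((16, 1)); b2 = np.zeros(1)
    logits = mlp(prt, w1, b1, w2, b2)[..., 0]                # (N, N)

    # Softmax per point so each row is a distribution over all pairs.
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)

    # Fuse: each point aggregates every point's features, weighted by
    # the pairwise attention -- neighbourhood and global context at once.
    return attn @ feats                                      # (N, C)

rng = np.random.default_rng(0)
pts = rng.standard_normal((8, 3))
feats = rng.standard_normal((8, 4))
out = pairwise_attention_features(pts, feats, rng)
print(out.shape)  # (8, 4)
```

Because the attention spans every point pair rather than a fixed k-NN or ball-query neighbourhood, no 'k' or 'r' hyperparameter appears anywhere in the sketch.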

Original language: English (US)
Title of host publication: Proceedings - 2019 International Conference on 3D Vision, 3DV 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 135-144
Number of pages: 10
ISBN (Electronic): 9781728131313
DOI: 10.1109/3DV.2019.00024
State: Published - Sep 2019
Event: 7th International Conference on 3D Vision, 3DV 2019 - Quebec, Canada
Duration: Sep 15, 2019 - Sep 18, 2019

Publication series

Name: Proceedings - 2019 International Conference on 3D Vision, 3DV 2019

Conference

Conference: 7th International Conference on 3D Vision, 3DV 2019
Country: Canada
City: Quebec
Period: 9/15/19 - 9/18/19


Keywords

  • 3D Vision
  • Computer Vision
  • Local Structure
  • Shape Classification
  • Shape Segmentation

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Media Technology
  • Modeling and Simulation

Cite this

Shi, Y., Fang, H., Zhu, J., & Fang, Y. (2019). Pairwise Attention Encoding for Point Cloud Feature Learning. In Proceedings - 2019 International Conference on 3D Vision, 3DV 2019 (pp. 135-144). [8885569] (Proceedings - 2019 International Conference on 3D Vision, 3DV 2019). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/3DV.2019.00024
