Introduction
Geometric 3D scene classification is a very challenging task. Current methodologies extract the geometric information using only the depth channel provided by an RGB-D sensor, which can introduce errors because local geometric context is missing from the depth channel. This work proposes a novel Residual Attention Graph Convolutional Network that exploits the intrinsic geometric context of a 3D space without using any kind of point features, allowing the use of organized or unorganized 3D data. Experiments on the NYU Depth v1 and SUN-RGBD datasets study the different configurations and demonstrate the effectiveness of the proposed method. Experimental results show that the proposed method outperforms the current state of the art in geometric 3D scene classification tasks.
If you find this work useful, please consider citing:
Albert Mosella-Montoro, Javier Ruiz-Hidalgo, "Residual Attention Graph Convolutional Network for Geometric 3D Scene Classification", ICCVW 2019
@inproceedings{Mosella-Montoro2019RAGC,
  author = {Albert Mosella-Montoro and Javier Ruiz-Hidalgo},
  title = {Residual Attention Graph Convolutional Network for Geometric 3D Scene Classification},
  booktitle = {IEEE Conference on Computer Vision Workshop (ICCVW)},
  year = {2019}
}
Download our paper in pdf here.
Method
In this work the following operations are described:
Graph Construction: This is an important step in Graph Convolutional Networks, because the connections between nodes (edges) play the role that the receptive field plays in conventional CNNs. Edges indicate the influence between nodes in the graph. Graph Construction can be seen as three stages: a) Project the RGB-D image to 3D space. If the input is already a 3D point cloud, this step can be skipped. b) Create the connectivity between nodes. Two methods are explored: radius proximity connection and k-nearest neighbours (kNN). In both cases the edges are directed. c) Add attributes to each edge of the graph.
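The three stages above can be sketched in plain Python. This is only an illustrative sketch, not the released implementation: the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the function names, and the choice of the Cartesian offset between the two endpoints as the edge attribute are all assumptions made for the example.

```python
import math

def backproject(depth, fx, fy, cx, cy):
    """Stage a) Project a depth image (rows of metres) to 3D points with a
    pinhole camera model. Pixels with zero depth are treated as missing."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

def knn_edges(points, k):
    """Stage b) Directed edges (src, dst): every node dst receives one edge
    from each of its k nearest neighbours. Brute force, O(n^2)."""
    edges = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges.extend((j, i) for _, j in dists[:k])
    return edges

def radius_edges(points, radius):
    """Stage b) Directed edges (src, dst) between all pairs closer than radius."""
    edges = []
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if j != i and math.dist(p, q) <= radius:
                edges.append((j, i))
    return edges

def edge_attributes(points, edges):
    """Stage c) Assumed attribute: offset from the destination to the source."""
    return [tuple(a - b for a, b in zip(points[s], points[d])) for s, d in edges]
```

With three points at x = 0, 1 and 3 and k = 1, the kNN graph contains an edge from node 1 to node 2 but not the reverse, which illustrates why the edges have to be treated as directed.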
Attention Graph Convolution: This operation performs convolutions over local graph neighbourhoods, exploiting the attributes of the edges. Intuitively, the lattice that a convolution needs is created artificially from the edges, and these edges directly influence the filter weights used to compute the convolution: a weight is generated for each edge from its attribute. The weights are generated by a Dynamic Filter Network, which can be implemented with any differentiable architecture.
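As a rough illustration of the idea of generating filter weights from edge attributes, the sketch below implements a generic edge-conditioned graph convolution. It is not the paper's exact AGC layer: the single linear layer standing in for the Dynamic Filter Network, the parameter matrix `theta`, and the mean aggregation over incoming edges are assumptions chosen to keep the example small.

```python
def generate_filter(attr, theta, out_dim, in_dim):
    """Toy dynamic filter network (hypothetical): one linear layer that maps
    an edge attribute to an out_dim x in_dim weight matrix.
    theta has out_dim * in_dim rows and len(attr) columns."""
    flat = [sum(t * a for t, a in zip(row, attr)) for row in theta]
    return [flat[r * in_dim:(r + 1) * in_dim] for r in range(out_dim)]

def edge_conditioned_conv(features, edges, attrs, theta, out_dim):
    """For each directed edge (src, dst), generate a filter from the edge
    attribute, apply it to the source features, and average the results
    over the incoming edges of dst (assumed aggregation)."""
    in_dim = len(features[0])
    sums = [[0.0] * out_dim for _ in features]
    counts = [0] * len(features)
    for (src, dst), attr in zip(edges, attrs):
        w = generate_filter(attr, theta, out_dim, in_dim)
        for r in range(out_dim):
            sums[dst][r] += sum(w[r][c] * features[src][c] for c in range(in_dim))
        counts[dst] += 1
    # Nodes without incoming edges keep an all-zero output.
    return [[s / n for s in row] if n else row for row, n in zip(sums, counts)]
```

Because the filter depends only on the edge attribute, the same generator can be applied to neighbourhoods of any size, which is what lets the operation work on unorganized point clouds.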
Residual Attention Graph Convolution: The previous Attention Graph Convolution (AGC) is extended to a Residual Attention Graph Convolution (RAGC) following the inspiration of the ResNet architecture.
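A minimal sketch of the residual idea, assuming a ResNet-style identity shortcut followed by a ReLU, and assuming the wrapped convolution keeps the feature dimension unchanged. The helper name `residual_block` and the `conv` callable are hypothetical; the exact block layout used in the paper is not reproduced here.

```python
def residual_block(features, conv):
    """Apply conv to the node features and add the input back (identity
    shortcut), then apply a ReLU. conv must return one row per node with
    the same number of channels as the input."""
    out = conv(features)
    return [
        [max(0.0, o + x) for o, x in zip(out_row, in_row)]
        for out_row, in_row in zip(out, features)
    ]
```

Any graph convolution that maps node features to node features of the same size can be passed as `conv`, for example a closure over a fixed set of edges and edge attributes.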
Pooling Graph Operation: Pooling is done with the voxel downsampling algorithm. It creates voxels of resolution r over the point cloud and replaces all points inside each voxel with their centroid. The feature of the new point is the average or the maximum of the features of the points inside the voxel, depending on the kind of pooling. After the pooling operation, the graph is reconstructed from the downsampled 3D point cloud.
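The voxel pooling step can be sketched as follows. This is a plain-Python illustration under assumptions: the function name `voxel_pool`, the `mode` argument, and the voxel grid being aligned to the coordinate origin are choices made for the example, not details taken from the released code.

```python
import math
from collections import defaultdict

def voxel_pool(points, features, r, mode="max"):
    """Group points into cubic voxels of side r, replace each group by its
    centroid, and pool the features per channel with max or average."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        # Integer voxel coordinates; floor handles negative coordinates too.
        buckets[tuple(math.floor(c / r) for c in p)].append(idx)
    new_points, new_features = [], []
    for key in sorted(buckets):  # sorted only to make the output order stable
        idxs = buckets[key]
        n = len(idxs)
        new_points.append(tuple(sum(points[i][d] for i in idxs) / n for d in range(3)))
        channels = list(zip(*(features[i] for i in idxs)))
        if mode == "max":
            new_features.append([max(ch) for ch in channels])
        else:
            new_features.append([sum(ch) / n for ch in channels])
    return new_points, new_features
```

The returned points form the downsampled cloud, so the edges and edge attributes would be recomputed on them before the next convolution.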
The proposed architecture, based on ResNet-18, is shown in the following table.
Results
Results on the NYU Depth v1 dataset:
Results on the SUN-RGBD dataset:
Confusion matrices for both datasets:
Code
Acknowledgements
We want to thank our technical support team:
This research was supported by the Secretary of Universities and Research of the Generalitat de Catalunya and the European
Social Fund via a PhD grant to the first author (FI2019), and developed in the framework of project TEC2016-75976-R,
financed by the Ministerio de Economía, Industria y Competitividad and the European Regional Development Fund (ERDF).