SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection

Prarthana Bhattacharyya, Chengjie Huang, K. Czarnecki. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).
Existing point-cloud-based 3D object detectors use convolution-like operators to process information in a local neighbourhood with fixed-weight kernels and aggregate global context hierarchically. However, non-local neural networks and self-attention in 2D vision have shown that explicitly modeling long-range interactions can lead to more robust and competitive models. In this paper, we propose two variants of self-attention for contextual modeling in 3D object detection by augmenting…
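As a toy illustration of the core idea (not the paper's actual attention modules, whose names and details are not given here), the following NumPy sketch applies global self-attention to per-point features so that every point aggregates context from every other point; the random projections stand in for learned weights, and all shapes are assumptions:

```python
import numpy as np

def self_attention(features, d_k=16, seed=0):
    """Global self-attention over N per-point feature vectors of shape (N, C).

    Each point attends to every other point, so the output mixes
    long-range context instead of only a local neighbourhood.
    """
    rng = np.random.default_rng(seed)
    n, c = features.shape
    # Random projection matrices stand in for learned weights.
    w_q = rng.standard_normal((c, d_k)) / np.sqrt(c)
    w_k = rng.standard_normal((c, d_k)) / np.sqrt(c)
    w_v = rng.standard_normal((c, c)) / np.sqrt(c)

    q, k, v = features @ w_q, features @ w_k, features @ w_v
    scores = q @ k.T / np.sqrt(d_k)              # (N, N) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax: rows sum to 1
    return attn @ v                              # context-aware features (N, C)

points = np.random.default_rng(1).standard_normal((128, 32))
out = self_attention(points)
print(out.shape)  # (128, 32)
```

In a real detector these projections are learned end-to-end and the attention block augments, rather than replaces, the convolutional backbone.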

RAANet: Range-Aware Attention Network for LiDAR-based 3D Object Detection with Auxiliary Density Level Estimation

The Range-Aware Attention Network (RAANet) extracts more powerful BEV features and generates superior 3D object detections, and a novel auxiliary loss for density estimation further enhances RAANet's detection accuracy for occluded objects.

Real-time Hierarchical Soft Attention-based 3D Object Detection in Point Clouds

A real-time Hierarchical Soft Attention Network (HSAN) is proposed to employ soft attention in the backbone of the original network to increase the detection accuracy without slowing down its inference speed.

Accurate and Real-Time 3D Pedestrian Detection Using an Efficient Attentive Pillar Network

This work introduces a stackable Pillar Aware Attention (PAA) module to enhance pillar feature extraction while suppressing noise in point clouds, and presents Mini-BiFPN, a small yet effective feature network that creates bidirectional information flow and multi-level cross-scale feature fusion to better integrate multi-resolution features.

AGS-SSD: Attention-Guided Sampling for 3D Single-Stage Detector

An attention-guided downsampling method for point-cloud-based 3D object detection, named AGS-SSD, is proposed, which achieves significant improvements over the baseline with novel architectures and runs at 24 frames per second at inference.

PiFeNet: Pillar-Feature Network for Real-Time 3D Pedestrian Detection from Point Cloud

This work introduces a stackable Pillar Aware Attention (PAA) module for enhanced pillar feature extraction while suppressing noise in point clouds, and presents Mini-BiFPN, a small yet effective feature network that creates bidirectional information flow and multi-level cross-scale feature fusion to better integrate multi-resolution features.

3D Object Detection Combining Semantic and Geometric Features from Point Clouds

The VTPM is a Voxel-Point-Based Module that ultimately performs 3D object detection in point space, which is more conducive to detecting small objects and avoids preset anchors at the inference stage.

D-Align: Dual Query Co-attention Network for 3D Object Detection Based on Multi-frame Point Cloud Sequence

A new 3D object detector, named D-Align, is proposed, which can effectively produce strong bird’s-eye-view (BEV) features by aligning and aggregating the features obtained from a sequence of point sets.

DANC-Net: Dual-Attention and Negative Constraint Network for Point Cloud Classification

In DANC-Net, the dual-attention mechanism strengthens the interaction between local features of the point cloud along both the channel and spatial dimensions, thereby improving the expressiveness of the extracted features.

Point Density-Aware Voxels for LiDAR 3D Object Detection

Point Density-Aware Voxel network (PDV) is an end-to-end two-stage LiDAR 3D object detection architecture that outperforms all state-of-the-art methods on the Waymo Open Dataset and achieves competitive results on the KITTI dataset.

3D Vision with Transformers: A Survey

A systematic and thorough review of more than 100 transformer-based methods for different 3D vision tasks, including classification, segmentation, detection, completion, pose estimation, and others, comparing their performance to common non-transformer methods on 12 3D benchmarks.

SCANet: Spatial-Channel Attention Network for 3D Object Detection

A novel Spatial-Channel Attention Network (SCANet) is proposed: a two-stage detector that takes both LiDAR point clouds and RGB images as input to generate 3D object estimates, together with a new multi-level fusion scheme designed for accurate classification and 3D bounding-box regression.

MLCVNet: Multi-Level Context VoteNet for 3D Object Detection

Qian Xie, Yu-Kun Lai, Jun Wang. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
This paper introduces three context modules into the voting and classifying stages of VoteNet to encode contextual information at different levels, and proposes Multi-Level Context VoteNet (MLCVNet) to recognize 3D objects correlatively, building on the state-of-the-art VoteNet.

TANet: Robust 3D Object Detection from Point Clouds with Triple Attention

A novel TANet is introduced in this paper, which mainly contains a Triple Attention (TA) module and a Coarse-to-Fine Regression (CFR) module that boosts localization accuracy without excessive computation cost.

PointDAN: A Multi-Scale 3D Domain Adaption Network for Point Cloud Representation

A novel 3D Domain Adaptation Network for point cloud data (PointDAN) is proposed, which jointly aligns global and local features at multiple levels and demonstrates the superiority of the model over state-of-the-art general-purpose DA methods.

3D Object Detection with Pointformer

This paper proposes Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively, and introduces an efficient coordinate refinement module to shift down-sampled points closer to object centroids, which improves object proposal generation.

Attentional ShapeContextNet for Point Cloud Recognition

The resulting model, called ShapeContextNet, consists of a hierarchy of modules that do not rely on a fixed grid while still enjoying properties similar to those of convolutional neural networks: the ability to capture and propagate object-part information.

Deep Hough Voting for 3D Object Detection in Point Clouds

This work proposes VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting that achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D with a simple design, compact model size and high efficiency.
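As a toy illustration of the Hough-voting idea behind VoteNet (not its learned implementation), each point casts a vote toward its object centre and the votes are then clustered; in this sketch the offsets are hand-made assumptions rather than network predictions, and the greedy radius clustering stands in for VoteNet's vote aggregation:

```python
import numpy as np

def hough_vote_centers(points, offsets, radius=0.5):
    """Shift each point by its predicted offset, then greedily cluster votes.

    points  : (N, 3) input coordinates
    offsets : (N, 3) per-point vote toward the object centre
    Returns a list of cluster-centre coordinates.
    """
    votes = points + offsets
    remaining = list(range(len(votes)))
    centers = []
    while remaining:
        seed = votes[remaining[0]]
        # Collect all votes within `radius` of the seed vote.
        members = [i for i in remaining
                   if np.linalg.norm(votes[i] - seed) < radius]
        centers.append(votes[members].mean(axis=0))
        remaining = [i for i in remaining if i not in members]
    return centers

# Two synthetic objects: points near (0, 0, 0) and (5, 5, 5),
# with oracle offsets pointing exactly at the true centres.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(5.0, 0.1, (20, 3))])
true_centers = np.vstack([np.zeros((20, 3)), np.full((20, 3), 5.0)])
centers = hough_vote_centers(pts, true_centers - pts)
print(len(centers))  # 2
```

In VoteNet the offsets are regressed by a point-set network and each cluster is further processed to produce a 3D bounding box; the clustering step here only conveys why voting concentrates evidence at object centres even when no point lies on the centre itself.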

Frustum PointNets for 3D Object Detection from RGB-D Data

This work directly operates on raw point clouds by popping up RGB-D scans and leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall even for small objects.

PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud

Extensive experiments on the 3D detection benchmark of KITTI dataset show that the proposed architecture outperforms state-of-the-art methods with remarkable margins by using only point cloud as input.

Attentional PointNet for 3D-Object Detection in Point Clouds

This study proposes Attentional PointNet, which is a novel end-to-end trainable deep architecture for object detection in point clouds that extends the theory of visual attention mechanisms to 3D point clouds and introduces a new recurrent 3D Localization Network module.