Point Transformer

@article{Engel2021PointT,
  title={Point Transformer},
  author={Nico Engel and Vasileios Belagiannis and Klaus C. J. Dietmayer},
  journal={IEEE Access},
  year={2021},
  volume={9},
  pages={134826-134840}
}
In this work, we present Point Transformer, a deep neural network that operates directly on unordered and unstructured point sets. We design Point Transformer to extract local and global features and relate both representations by introducing the local-global attention mechanism, which aims to capture spatial point relations and shape information. For that purpose, we propose SortNet, as part of the Point Transformer, which induces input permutation invariance by selecting points based on a… 
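As a rough illustration of two ideas the abstract mentions, selecting points to induce permutation invariance and an attention step that relates local and global features, here is a minimal PyTorch sketch. The module names, layer sizes, the use of a learned per-point score with top-k selection, and the standard multi-head cross-attention are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class TopKPointSelect(nn.Module):
    """Score every point with a shared MLP and keep the k highest-scoring ones (order-invariant)."""
    def __init__(self, d_feat: int, k: int):
        super().__init__()
        self.k = k
        self.score = nn.Sequential(nn.Linear(d_feat, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, feats):                  # feats: (B, N, d_feat)
        s = self.score(feats).squeeze(-1)      # (B, N) learned importance scores
        idx = s.topk(self.k, dim=1).indices    # indices of the k selected points
        return torch.gather(feats, 1, idx.unsqueeze(-1).expand(-1, -1, feats.size(-1)))

class LocalGlobalAttention(nn.Module):
    """Relate a small set of local descriptors to the full set of global features."""
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, local_feats, global_feats):
        # local features attend to the global set (cross-attention)
        out, _ = self.attn(local_feats, global_feats, global_feats)
        return out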
PatchFormer: An Efficient Point Transformer with Patch Attention
TLDR
This work introduces Patch Attention to adaptively learn a much smaller set of bases upon which the attention maps are computed, and proposes a lightweight Multi-Scale Attention block to build attention among features of different scales, providing the model with multi-scale features.
PatchFormer: A Versatile 3D Transformer Based on Patch Attention
TLDR
This work introduces patch-attention to adaptively learn a much smaller set of bases upon which the attention maps are computed, and proposes a lightweight Multi-Scale Attention (MSA) block to build attention among features of different scales, providing the model with multi-scale features.
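A minimal sketch of the patch-attention idea summarized in the two PatchFormer entries above: attention is computed against a small set of learned bases rather than between all point pairs, so the cost grows with the number of bases instead of quadratically in the number of points. Class and parameter names are assumed for illustration, not taken from the papers.

import torch
import torch.nn as nn

class BaseAttention(nn.Module):
    def __init__(self, d_model: int, n_bases: int = 32):
        super().__init__()
        self.bases = nn.Parameter(torch.randn(n_bases, d_model))  # learned bases, M << N
        self.q = nn.Linear(d_model, d_model)
        self.kv = nn.Linear(d_model, 2 * d_model)

    def forward(self, x):                                    # x: (B, N, d_model)
        k, v = self.kv(self.bases).chunk(2, dim=-1)          # keys/values from the M bases
        attn = torch.softmax(self.q(x) @ k.t() / x.size(-1) ** 0.5, dim=-1)  # (B, N, M)
        return attn @ v                                      # cost O(N*M) instead of O(N^2)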
CpT: Convolutional Point Transformer for 3D Point Cloud Processing
TLDR
The novel CpT block builds over local neighbourhoods of points obtained via a dynamic graph computation at each layer of the network's structure; it is fully differentiable and can be stacked just like convolutional layers to learn global properties of the points.
PU-Transformer: Point Cloud Upsampling Transformer
TLDR
To activate the transformer’s strong capability in representing features, a new variant of a multi-head self-attention structure is developed to enhance both point-wise and channel-wise relations of the feature map.
Adaptive Channel Encoding Transformer for Point Cloud Analysis
TLDR
An adaptive channel encoding transformer called Transformer-Conv is proposed; it is designed to encode the channel adaptively and is superior to state-of-the-art point cloud classification and segmentation methods on three benchmark datasets.
PVT: Point-Voxel Transformer for 3D Deep Learning
In this paper, we present an efficient and high-performance neural architecture, termed Point-Voxel Transformer (PVT), for 3D deep learning, which deeply integrates both 3D voxel-based and point-based …
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
TLDR
The proposed BERT-style pretraining strategy significantly improves the performance of standard point cloud Transformers, and the representations learned by Point-BERT transfer well to new tasks and domains, where the models largely advance the state of the art on few-shot point cloud classification.
A Survey on Vision Transformer
TLDR
This paper reviews these vision transformer models by categorizing them into different tasks and analyzing their advantages and disadvantages, and takes a brief look at the self-attention mechanism in computer vision, as it is the base component of the transformer.
Point-Voxel Transformer: An Efficient Approach To 3D Deep Learning
TLDR
This work presents a novel 3D Transformer, called Point-Voxel Transformer (PVT), that leverages self-attention computation in points to gather global context features, while performing multi-head self-attention (MSA) computation in voxels to capture local information and reduce irregular data access.
PVT: Point-Voxel Transformer for Point Cloud Learning
TLDR
A Sparse Window Attention (SWA) module is presented to gather coarse-grained local features from non-empty voxels, which not only bypasses the expensive irregular data structuring and invalid empty-voxel computation, but also obtains linear computational complexity with respect to voxel resolution.

References

Showing 1-10 of 47 references
Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks
TLDR
This work presents an attention-based neural network module, the Set Transformer, specifically designed to model interactions among elements in the input set, and reduces the computation time of self-attention from quadratic to linear in the number of elements in the set.
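A minimal sketch of how inducing points can bring set self-attention from quadratic to linear cost, in the spirit of the Set Transformer summary above: a small set of m learned inducing points attends to the n inputs, and the inputs then attend back to that summary, so both steps cost O(n*m). Names and sizes are illustrative only.

import torch
import torch.nn as nn

class InducedSetAttention(nn.Module):
    def __init__(self, d_model: int, n_inducing: int = 16, n_heads: int = 4):
        super().__init__()
        self.inducing = nn.Parameter(torch.randn(1, n_inducing, d_model))
        self.attn1 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn2 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                                # x: (B, n, d_model)
        i = self.inducing.expand(x.size(0), -1, -1)      # (B, m, d_model)
        h, _ = self.attn1(i, x, x)                       # inducing points summarize the set
        out, _ = self.attn2(x, h, h)                     # inputs read the summary back
        return out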
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
TLDR
A hierarchical neural network is proposed that applies PointNet recursively on a nested partitioning of the input point set, with novel set learning layers that adaptively combine features from multiple scales to learn deep point set features efficiently and robustly.
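A rough sketch of one PointNet++-style set abstraction step as described above: sample centroids by farthest point sampling, group neighbours with a ball query, then apply a shared MLP and max-pool over each group. The sampling loop, radius, and layer sizes are simplifications assumed for illustration, not the paper's exact code.

import torch
import torch.nn as nn

def farthest_point_sample(xyz, m):            # xyz: (N, 3) -> indices of m centroids
    N = xyz.size(0)
    idx = torch.zeros(m, dtype=torch.long)
    dist = torch.full((N,), float("inf"))
    for i in range(1, m):
        dist = torch.minimum(dist, ((xyz - xyz[idx[i - 1]]) ** 2).sum(-1))
        idx[i] = dist.argmax()                # pick the point farthest from all chosen ones
    return idx

def ball_group(xyz, centroids, radius, k):    # k nearest neighbours, preferring those within radius
    d2 = ((xyz[None, :, :] - xyz[centroids][:, None, :]) ** 2).sum(-1)   # (m, N)
    d2[d2 > radius ** 2] = float("inf")
    return d2.topk(k, largest=False).indices                             # (m, k)

class SetAbstraction(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_in + 3, d_out), nn.ReLU(),
                                 nn.Linear(d_out, d_out))

    def forward(self, xyz, feats, m=128, radius=0.2, k=32):
        c = farthest_point_sample(xyz, m)
        group = ball_group(xyz, c, radius, k)                 # (m, k) neighbour indices
        local = torch.cat([xyz[group] - xyz[c][:, None, :],   # coordinates relative to centroid
                           feats[group]], dim=-1)             # (m, k, 3 + d_in)
        return xyz[c], self.mlp(local).max(dim=1).values      # max-pool over each group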
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
TLDR
This paper designs a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing.
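A tiny sketch of the PointNet principle summarized above: a shared per-point MLP followed by a symmetric max-pooling yields a global descriptor that is invariant to the order of the input points. Layer sizes are arbitrary.

import torch
import torch.nn as nn

class MiniPointNet(nn.Module):
    def __init__(self, d_out: int = 1024):
        super().__init__()
        self.shared_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                        nn.Linear(64, d_out))

    def forward(self, xyz):                   # xyz: (B, N, 3), any point order
        per_point = self.shared_mlp(xyz)      # same weights applied to every point
        return per_point.max(dim=1).values    # symmetric pooling -> (B, d_out)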
Attentional ShapeContextNet for Point Cloud Recognition
TLDR
The resulting model, called ShapeContextNet, consists of a hierarchy with modules not relying on a fixed grid while still enjoying properties similar to those in convolutional neural networks, namely being able to capture and propagate object part information.
PointCNN: Convolution On X-Transformed Points
TLDR
This work proposes to learn an X-transformation from the input points to simultaneously promote two causes: the first is the weighting of the input features associated with the points, and the second is the permutation of the points into a latent and potentially canonical order.
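A hedged sketch of the X-transformation idea summarized above: an MLP over a neighbourhood's relative coordinates predicts a K x K matrix that reweights and reorders that neighbourhood's features before aggregation. Shapes and names here are assumptions, not PointCNN's exact architecture.

import torch
import torch.nn as nn

class XTransform(nn.Module):
    def __init__(self, k: int, d_feat: int):
        super().__init__()
        self.predict_x = nn.Sequential(nn.Linear(3 * k, k * k), nn.ReLU(),
                                       nn.Linear(k * k, k * k))
        self.agg = nn.Linear(k * d_feat, d_feat)

    def forward(self, rel_xyz, neigh_feats):       # (B, M, k, 3), (B, M, k, d_feat)
        B, M, k, _ = rel_xyz.shape
        X = self.predict_x(rel_xyz.reshape(B, M, -1)).reshape(B, M, k, k)
        mixed = X @ neigh_feats                    # weight/permute the neighbourhood features
        return self.agg(mixed.reshape(B, M, -1))   # (B, M, d_feat)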
SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters
TLDR
This work proposes a novel convolutional architecture, termed SpiderCNN, to efficiently extract geometric features from point clouds, which inherits the multi-scale hierarchical architecture from the classical CNNs, which allows it to extract semantic deep features.
Modeling Point Clouds With Self-Attention and Gumbel Subset Sampling
TLDR
This work develops Point Attention Transformers (PATs), using a parameter-efficient Group Shuffle Attention (GSA) to replace the costly Multi-Head Attention, and proposes an end-to-end learnable and task-agnostic sampling operation, named Gumbel Subset Sampling (GSS), to select a representative subset of input points.
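A hedged sketch of differentiable subset selection in the spirit of the Gumbel Subset Sampling described above: learned per-point scores are perturbed with Gumbel noise and passed through a temperature-controlled softmax, giving a soft selection that stays differentiable during training. The exact formulation in the paper may differ.

import torch
import torch.nn as nn

class GumbelSelect(nn.Module):
    def __init__(self, d_feat: int, n_select: int, tau: float = 1.0):
        super().__init__()
        self.score = nn.Linear(d_feat, n_select)    # one score column per selected slot
        self.tau = tau

    def forward(self, feats):                       # feats: (B, N, d_feat)
        logits = self.score(feats).transpose(1, 2)  # (B, n_select, N)
        u = torch.rand_like(logits).clamp(1e-9, 1 - 1e-9)
        gumbel = -torch.log(-torch.log(u))          # Gumbel(0, 1) noise
        weights = torch.softmax((logits + gumbel) / self.tau, dim=-1)
        return weights @ feats                      # (B, n_select, d_feat) soft subset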
Point2Sequence: Learning the Shape Representation of 3D Point Clouds with an Attention-based Sequence to Sequence Network
TLDR
A novel deep learning model for 3D point clouds is proposed, named Point2Sequence, to learn 3D shape features by capturing fine-grained contextual information in a novel implicit way, and achieves state-of-the-art performance in shape classification and segmentation tasks.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
TLDR
This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
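A short sketch of the two ingredients named above, using standard PyTorch utilities: a PReLU activation with a learnable negative slope, and Kaiming (He) initialization, which scales the weight variance to account for the rectifier nonlinearity. Layer sizes are arbitrary.

import torch.nn as nn

layer = nn.Linear(256, 256)
nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')  # variance ~ 2 / fan_in
act = nn.PReLU()              # f(x) = x if x > 0 else a * x, with 'a' learned
block = nn.Sequential(layer, act)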
OctNet: Learning Deep 3D Representations at High Resolutions
TLDR
The utility of the OctNet representation is demonstrated by analyzing the impact of resolution on several 3D tasks including 3D object classification, orientation estimation and point cloud labeling.