PointAcc: Efficient Point Cloud Accelerator

  • Yujun Lin, Zhekai Zhang, Haotian Tang, Hanrui Wang, Song Han
  • Published 14 October 2021
  • Computer Science
  • MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
Deep learning on point clouds plays a vital role in a wide range of applications such as autonomous driving and AR/VR. These applications interact with people in real time on edge devices and thus require low latency and low energy. Compared to projecting the point cloud to 2D space, directly processing the 3D point cloud yields higher accuracy and lower #MACs. However, the extremely sparse nature of point clouds poses challenges to hardware acceleration. For example, we need to explicitly determine…
TorchSparse: Efficient Point Cloud Inference Engine
This paper introduces TorchSparse, a high-performance point cloud inference engine that accelerates sparse convolution on GPUs and optimizes its two bottlenecks: data movement and irregular computation.
Crescent: Taming Memory Irregularities for Accelerating Deep Point Cloud Analytics
This paper proposes Crescent, an algorithm-hardware co-design system that tames the irregularities in deep point cloud analytics while achieving high accuracy. It introduces two approximation techniques, approximate neighbor search and selective bank conflict elision, that “regularize” the DRAM and SRAM memory accesses.


Mesorasi: Architecture Support for Point Cloud Analytics via Delayed-Aggregation
This paper proposes Mesorasi, an algorithm-architecture co-designed system that simultaneously improves the performance and energy efficiency of point cloud analytics while retaining its accuracy, and proposes delayed-aggregation, a new algorithmic primitive for building efficient point cloud algorithms.
Tigris: Architecture and Algorithms for 3D Perception in Point Clouds
This paper presents Tigris, an algorithm-architecture co-designed system specialized for point cloud registration that systematically exploits the parallelism of KD-tree search while incorporating a set of architectural techniques that further improve the accelerator efficiency.
Dynamic Graph CNN for Learning on Point Clouds
This work proposes EdgeConv, a new neural network module suitable for CNN-based high-level tasks on point clouds, including classification and segmentation. EdgeConv acts on graphs dynamically computed in each layer of the network.
PointConv: Deep Convolutional Networks on 3D Point Clouds
The dynamic filter is extended to a new convolution operation, named PointConv, which can be applied to point clouds to build deep convolutional networks and achieves state-of-the-art results on challenging semantic segmentation benchmarks on 3D point clouds.
PointCNN: Convolution On X-Transformed Points
This work proposes to learn an Χ-transformation from the input points to simultaneously promote two causes: the first is the weighting of the input features associated with the points, and the second is the permutation of the points into a latent and potentially canonical order.
In-datacenter performance analysis of a tensor processing unit
  • N. Jouppi, C. Young, D. Yoon
  • Computer Science
  • 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)
  • 2017
This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), that has been deployed in datacenters since 2015 and accelerates the inference phase of neural networks (NN). It compares the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.
Point-Voxel CNN for Efficient 3D Deep Learning
This paper proposes PVCNN, which represents the 3D input data as points to reduce memory consumption while performing the convolutions in voxels to largely reduce irregular data access and improve locality.
Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
This work introduces a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set, and proposes novel set learning layers that adaptively combine features from multiple scales to learn deep point set features efficiently and robustly.
Fused-layer CNN accelerators
This work finds that a previously unexplored dimension exists in the design space of CNN accelerators that focuses on the dataflow across convolutional layers, and is able to fuse the processing of multiple CNN layers by modifying the order in which the input data are brought on chip, enabling caching of intermediate data between the evaluation of adjacent CNN layers.