P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds

  • Haozhe Qi, Chen Feng, Zhiguo Cao, Feng Zhao, Yang Xiao
  • Published 28 May 2020
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Towards 3D object tracking in point clouds, a novel point-to-box network termed P2B is proposed in an end-to-end learning manner. Our main idea is to first localize potential target centers in a 3D search area embedded with target information. Then point-driven 3D target proposal and verification are executed jointly. In this way, the time-consuming 3D exhaustive search can be avoided. Specifically, we first sample seeds from the point clouds in the template and the search area, respectively. Then, we…

Graph-Based Point Tracker for 3D Object Tracking in Point Clouds

Experiments on the KITTI tracking dataset show that GPT achieves state-of-the-art performance and can run in real time.

PTTR: Relational 3D Point Cloud Object Tracking with Transformer

This work proposes Point Relation Transformer (PTTR), which efficiently predicts high-quality 3D tracking results in a coarse-to-fine manner with the help of transformer operations and creates a large-scale point cloud single object tracking benchmark based on the Waymo Open Dataset.

Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer

A more advanced framework named PTTR++, which incorporates both the point-wise view and BEV representation to exploit their complementary effect in generating high-quality tracking results, substantially boosts the tracking performance on top of PTTR with low computational overhead.

3D Siamese Voxel-to-BEV Tracker for Sparse Point Clouds

This work proposes a Siamese voxel-to-BEV tracker that significantly improves tracking performance in sparse 3D point clouds and outperforms the current state-of-the-art methods by a large margin.

Implicit and Efficient Point Cloud Completion for 3D Single Object Tracking

A strong pipeline is designed to extract discriminative features and conduct the matching procedure with an attention mechanism, and an ARP module is proposed to tackle the misalignment issue by aggregating all predicted candidates with valuable clues. The overall framework is called PCET.

PTT: Point-Track-Transformer Module for 3D Single Object Tracking in Point Clouds

This paper proposes a transformer module called Point-Track-Transformer (PTT) for point cloud-based 3D single object tracking, which contains three blocks for feature embedding, position encoding, and self-attention feature computation.

3D Siamese Transformer Network for Single Object Tracking on Point Clouds

This work develops a Siamese point Transformer network to learn shape context information of the target, together with an iterative coarse-to-fine correlation network to learn robust cross-correlation between the template and the search area.

Learning the Incremental Warp for 3D Vehicle Tracking in LiDAR Point Clouds

This study leveraged a powerful target discriminator and an accurate state estimator to robustly track target objects in challenging point cloud scenarios and proposed a state estimation subnetwork that aims to learn the incremental warp for updating the coarse target state.

A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds

This work proposes DMT, a Detector-free Motion-prediction-based 3D Tracking network that entirely removes the use of complicated 3D detectors and is lighter, faster, and more accurate than previous trackers.

3D-SiamRPN: An End-to-End Learning Method for Real-Time 3D Single Object Tracking Using Raw Point Cloud

This paper proposes a 3D tracking method called 3D-SiamRPN Network to track a single target object using raw 3D point cloud data and shows that the method achieves competitive performance in both Success and Precision compared to state-of-the-art methods.



PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud

Extensive experiments on the 3D detection benchmark of KITTI dataset show that the proposed architecture outperforms state-of-the-art methods with remarkable margins by using only point cloud as input.

Deep Hough Voting for 3D Object Detection in Point Clouds

This work proposes VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting that achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D with a simple design, compact model size and high efficiency.

STD: Sparse-to-Dense 3D Object Detector for Point Cloud

This work proposes a two-stage 3D object detection framework, named sparse-to-dense 3D Object Detector (STD), and implements a parallel intersection-over-union (IoU) branch to increase awareness of localization accuracy, resulting in further improved performance.

Leveraging Shape Completion for 3D Siamese Tracking

A Siamese tracker is designed that encodes model and candidate shapes into a compact latent representation and regularizes the encoding by enforcing that the latent representation decodes into an object model shape, based on the observation that 3D object tracking and 3D shape completion complement each other.

Frustum PointNets for 3D Object Detection from RGB-D Data

This work directly operates on raw point clouds by popping up RGBD scans and leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects.

Point-To-Pose Voting Based Hand Pose Estimation Using Residual Permutation Equivariant Layer

  • Shile Li, Dongheui Lee
  • Computer Science
  • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
This paper uses the Permutation Equivariant Layer (PEL) as the basic element, proposes a residual network version of PEL for the hand pose estimation task, and introduces a voting-based scheme to merge information from individual points into the final pose output.

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

This paper designs a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing.

Object Tracking by Reconstruction With View-Specific Discriminative Correlation Filters

The proposed long-term RGB-D tracker called OTR – Object Tracking by Reconstruction performs online 3D target reconstruction to facilitate robust learning of a set of view-specific discriminative correlation filters (DCFs).

Context-Aware Three-Dimensional Mean-Shift With Occlusion Handling for Robust Object Tracking in RGB-D Videos

This paper investigates a 3-D extension of the classical mean-shift tracker, whose greedy gradient ascent strategy is generally considered unreliable in conventional 2-D tracking, and proposes two important mechanisms to further boost the tracker's robustness.

SHPR-Net: Deep Semantic Hand Pose Regression From Point Clouds

This work proposes a deep semantic hand pose regression network (SHPR-Net) for hand pose estimation from point sets. It consists of two subnetworks, a semantic segmentation subnetwork and a hand pose regression subnetwork, and is more robust to geometric transformations.