MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection

Xu Chen, Shaoshuai Shi, Benjin Zhu, Ka Chun Cheung, Hang Xu, Hongsheng Li

Accurate and reliable 3D detection is vital for many applications including autonomous driving vehicles and service robots. In this paper, we present a flexible and high-performance 3D detection framework, named MPPNet, for 3D temporal object detection with point cloud sequences. We propose a novel three-hierarchy framework with proxy points for multi-frame feature encoding and interactions to achieve better detection. The three hierarchies conduct per-frame feature encoding, short-clip…

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe

A practical guidebook for improving the performance of BEV perception tasks with camera, LiDAR, and fusion inputs is introduced, and future research directions in this area are pointed out.

Motion Transformer with Global Intention Localization and Local Movement Refinement

The Motion TRansformer (MTR) framework is proposed; it models motion prediction as the joint optimization of global intention localization and local movement refinement, and incorporates spatial intention priors by adopting a small set of learnable motion query pairs.



3D-MAN: 3D Multi-frame Attention Network for Object Detection

3D-MAN is presented: a 3D multi-frame attention network that effectively aggregates features from multiple perspectives and achieves state-of-the-art performance on the Waymo Open Dataset.

PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud

Extensive experiments on the 3D detection benchmark of the KITTI dataset show that the proposed architecture outperforms state-of-the-art methods by remarkable margins using only point clouds as input.

Improving 3D Object Detection with Channel-wise Transformer

This paper leverages the high-quality region proposal network and a Channel-wise Transformer architecture to constitute the two-stage 3D object detection framework (CT3D) with minimal hand-crafted design and achieves superior performance and excellent scalability.

STD: Sparse-to-Dense 3D Object Detector for Point Cloud

This work proposes a two-stage 3D object detection framework, named sparse-to-dense 3D Object Detector (STD), and implements a parallel intersection-over-union (IoU) branch to increase awareness of localization accuracy, resulting in further improved performance.

PIXOR: Real-time 3D Object Detection from Point Clouds

PIXOR is proposed, a proposal-free, single-stage detector that outputs oriented 3D object estimates decoded from pixel-wise neural network predictions; it notably surpasses other state-of-the-art methods in terms of Average Precision (AP) while still running at 10 FPS.

Offboard 3D Object Detection from Point Cloud Sequences

This paper designs the offboard detector to make use of the temporal points through both multi-frame object detection and novel object-centric refinement models, and proposes a novel offboard 3D object detection pipeline using point cloud sequence data.

VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

  • Yin Zhou, Oncel Tuzel
  • Computer Science, Environmental Science
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
VoxelNet is proposed, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single stage, end-to-end trainable deep network and learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists.

Exploring Simple 3D Multi-Object Tracking for Autonomous Driving

SimTrack is presented to simplify the hand-crafted tracking paradigm by proposing an end-to-end trainable model for joint detection and tracking from raw point clouds; results reveal that the simple approach compares favorably with state-of-the-art methods while ruling out heuristic matching rules.

Deep Hough Voting for 3D Object Detection in Point Clouds

This work proposes VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting that achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D with a simple design, compact model size and high efficiency.

PointPillars: Fast Encoders for Object Detection From Point Clouds

Benchmarks suggest that PointPillars is an appropriate encoding for object detection in point clouds; a lean downstream network is also proposed.