DR-SPAAM: A Spatial-Attention and Auto-regressive Model for Person Detection in 2D Range Data

@inproceedings{Jia2020DRSPAAMAS,
  title={DR-SPAAM: A Spatial-Attention and Auto-regressive Model for Person Detection in 2D Range Data},
  author={Dan Jia and Alexander Hermans and Bastian Leibe},
  booktitle={2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2020},
  pages={10270--10277}
}
Detecting persons using a 2D LiDAR is a challenging task due to the low information content of 2D range data. To alleviate the problem caused by the sparsity of the LiDAR points, current state-of-the-art methods fuse multiple previous scans and perform detection on the combined scans. The downside of such backward-looking fusion is that all the scans need to be aligned explicitly, and the necessary alignment operation makes the whole pipeline more expensive, often too expensive for real…
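The abstract outlines the key mechanism: rather than explicitly aligning and stacking past scans, DR-SPAAM keeps a per-point feature template and updates it auto-regressively, using spatial attention over a small neighbourhood to compensate for motion between scans. Below is a minimal NumPy sketch of one such similarity-gated update; the feature shapes, window size, and fixed mixing weight alpha are illustrative assumptions, not the authors' implementation.

import numpy as np

def spatial_attention_update(memory, current, window=5, alpha=0.5):
    # One auto-regressive fusion step over per-point features from two
    # consecutive scans (illustrative sketch of the DR-SPAAM idea, not
    # the paper's exact formulation).
    # memory, current: (N, C) per-point feature arrays.
    # window: size of the spatial neighbourhood each point attends to.
    n_pts, _ = current.shape
    half = window // 2
    fused = np.empty_like(current)
    for i in range(n_pts):
        lo, hi = max(0, i - half), min(n_pts, i + half + 1)
        neigh = memory[lo:hi]                  # (K, C) neighbouring memory features
        sim = neigh @ current[i]               # dot-product similarity scores
        attn = np.exp(sim - sim.max())
        attn /= attn.sum()                     # softmax over the spatial window
        aligned = attn @ neigh                 # attention-weighted memory feature
        fused[i] = alpha * aligned + (1.0 - alpha) * current[i]
    return fused

In this sketch the returned array serves both as the detection feature for the current scan and as the memory for the next update, which is what lets the temporal fusion proceed without explicitly aligning past scans.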

Citations

Self-Supervised Person Detection in 2D Range Data using a Calibrated Camera

This work proposes a method that uses bounding boxes from an image-based detector on a calibrated camera to automatically generate training labels for 2D LiDAR-based person detectors, and shows that self-supervised detectors, trained or fine-tuned with pseudo-labels, outperform detectors trained only on a different dataset.
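The labelling scheme described above can be sketched compactly: project the 2D LiDAR points into the camera image and mark points that fall inside a person bounding box as positives. The following is a minimal sketch under assumed conventions; the names, matrix layouts, and the z = 0 scan-plane assumption are illustrative, not taken from the paper.

import numpy as np

def pseudo_labels(scan_xy, K, T, boxes):
    # Label 2D LiDAR points as 'person' when their camera projection
    # falls inside an image-detector bounding box (illustrative sketch).
    # scan_xy: (N, 2) LiDAR points in the sensor frame (metres).
    # K: (3, 3) camera intrinsics; T: (3, 4) LiDAR-to-camera extrinsics.
    # boxes: (M, 4) detections as (x1, y1, x2, y2) in pixels.
    n = len(scan_xy)
    pts = np.column_stack([scan_xy, np.zeros(n), np.ones(n)])  # homogeneous, scan plane at z = 0
    cam = pts @ T.T                             # points in the camera frame, (N, 3)
    uv = cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]                 # perspective division to pixel coordinates
    labels = np.zeros(n, dtype=bool)
    for x1, y1, x2, y2 in boxes:
        inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & \
                 (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
        labels |= inside & (cam[:, 2] > 0)      # keep only points in front of the camera
    return labels

A LiDAR detector trained or fine-tuned on such labels needs no manual annotation of the 2D range data itself.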

2D vs. 3D LiDAR-based Person Detection on Mobile Robots

Person detection is a crucial task for mobile robots navigating in human-populated environments. LiDAR sensors are promising for this task, thanks to their accurate depth measurements and large field of view.

Domain and Modality Gaps for LiDAR-based Person Detection on Mobile Robots

A series of experiments is conducted, using the recently released JackRabbot dataset and state-of-the-art detectors based on 3D or 2D LiDAR sensors (CenterPoint and DR-SPAAM, respectively), to understand whether detectors pretrained on driving datasets can achieve good performance in mobile robot scenarios, for which there are currently no trained models readily available.

Cross-Modal Analysis of Human Detection for Robotics: An Industrial Case Study

A systematic cross-modal analysis of sensor-algorithm combinations typically used in robotics is conducted, comparing the performance of state-of-the-art person detectors for 2D range data, 3D LiDAR, and RGB-D data, as well as selected combinations thereof, in a challenging industrial use case.

Robotic Vision for Human-Robot Interaction and Collaboration: A Survey and Systematic Review

It was found that robotic vision was often used in action and gesture recognition, robot movement in human spaces, object handover and collaborative actions, social communication, and learning from demonstration.

Sensor fusion for functional safety of autonomous mobile robots in urban and industrial environments

This study reviews state-of-the-art sensors and pedestrian detection methods, shows the benefits of AI-based sensor fusion technologies as well as their limitations for industrial outdoor and urban AMR safety applications, and proposes methods to overcome these drawbacks.

Human-Centered Navigation and Person-Following with Omnidirectional Robot for Indoor Assistance and Monitoring

This paper presents a novel human-centered navigation system that successfully combines a real-time visual perception system with the mobility advantages provided by an omnidirectional robotic platform to precisely adjust the robot orientation and monitor a person while navigating.

Pedestrian-Robot Interactions on Autonomous Crowd Navigation: Reactive Control Methods and Evaluation Metrics

It is concluded that the reactive controller fulfils the necessary task of fast, continuous adaptation in crowd navigation, and that it should be coupled with high-level planners for environmental and situational awareness.

Control of adaptive running platform based on machine vision technologies and neural networks

The scientific novelty of the study lies in the formalization and comparison of various control methods for adaptive running platforms and of methods for positioning a person on them (using cameras and trackers), which expands knowledge about the optimal control functions of this class of devices.
