Unsupervised Object Detection with LiDAR Clues

  • Haofei Tian, Yuntao Chen, Jifeng Dai, Zhaoxiang Zhang, Xizhou Zhu
  • Published 25 November 2020
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Despite the importance of unsupervised object detection, to the best of our knowledge, there is no previous work addressing this problem. One main issue, widely known to the community, is that object boundaries derived only from 2D image appearance are ambiguous and unreliable. To address this, we exploit LiDAR clues to aid unsupervised object detection. By exploiting the 3D scene structure, the issue of localization can be considerably mitigated. We further identify another major issue, seldom… 
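The core clue the abstract describes is that 3D scene structure separates objects far more reliably than 2D appearance: nearby LiDAR returns that form a compact cluster in 3D space very likely belong to one object. A minimal, purely illustrative sketch of that idea (not the paper's actual pipeline) is to single-linkage-cluster the point cloud and turn each sufficiently dense cluster into a box proposal; the `eps` and `min_points` values below are arbitrary assumptions for illustration.

```python
import math

def cluster_points(points, eps=0.5):
    """Greedy single-linkage clustering of 3D points (tuples of x, y, z).

    Any point within `eps` meters of an existing cluster merges into it;
    clusters bridged by the new point are merged together. Illustrative
    only -- the paper's proposal generation is more involved than this.
    """
    clusters = []
    for p in points:
        merged = [p]
        rest = []
        for c in clusters:
            if any(math.dist(p, q) <= eps for q in c):
                merged.extend(c)   # bridge: fold this cluster into the new one
            else:
                rest.append(c)
        clusters = rest + [merged]
    return clusters

def boxes_from_clusters(clusters, min_points=3):
    """Axis-aligned 3D bounding boxes (min corner, max corner) per cluster."""
    boxes = []
    for c in clusters:
        if len(c) < min_points:
            continue  # drop sparse clusters, which are likely noise returns
        xs, ys, zs = zip(*c)
        boxes.append(((min(xs), min(ys), min(zs)),
                      (max(xs), max(ys), max(zs))))
    return boxes
```

Projecting such 3D boxes into the image then yields 2D proposals whose boundaries come from scene geometry rather than ambiguous image appearance.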


Localizing Objects with Self-Supervised Transformers and no Labels
The proposed method, LOST, does not require any external object proposal nor any exploration of the image collection; it operates on a single image and outperforms state-of-the-art object discovery methods by up to 8 CorLoc points on PASCAL VOC 2012.
Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation
This work proposes a novel method for cross-modal unsupervised learning of semantic image segmentation by leveraging synchronized LiDAR and image data, and develops a cross-modal distillation approach that leverages image data partially annotated with the resulting pseudo-classes to train a transformer-based model for image semantic segmentation.
Learning to Detect Mobile Objects from LiDAR Scans Without Labels
This paper proposes an alternative approach entirely based on unlabeled data, which can be collected cheaply and in abundance almost everywhere on earth, and demonstrates that these seed labels are highly effective to bootstrap a surprisingly accurate detector through repeated self-training without a single human annotated label.
Class-aware Sounding Objects Localization via Audiovisual Correspondence
A two-stage step-by-step learning framework to localize and recognize sounding objects in complex audiovisual scenarios using only the correspondence between audio and vision, which is superior in localizing and recognizing objects as well as filtering out silent ones.
Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals
A two-step framework that adopts a predetermined mid-level prior in a contrastive optimization objective to learn pixel embeddings and argues about the importance of having a prior that contains information about objects, or their parts, and discusses several possibilities to obtain such a prior in an unsupervised manner.


RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement
RoarNet outperforms state-of-the-art methods even in settings where LiDAR and camera are not time-synchronized, which is practically important for real driving environments.
Fast Lidar Clustering by Density and Connectivity
This work proposes an algorithmic approach for real-time instance segmentation of Lidar sensor data that leverages the properties of the Euclidean distance to retain three-dimensional measurement information, while being narrowed down to a two-dimensional representation for fast computation.
Consistency-based Semi-supervised Learning for Object detection
A Consistency-based Semi-supervised learning method for object Detection (CSD), which is a way of using consistency constraints as a tool for enhancing detection performance by making full use of available unlabeled data.
Missing Labels in Object Detection
This paper studies the effect of missing annotations on FSOD methods and analyzes approaches to train an object detector from a hybrid dataset, where both instance-level and image-level labels are employed, demonstrating the effectiveness of the method.
Joint 3D Proposal Generation and Object Detection from View Aggregation
This work presents AVOD, an Aggregate View Object Detection network for autonomous driving scenarios that uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second stage detector network.
PointPillars: Fast Encoders for Object Detection From Point Clouds
Benchmarks suggest that PointPillars is an appropriate encoding for object detection in point clouds; the work also proposes a lean downstream network.
BirdNet: A 3D Object Detection Framework from LiDAR Information
A LiDAR-based 3D object detection pipeline entailing three stages, where laser information is projected into a novel cell encoding for bird's eye view projection, and both object location on the plane and its heading are estimated through a convolutional neural network originally designed for image processing.
STD: Sparse-to-Dense 3D Object Detector for Point Cloud
This work proposes a two-stage 3D object detection framework, named sparse-to-dense 3D Object Detector (STD), and implements a parallel intersection-over-union (IoU) branch to increase awareness of localization accuracy, resulting in further improved performance.
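The IoU branch in STD predicts, per proposal, how well a refined box overlaps its ground truth, so the detector can rank boxes by localization quality rather than classification score alone. The regression target of such a branch is ordinary intersection-over-union; a minimal sketch for axis-aligned 2D (e.g. bird's-eye-view) boxes, not STD's actual rotated-box implementation:

```python
def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # intersection rectangle (empty if the max is past the min)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```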
Efficient Online Segmentation for Sparse 3D Laser Scans
An effective method that first removes the ground from the scan and then segments the 3D data, in a range image representation, into different objects; it can operate at frame rates substantially higher than those of the sensors while using only a single core of a mobile CPU, producing high-quality segmentation results.
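The two preprocessing steps that summary names — ground removal, then projection into a range image indexed by the laser's horizontal and vertical angles — can be sketched as follows. This is a simplification under assumed parameters (a flat ground plane at `ground_z`, a 16-beam sensor with a ±15° vertical field of view), not the paper's column-wise ground estimation:

```python
import math

def remove_ground(points, ground_z=-1.5, tol=0.2):
    """Crude ground removal: drop points near an assumed flat ground plane
    at height `ground_z` (the paper estimates the ground more robustly)."""
    return [p for p in points if p[2] > ground_z + tol]

def to_range_image(points, h_bins=360, v_bins=16, v_fov=(-15.0, 15.0)):
    """Project 3D points into a (v_bins x h_bins) image of minimum ranges.

    Rows index the elevation angle across `v_fov` (degrees), columns index
    the azimuth over 360 degrees; empty cells stay at infinity.
    """
    img = [[float("inf")] * h_bins for _ in range(v_bins)]
    v_lo, v_hi = v_fov
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0:
            continue
        yaw = math.degrees(math.atan2(y, x))     # azimuth in [-180, 180)
        pitch = math.degrees(math.asin(z / r))   # elevation angle
        col = min(h_bins - 1, int((yaw + 180.0) / 360.0 * h_bins))
        row = min(v_bins - 1, max(0, int((pitch - v_lo) / (v_hi - v_lo) * v_bins)))
        img[row][col] = min(img[row][col], r)    # keep the nearest return
    return img
```

Segmentation then reduces to grouping neighboring range-image pixels with similar depth, which is what makes the method fast enough to outpace the sensor.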
Frustum PointNets for 3D Object Detection from RGB-D Data
This work directly operates on raw point clouds by popping up RGBD scans and leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects.