ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings

@inproceedings{Huang2020ClusterVOCM,
  title={ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings},
  author={Jiahui Huang and Sheng Yang and Tai-Jiang Mu and Shimin Hu},
  booktitle={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={2165-2174}
}
  • Published 29 March 2020
We present ClusterVO, a stereo Visual Odometry system that simultaneously clusters and estimates the motion of both the ego camera and surrounding rigid clusters/objects. Unlike previous solutions that rely on batch input or impose priors on scene structure or dynamic object models, ClusterVO is online and general, and can therefore be used in various scenarios including indoor scene understanding and autonomous driving. At the core of our system lies a multi-level probabilistic association mechanism and a heterogeneous…


Citing Papers

DynaSLAM II: Tightly-Coupled Multi-Object Tracking and SLAM
Presents DynaSLAM II, a visual SLAM system for stereo and RGB-D camera configurations that tightly integrates multi-object tracking and uses instance semantic segmentation and ORB features to track dynamic objects.
Multimotion Visual Odometry (MVO)
Presents Multimotion Visual Odometry (MVO), a multimotion estimation pipeline that estimates the full SE(3) trajectory of every motion in the scene, including the sensor egomotion, without relying on appearance-based information.
S3LAM: Structured Scene SLAM
Proposes a new SLAM system based on ORB-SLAM2 that builds a semantic map made of clusters of points corresponding to object instances and structures in the scene, improving both camera localization and reconstruction and enabling a better understanding of the scene.
Incorporating Large Vocabulary Object Detection and Tracking into Visual SLAM
This master's thesis presents a dynamic visual simultaneous localization and mapping (SLAM) system based on a neural network for semantic tracking of 2D bounding boxes and bundle adjustment (BA) optimization of geometric keypoints, and shows qualitative tracking results for object classes beyond cars and pedestrians.
DOT: Dynamic Object Tracking for Visual SLAM
DOT (Dynamic Object Tracking) combines instance segmentation and multi-view geometry to generate masks for dynamic objects, allowing SLAM systems based on rigid scene models to exclude those image areas from their optimizations.
View Birdification in the Crowd: Ground-Plane Localization from Perceived Movements
The method first estimates the observer's movement and then localizes surrounding pedestrians for each frame while accounting for the local interactions between them, deriving a cascaded optimization method from a Bayesian perspective.
Semantics Aware Dynamic SLAM Based on 3D MODT
Results suggest that the proposed dynamic SLAM framework can run in real time on budgeted computational resources, and that the fused MODT provides rich semantic information that can be readily integrated into SLAM.
Accurate Dynamic SLAM Using CRF-Based Long-Term Consistency
Presents a novel RGB-D SLAM approach for accurate camera pose tracking in dynamic environments, providing a more accurate dynamic 3D landmark detection method followed by long-term consistency via conditional random fields, which leverages long-term observations from multiple frames.
A Switching-Coupled Backend for Simultaneous Localization and Dynamic Object Tracking
Proposes a novel switching-coupled back-end solution and theoretically derives its concrete form using a probability representation based on the switching strategy and the proposed object classification criteria, jointly considering object uncertainty, observation quality, and prior information.
Multiway Non-rigid Point Cloud Registration via Learned Functional Map Synchronization
SyNoRiM, a novel way to jointly register multiple non-rigid shapes by synchronizing the maps that relate learned functions defined on the point clouds, achieves state-of-the-art registration accuracy while remaining flexible and efficient, since it avoids the costly optimization over point-wise permutations through the use of basis function maps.

References

Showing 1-10 of 53 references
ClusterSLAM: A SLAM Backend for Simultaneous Rigid Body Clustering and Motion Estimation
Evaluations on both synthetic scenes and KITTI demonstrate the capability of the approach, and further experiments on online efficiency show the method's effectiveness for simultaneous tracking of ego-motion and multiple objects.
Robust Dense Mapping for Large-Scale Dynamic Environments
A stereo-based dense mapping algorithm for large-scale dynamic urban environments that simultaneously reconstructs the static background, the moving objects, and the potentially moving but currently stationary objects separately, which is desirable for high-level mobile robotic tasks such as path planning in crowded environments.
Occlusion-Robust MVO: Multimotion Estimation Through Occlusion Via Motion Closure
  • Kevin M. Judd, J. Gammell
  • 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Presents a pipeline for estimating multiple motions, including the camera egomotion, in the presence of occlusions, using an expressive motion prior to estimate the SE(3) trajectory of every motion in the scene, even during temporary occlusion.
EM-Fusion: Dynamic Object-Level SLAM With Probabilistic Data Association
Proposes a novel approach to dynamic SLAM with dense object-level representations, representing rigid objects in local volumetric signed distance function (SDF) maps and formulating multi-object tracking as direct alignment of RGB-D images with the SDF representations.
Multimotion Visual Odometry (MVO): Simultaneous Estimation of Camera and Third-Party Motions
Extends the traditional visual odometry pipeline to estimate the full motion of both a stereo/RGB-D camera and the dynamic scene, evaluating its performance on a real-world dynamic dataset with ground truth for all motions from a motion capture system.
MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM
The first system to generate an object-level dynamic volumetric map from a single RGB-D camera, which can be used directly for robotic tasks; its effectiveness is demonstrated by quantitative and qualitative testing on both synthetic and real-world sequences.
Object scene flow for autonomous vehicles
A novel model and dataset for 3D scene flow estimation with an application to autonomous driving, representing each element in the scene by its rigid motion parameters and each superpixel by a 3D plane as well as an index to the corresponding object.
Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving
Based on object-aware camera pose tracking that is robust in dynamic environments, combined with a novel dynamic object bundle adjustment (BA) approach that fuses temporal sparse feature correspondences with a semantic 3D measurement model, the method estimates 3D object pose, velocity, and an anchored dynamic point cloud with instance accuracy and temporal consistency.
DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes
DynaSLAM is a visual SLAM system that, building on ORB-SLAM2, adds the capabilities of dynamic object detection and background inpainting, and outperforms the accuracy of standard visual SLAM baselines in highly dynamic scenarios.
Beyond Pixels: Leveraging Geometry and Shape Cues for Online Multi-Object Tracking
This paper introduces geometry, object shape, and pose costs for multi-object tracking in urban driving scenarios. Using images from a monocular camera alone, we devise pairwise costs for object…