Localization and Mapping using Instance-specific Mesh Models

Qiaojun Feng, Yue Meng, Mo Shan, Nikolay A. Atanasov. "Localization and Mapping using Instance-specific Mesh Models." 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
This paper focuses on building semantic maps, containing object poses and shapes, using a monocular camera. This is an important problem because robots need a rich understanding of geometry and context if they are to shape the future of transportation, construction, and agriculture. Our contribution is an instance-specific mesh model of object shape that can be optimized online based on semantic information extracted from camera images. Multi-view constraints on the object shape are obtained by…
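As an illustrative sketch only (the intrinsics, vertex set, and translation-only cameras below are invented for the example, not taken from the paper), multi-view refinement of a mesh can be posed as gradient descent on the squared reprojection error of its vertices:

```python
import numpy as np

def project(V, t, K):
    """Project Nx3 world vertices through a translation-only pinhole camera."""
    Pc = V + t                                   # camera-frame points
    return (Pc[:, :2] / Pc[:, 2:3]) @ K[:2, :2].T + K[:2, 2]

def refine_vertices(V0, views, lr=1e-5, iters=1000):
    """Gradient descent on vertex positions to minimize the total squared
    reprojection error; each view is a tuple (t, K, observed_uv)."""
    V = V0.copy()
    for _ in range(iters):
        grad = np.zeros_like(V)
        for t, K, obs in views:
            Pc = V + t
            x, y, z = Pc[:, 0], Pc[:, 1], Pc[:, 2]
            fx, fy = K[0, 0], K[1, 1]
            ru = fx * x / z + K[0, 2] - obs[:, 0]    # horizontal residual
            rv = fy * y / z + K[1, 2] - obs[:, 1]    # vertical residual
            # analytic gradient of sum(ru^2 + rv^2) w.r.t. (x, y, z)
            grad += 2 * np.stack(
                [fx / z * ru, fy / z * rv,
                 -(fx * x * ru + fy * y * rv) / z ** 2], axis=1)
        V -= lr * grad
    return V
```

With two or more views the per-vertex depth ambiguity of a single image disappears, which is why the sketch uses a stereo-like pair of cameras.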


Large-Scale 3D Semantic Reconstruction for Automated Driving Vehicles with Adaptive Truncated Signed Distance Function

A novel 3D reconstruction, texturing, and semantic mapping system for LiDAR and camera sensors that uses an Adaptive Truncated Signed Distance Function to represent surfaces and a Markov random field-based data fusion approach to estimate the optimal semantic class for each mesh triangle.

TerrainMesh: Metric-Semantic Terrain Reconstruction from Aerial Images Using Joint 2D-3D Learning

Quantitative and qualitative evaluations on real aerial images show the potential of the joint 2D-3D learning approach to reconstruct a local metric-semantic mesh at each camera keyframe maintained by a visual odometry algorithm, supporting environmental monitoring and surveillance applications.

Lightweight Semantic Mesh Mapping for Autonomous Vehicles

This paper introduces a probabilistic fusion scheme that incrementally refines and extends a 3D mesh with semantic labels for each face, without intermediate voxel-based fusion, and shows that the proposed approach achieves reconstruction quality comparable to current state-of-the-art voxel-based methods while being much more lightweight in both storage and computation.
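One common way to implement such per-face fusion (a generic naive-Bayes accumulation sketch with an invented API, not this paper's exact scheme) is to accumulate per-class log-likelihoods for each mesh face as segmented frames arrive:

```python
import numpy as np

class FaceLabelFuser:
    """Incremental per-face semantic label fusion.

    Each observation contributes a class-probability vector for a face;
    summing in log space corresponds to a naive-Bayes product of the
    per-frame likelihoods. Illustrative only.
    """
    def __init__(self, num_faces, num_classes):
        self.log_likelihood = np.zeros((num_faces, num_classes))

    def update(self, face_ids, class_probs, eps=1e-9):
        # class_probs: (len(face_ids), num_classes) rows from a per-frame
        # semantic segmentation, projected onto the visible faces
        self.log_likelihood[face_ids] += np.log(class_probs + eps)

    def labels(self):
        # most likely class per face given all observations so far
        return self.log_likelihood.argmax(axis=1)
```

Storing one small vector per face (rather than per voxel) is what makes this kind of fusion lightweight in memory.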

Online Adaptation for Implicit Object Tracking and Shape Reconstruction in the Wild

A novel and unified framework that uses a neural implicit function to simultaneously track and reconstruct 3D objects in the wild; experiments on both the Waymo and KITTI datasets show significant improvements over state-of-the-art methods on both tracking and shape reconstruction.

OrcVIO: Object residual constrained Visual-Inertial Odometry

This work presents OrcVIO, a visual-inertial odometry system tightly coupled with tracking and optimization over structured object models, which uses semantic-feature and bounding-box reprojection errors to perform batch optimization over object poses and shapes.

Vision Only 3-D Shape Estimation for Autonomous Driving

A probabilistic framework for detailed 3-D shape estimation and tracking using only vision measurements and a bird's-eye-view representation; the approach is demonstrated to produce more accurate and cleaner shape estimates.

Learning Category-Specific Mesh Reconstruction from Image Collections

A learning framework for recovering the 3D shape, camera, and texture of an object from a single image that incorporates texture inference as prediction of an image in a canonical appearance space; semantic keypoints can be easily associated with the predicted shapes.

Learning Category-Specific Deformable 3D Models for Object Reconstruction

This work addresses fully automatic object localization and reconstruction from a single image, introduces a complementary network for camera viewpoint prediction, and captures top-down information about the main global modes of shape variation within a class, providing a “low-frequency” shape.

6-DoF object pose from semantic keypoints

A novel approach to estimating the continuous six-degree-of-freedom (6-DoF) pose (3D translation and rotation) of an object from a single RGB image by combining semantic keypoints predicted by a convolutional network with a deformable shape model.
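A hedged sketch of the geometric core of such a method: given 2D semantic keypoints and the corresponding 3D model keypoints, the 6-DoF pose can be recovered by Gauss-Newton on the reprojection error. This is a generic implementation with an invented API; the paper's actual optimization also fits deformable shape coefficients.

```python
import numpy as np

def rodrigues(w):
    """Axis-angle vector -> 3x3 rotation matrix."""
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    k = w / th
    Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(th) * Kx + (1 - np.cos(th)) * (Kx @ Kx)

def residuals(pose, X, uv, K):
    """Stacked reprojection residuals for pose = (axis-angle w, translation t)."""
    R, t = rodrigues(pose[:3]), pose[3:]
    Pc = X @ R.T + t
    proj = (Pc[:, :2] / Pc[:, 2:3]) @ K[:2, :2].T + K[:2, 2]
    return (proj - uv).ravel()

def solve_pnp_gn(X, uv, K, pose0, iters=20, eps=1e-6):
    """Gauss-Newton with a finite-difference Jacobian (sketch, not robustified)."""
    pose = np.asarray(pose0, dtype=float).copy()
    for _ in range(iters):
        r = residuals(pose, X, uv, K)
        J = np.empty((r.size, 6))
        for j in range(6):
            p = pose.copy()
            p[j] += eps
            J[:, j] = (residuals(p, X, uv, K) - r) / eps
        pose -= np.linalg.solve(J.T @ J + 1e-9 * np.eye(6), J.T @ r)
    return pose
```

In practice one would seed `pose0` from a closed-form solver such as EPnP or a RANSAC loop, and weight each keypoint by its detector confidence.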

Towards semantic SLAM using a monocular camera

This paper proposes a semantic SLAM algorithm that merges traditional, semantically meaningless points with known objects in the estimated map, and builds a non-annotated map using only the information extracted from a monocular image sequence.

Visual-Inertial Object Detection and Mapping

We present a method to populate an unknown environment with models of previously seen objects, placed in a Euclidean reference frame that is inferred causally and online using monocular video along…

Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving

Based on object-aware camera pose tracking, which is robust in dynamic environments, in combination with a novel dynamic object bundle adjustment (BA) approach that fuses temporal sparse feature correspondences and a semantic 3D measurement model, estimates of 3D object pose, velocity, and anchored dynamic point clouds are obtained with instance accuracy and temporal consistency.

SLAM++: Simultaneous Localisation and Mapping at the Level of Objects

The object graph enables predictions for accurate ICP-based camera-to-model tracking at each live frame, efficient active search for new objects in currently undescribed image regions, and the generation of an object-level scene description with the potential to enable interaction.

QuadricSLAM: Constrained Dual Quadrics from Object Detections as Landmarks in Semantic SLAM

A sensor model for deep-learned object detectors is developed that addresses the challenge of partial object detections often encountered in robotics applications, and joint estimation of the camera pose and constrained dual-quadric parameters in factor-graph-based SLAM with a general perspective camera is demonstrated.

Localization from semantic observations via the matrix permanent

This paper uses object recognition to obtain semantic information from the robot's sensors and considers the task of localizing the robot within a prior map of landmarks annotated with semantic labels.
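For context, the permanent of a small data-association likelihood matrix can be computed exactly with Ryser's inclusion-exclusion formula; a minimal generic sketch (not the paper's code) follows.

```python
from itertools import combinations

def permanent(A):
    """Exact matrix permanent via Ryser's inclusion-exclusion formula:

        perm(A) = (-1)^n * sum over nonempty column subsets S of
                  (-1)^|S| * prod_i sum_{j in S} A[i][j]

    Exponential time, so only practical for small n.
    """
    n = len(A)
    total = 0.0
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            prod = 1.0
            for row in A:
                prod *= sum(row[c] for c in cols)
            total += (-1) ** k * prod
    return (-1) ** n * total
```

As sanity checks, the permanent of a permutation matrix is 1 and that of an all-ones n×n matrix is n!.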

Beyond PASCAL: A benchmark for 3D object detection in the wild

The PASCAL3D+ dataset is contributed: a novel and challenging benchmark for 3D object detection and pose estimation, with more than 3,000 object instances per category on average.