Vid2CAD: CAD Model Alignment using Multi-View Constraints from Videos

@article{Maninis2022Vid2CADCM,
  title={Vid2CAD: CAD Model Alignment using Multi-View Constraints from Videos},
  author={Kevis-Kokitsi Maninis and Stefan Popov and Matthias Nie{\ss}ner and Vittorio Ferrari},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2022},
  volume={PP}
}
  • K. Maninis, S. Popov, M. Nießner, V. Ferrari
  • Published 8 December 2020
  • Computer Science
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
We address the task of aligning CAD models to a video sequence of a complex scene containing multiple objects. Our method can process arbitrary videos and fully automatically recover the 9 DoF pose for each object appearing in it, thus aligning them in a common 3D coordinate frame. The core idea of our method is to integrate neural network predictions from individual frames with a temporally global, multi-view constraint optimization formulation. This integration process resolves the scale and… 
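The core multi-view idea can be illustrated with a standard construction: a single frame constrains an object's center only up to depth along a viewing ray, but rays from several frames pin it down. Below is a minimal sketch of least-squares triangulation of an object center from per-frame bearing observations; the function name and interface are illustrative and not from the paper, whose actual optimization jointly handles scale, rotation, and occlusions.

```python
import numpy as np

def triangulate_center(cam_centers, bearings):
    """Least-squares 3D point closest to all viewing rays.

    cam_centers: (N, 3) camera positions in a common world frame.
    bearings:    (N, 3) unit direction vectors toward the object center.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(cam_centers, bearings):
        # Projector onto the plane orthogonal to the ray through c:
        # penalizes displacement of the solution off that ray.
        P = np.eye(3) - np.outer(d, d)
        A += P
        b += P @ c
    return np.linalg.solve(A, b)
```

With two or more non-parallel rays the normal matrix is invertible and the per-frame depth ambiguity disappears, which is the same mechanism the paper exploits to resolve scale.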

Citations

D3D-HOI: Dynamic 3D Human-Object Interactions from Videos
TLDR
This work introduces D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions, demonstrating that human-object relations can significantly reduce the ambiguity of articulated object reconstructions from challenging real-world videos.
ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description
TLDR
This paper proposes an expressive yet compact model for joint object pose and shape optimization, and an associated optimization algorithm to infer an object-level map from multi-view RGB-D camera observations.
Leveraging Geometry for Shape Estimation from a Single RGB Image
TLDR
This work demonstrates how cross-domain keypoint matches from an RGB image to a rendered CAD model allow for more precise object pose predictions compared to ones obtained through direct predictions, and shows that by allowing object stretching the authors can modify retrieved CAD models to better fit the observed shapes.
ODAM: Object Detection, Association, and Mapping using Posed RGB Video
TLDR
ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos, relies on a deep learning front-end to detect 3D objects from a given RGB frame and associate them to a global object-based map using a graph neural network (GNN).
Point Scene Understanding via Disentangled Instance Mesh Reconstruction
TLDR
This work proposes a DIMR framework that leverages a mesh-aware latent code space to disentangle the processes of shape completion and mesh generation, relieving the ambiguity caused by the incomplete point observations.
ROCA: Robust CAD Model Retrieval and Alignment from a Single Image
TLDR
ROCA enables 3D perception of an observed scene from a 2D RGB observation, characterized as a lightweight, compact, clean CAD representation, based on dense 2D-3D object correspondences and Procrustes alignment.
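The Procrustes alignment step mentioned above has a well-known closed form. The sketch below implements the standard Umeyama/Kabsch solution for a similarity transform (scale, rotation, translation) from matched 3D correspondences; it is a generic illustration of the technique, not ROCA's actual code, and the function name is ours.

```python
import numpy as np

def procrustes_align(src, dst):
    """Least-squares similarity transform (s, R, t) with dst ≈ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    sc, dc = src - mu_s, dst - mu_d
    H = sc.T @ dc                            # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                       # optimal rotation (Kabsch)
    s = (S * np.diag(D)).sum() / (sc ** 2).sum()  # Umeyama scale
    t = mu_d - s * R @ mu_s
    return s, R, t
```

Given dense 2D-3D correspondences lifted to 3D, a single call of this form recovers a 9-DoF-style alignment (isotropic scale plus pose) in closed form, which is what makes Procrustes-based alignment attractive for lightweight pipelines.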
Weakly-Supervised End-to-End CAD Retrieval to Scan Objects
TLDR
This work proposes a new weakly-supervised approach to retrieve semantically and structurally similar CAD models to a query 3D scanned scene without requiring any CAD-scan associations, and only object detection information as oriented bounding boxes.
RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers
TLDR
A transformer-based neural network architecture for multi-object 3D reconstruction from RGB videos that is single stage, end-to-end trainable, and it can reason holistically about a scene from multiple video frames without needing a brittle tracking step.
CORSAIR: Convolutional Object Retrieval and Symmetry-AIded Registration
TLDR
This model extends the Fully Convolutional Geometric Features model to learn a global object-shape embedding in addition to local point-wise features from the point-cloud observations, used to retrieve a similar object from a category database, and the local features are used for robust pose registration between the observed and the retrieved object.

References

SHOWING 1-10 OF 73 REFERENCES
3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare
TLDR
A differentiable Render-and-Compare loss is proposed that allows 3D shape and pose to be learned with 2D supervision and produces a compact 3D representation of the scene, which can be readily used for applications like autonomous driving.
FroDO: From Detections to 3D Objects
TLDR
FroDO is a method for accurate 3D reconstruction of object instances from RGB video that infers their location, pose and shape in a coarse-to-fine manner, embedding object shapes in a novel learnt shape space that allows seamless switching between sparse point cloud and dense DeepSDF decoding.
SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans
TLDR
A message-passing graph neural network is proposed to model the inter-relationships between objects and layout, guiding generation of a globally consistent object alignment in a scene by considering the global scene layout.
End-to-End CAD Model Retrieval and 9DoF Alignment in 3D Scans
We present a novel, end-to-end approach to align CAD models to a 3D scan of a scene, enabling transformation of a noisy, incomplete 3D scan to a compact, CAD reconstruction with clean, complete…
Scan2CAD: Learning CAD Model Alignment in RGB-D Scans
TLDR
This work designs a novel 3D CNN architecture that learns a joint embedding between real and synthetic objects, and from this predicts a correspondence heatmap, which forms a variational energy minimization that aligns a given set of CAD models to the reconstruction.
Scene Recomposition by Learning-Based ICP
  • Hamid Izadinia, S. Seitz
  • Computer Science
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
TLDR
This work proposes a novel approach for aligning CAD models to 3D scans, based on deep reinforcement learning, which outperforms prior ICP methods in the literature, as well as both learned local deep feature matching and geometry-based alignment methods in real scenes.
Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve
TLDR
Mask2CAD is presented, which jointly detects objects in real-world images and for each detected object, optimizes for the most similar CAD model and its pose, and constructs a joint embedding space between the detected regions of an image corresponding to an object and 3D CAD models, enabling retrieval of CAD models for an input RGB image.
BundleFusion: real-time globally consistent 3D reconstruction using on-the-fly surface re-integration
TLDR
This work systematically addresses issues with a novel, real-time, end-to-end reconstruction framework, which outperforms state-of-the-art online systems with quality on par to offline methods, but with unprecedented speed and scan completeness.
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
TLDR
The 3D-R2N2 reconstruction framework outperforms the state-of-the-art methods for single view reconstruction, and enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).
Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image
TLDR
This paper proposes an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image, and argues that understanding the context of each component can assist the task of parsing the others, which enables joint understanding and reconstruction.