Vid2CAD: CAD Model Alignment using Multi-View Constraints from Videos
@article{Maninis2022Vid2CADCM,
  title={Vid2CAD: CAD Model Alignment using Multi-View Constraints from Videos},
  author={Kevis-Kokitsi Maninis and Stefan Popov and Matthias Nie{\ss}ner and Vittorio Ferrari},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2022},
  volume={PP}
}
We address the task of aligning CAD models to a video sequence of a complex scene containing multiple objects. Our method can process arbitrary videos and fully automatically recover the 9 DoF pose for each object appearing in it, thus aligning them in a common 3D coordinate frame. The core idea of our method is to integrate neural network predictions from individual frames with a temporally global, multi-view constraint optimization formulation. This integration process resolves the scale and…
11 Citations
D3D-HOI: Dynamic 3D Human-Object Interactions from Videos
- Computer Science · ArXiv
- 2021
This work introduces D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions, demonstrating that human-object relations can significantly reduce the ambiguity of articulated object reconstructions from challenging real-world videos.
ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description
- Computer Science · 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
This paper proposes an expressive yet compact model for joint object pose and shape optimization, and an associated optimization algorithm to infer an object-level map from multi-view RGB-D camera observations.
Leveraging Geometry for Shape Estimation from a Single RGB Image
- Computer Science · ArXiv
- 2021
This work demonstrates how cross-domain keypoint matches from an RGB image to a rendered CAD model allow for more precise object pose predictions compared to ones obtained through direct predictions, and shows that by allowing object stretching the authors can modify retrieved CAD models to better fit the observed shapes.
ODAM: Object Detection, Association, and Mapping using Posed RGB Video
- Computer Science · 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos, relies on a deep learning front-end to detect 3D objects from a given RGB frame and associate them to a global object-based map using a graph neural network (GNN).
Point Scene Understanding via Disentangled Instance Mesh Reconstruction
- Computer Science · ArXiv
- 2022
This work proposes a DIMR framework that leverages a mesh-aware latent code space to disentangle the processes of shape completion and mesh generation, relieving the ambiguity caused by the incomplete point observations.
ROCA: Robust CAD Model Retrieval and Alignment from a Single Image
- Computer Science · ArXiv
- 2021
ROCA enables 3D perception of an observed scene from a 2D RGB observation, characterized as a lightweight, compact, clean CAD representation, based on dense 2D-3D object correspondences and Procrustes alignment.
Weakly-Supervised End-to-End CAD Retrieval to Scan Objects
- Computer Science, Environmental Science · ArXiv
- 2022
This work proposes a new weakly-supervised approach to retrieve semantically and structurally similar CAD models to a query 3D scanned scene without requiring any CAD-scan associations, and only object detection information as oriented bounding boxes.
Snap2cad: 3D indoor environment reconstruction for AR/VR applications using a smartphone device
- Computer Science
- 2021
RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers
- Computer Science · ArXiv
- 2022
A transformer-based neural network architecture for multi-object 3D reconstruction from RGB videos that is single-stage and end-to-end trainable, and that can reason holistically about a scene from multiple video frames without needing a brittle tracking step.
CORSAIR: Convolutional Object Retrieval and Symmetry-AIded Registration
- Computer Science · 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2021
This model extends the Fully Convolutional Geometric Features model to learn a global object-shape embedding in addition to local point-wise features from point-cloud observations; the global embedding is used to retrieve a similar object from a category database, and the local features are used for robust pose registration between the observed and the retrieved object.
References
SHOWING 1-10 OF 73 REFERENCES
3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare
- Computer Science · 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
A differentiable Render-and-Compare loss is proposed that allows 3D shape and pose to be learned with 2D supervision and produces a compact 3D representation of the scene, which can be readily used for applications like autonomous driving.
FroDO: From Detections to 3D Objects
- Computer Science · 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
FroDO is a method for accurate 3D reconstruction of object instances from RGB video that infers their location, pose and shape in a coarse to fine manner to embed object shapes in a novel learnt shape space that allows seamless switching between sparse point cloud and dense DeepSDF decoding.
SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans
- Computer Science · ECCV
- 2020
A message-passing graph neural network is proposed to model the inter-relationships between objects and layout, guiding generation of a global object alignment in a scene by considering the global scene layout.
End-to-End CAD Model Retrieval and 9DoF Alignment in 3D Scans
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
We present a novel, end-to-end approach to align CAD models to a 3D scan of a scene, enabling transformation of a noisy, incomplete 3D scan to a compact, CAD reconstruction with clean, complete…
Scan2CAD: Learning CAD Model Alignment in RGB-D Scans
- Computer Science · 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This work designs a novel 3D CNN architecture that learns a joint embedding between real and synthetic objects, and from this predicts a correspondence heatmap, which forms a variational energy minimization that aligns a given set of CAD models to the reconstruction.
Scene Recomposition by Learning-Based ICP
- Computer Science · 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This work proposes a novel approach for aligning CAD models to 3D scans, based on deep reinforcement learning, which outperforms prior ICP methods in the literature and outperforms both learned local deep feature matching and geometric based alignment methods in real scenes.
Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve
- Computer Science · ECCV
- 2020
Mask2CAD is presented, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose. It constructs a joint embedding space between the detected regions of an image corresponding to an object and 3D CAD models, enabling retrieval of CAD models for an input RGB image.
BundleFusion: real-time globally consistent 3D reconstruction using on-the-fly surface re-integration
- Computer Science · TOGS
- 2017
This work systematically addresses issues with a novel, real-time, end-to-end reconstruction framework, which outperforms state-of-the-art online systems with quality on par to offline methods, but with unprecedented speed and scan completeness.
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
- Computer Science · ECCV
- 2016
The 3D-R2N2 reconstruction framework outperforms the state-of-the-art methods for single view reconstruction, and enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).
Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image
- Computer Science · 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This paper proposes an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image, and argues that understanding the context of each component can assist the task of parsing the others, which enables joint understanding and reconstruction.