Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image

Weicheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai. ICCV 2021.
3D perception of object shapes from RGB image input is fundamental to semantic scene understanding, grounding image-based perception in our spatially three-dimensional real-world environments. To map between image views of objects and 3D shapes, this work leverages CAD model priors from existing large-scale databases and constructs a joint embedding space between 2D images and 3D CAD models in a patch-wise fashion, establishing correspondences between…
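The patch-wise retrieval idea can be illustrated with a generic sketch (not Patch2CAD's actual model): image patches and CAD-render patches are embedded into a shared space by some hypothetical encoder, and retrieval aggregates per-patch nearest-neighbor votes over CAD models.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Project embeddings onto the unit sphere so a dot product equals cosine similarity.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve_by_patch_votes(img_patch_emb, cad_patch_emb, cad_ids):
    """Retrieve a CAD model by majority vote over per-patch nearest neighbors.

    img_patch_emb: (P, D) embeddings of P query-image patches (hypothetical encoder output).
    cad_patch_emb: (M, D) embeddings of patches rendered from the CAD database.
    cad_ids:       (M,) index of the CAD model each database patch came from.
    """
    q = l2_normalize(np.asarray(img_patch_emb, dtype=float))
    k = l2_normalize(np.asarray(cad_patch_emb, dtype=float))
    sim = q @ k.T                        # (P, M) cosine similarities
    nn = sim.argmax(axis=1)              # nearest CAD patch per image patch
    votes = np.bincount(np.asarray(cad_ids)[nn])
    return int(votes.argmax())           # CAD model receiving the most patch votes
```

Aggregating over patches rather than whole images is what makes this style of retrieval robust to partial occlusion: a few mismatched patches are outvoted by the rest.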

Leveraging Geometry for Shape Estimation from a Single RGB Image

This work demonstrates that cross-domain keypoint matches from an RGB image to a rendered CAD model yield more precise object pose predictions than direct regression, and shows that allowing object stretching lets the retrieved CAD models be deformed to better fit the observed shapes.

ROCA: Robust CAD Model Retrieval and Alignment from a Single Image

ROCA provides robust CAD alignment while simultaneously informing CAD retrieval, leveraging 2D-3D correspondences to learn geometrically similar CAD models, and achieves the best retrieval-aware alignment performance.

Weakly-Supervised End-to-End CAD Retrieval to Scan Objects

This work proposes a new weakly-supervised approach to retrieve semantically and structurally similar CAD models for a query 3D scanned scene without requiring any CAD-scan associations, using only object detection information in the form of oriented bounding boxes.

PatchRD: Detail-Preserving Shape Completion by Learning Patch Retrieval and Deformation

A data-driven shape completion approach that completes the geometric details of missing regions of 3D shapes by copying and deforming patches from the partial input, preserving the style of local geometric features.

RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers

A transformer-based neural network architecture for multi-object 3D reconstruction from RGB videos that is single-stage, end-to-end trainable, and can reason holistically about a scene from multiple video frames without needing a brittle tracking step.

Pose2Room: Understanding 3D Scenes from Human Activities

P2R-Net learns a probabilistic 3D model of the objects in a scene, characterized by their class categories and oriented 3D bounding boxes, from an observed human trajectory in the environment, and consistently outperforms baselines on the PROX dataset and the VirtualHome platform.

Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve

Mask2CAD jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose; it constructs a joint embedding space between the detected image regions and 3D CAD models, enabling retrieval of CAD models for an input RGB image.

Location Field Descriptors: Single Image 3D Model Retrieval in the Wild

This work presents Location Field Descriptors, a novel approach for single-image 3D model retrieval in the wild that significantly outperforms the state of the art by up to 20% absolute across multiple 3D retrieval metrics.

Learning Local RGB-to-CAD Correspondences for Object Pose Estimation

This paper addresses a key limitation of existing methods, their need for expensive 3D pose annotations, by matching RGB images to CAD models for object pose estimation; the method reliably estimates object pose in RGB images and generalizes to object instances not seen during training.

Joint embeddings of shapes and images via CNN image purification

A joint embedding space populated by both 3D shapes and 2D images of objects, where distances between embedded entities reflect similarity between the underlying objects; this facilitates comparison between entities of either modality and allows cross-modality retrieval.
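Such joint spaces are typically trained with a contrastive objective that pulls matched image/shape pairs together and pushes mismatched pairs apart. A minimal sketch using a symmetric InfoNCE-style loss (illustrative of the general technique, not this paper's exact formulation):

```python
import numpy as np

def info_nce_loss(img_emb, shape_emb, temperature=0.07):
    """Symmetric InfoNCE loss over N matched image/shape embedding pairs.

    Assumes row i of img_emb and row i of shape_emb depict the same object.
    The temperature value here is a common default, not taken from the paper.
    """
    a = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    b = shape_emb / np.linalg.norm(shape_emb, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature          # (N, N) scaled cosine similarities

    def xent(l):
        # Cross-entropy with the diagonal (the matched pair) as the positive class.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # Average the image-to-shape and shape-to-image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss over batches drives matched pairs to high cosine similarity, which is what makes nearest-neighbor retrieval in the shared space meaningful.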

Scan2CAD: Learning CAD Model Alignment in RGB-D Scans

This work designs a novel 3D CNN architecture that learns a joint embedding between real and synthetic objects and from this predicts a correspondence heatmap, which informs a variational energy minimization that aligns a given set of CAD models to the reconstruction.

Scene Recomposition by Learning-Based ICP

Hamid Izadinia, Steven M. Seitz. CVPR 2020.
This work proposes a novel approach for aligning CAD models to 3D scans based on deep reinforcement learning, which outperforms prior ICP methods in the literature as well as both learned local deep feature matching and geometry-based alignment methods in real scenes.
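For context, classical point-to-point ICP, the baseline such learned approaches improve upon, alternates nearest-neighbor correspondence with a closed-form rigid fit (the Kabsch algorithm). A minimal 3D sketch of that textbook procedure (background only, not the paper's method):

```python
import numpy as np

def best_rigid_transform(src, dst):
    # Closed-form least-squares rotation R and translation t (Kabsch algorithm)
    # such that R @ src_i + t approximates dst_i for matched point pairs.
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=20):
    """Point-to-point ICP: alternate nearest-neighbor matching and rigid fitting."""
    cur = src.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iters):
        # Brute-force nearest neighbors (adequate for small point sets).
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        # Compose with the accumulated transform: x -> R(R_total x + t_total) + t.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

ICP's well-known weakness, convergence only from a good initialization with correct correspondences, is precisely what motivates learning-based variants like the one above.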

ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors

ShapeMask learns the intermediate concept of object shape to improve the generalization of instance segmentation to novel categories, and significantly outperforms the state of the art when learning across categories.

SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans

A message-passing graph neural network is proposed to model the inter-relationships between objects and layout, guiding generation of globally consistent object alignments in a scene by considering the overall scene layout.

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction

The 3D-R2N2 reconstruction framework outperforms the state-of-the-art methods for single view reconstruction, and enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).

Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling

A novel model is designed that simultaneously performs 3D reconstruction and pose estimation; this multi-task learning approach achieves state-of-the-art performance on both tasks.