ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes

  Rahul Sajnani, Adrien Poulenard, Jivitesh Jain, Radhika Dua, Leonidas J. Guibas, and Srinath Sridhar. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Progress in 3D object understanding has relied on manually “canonicalized” shape datasets that contain instances with consistent position and orientation (3D pose). This has made it hard to generalize these methods to in-the-wild shapes, e.g., from internet model collections or depth sensors. ConDor is a self-supervised method that learns to canonicalize the 3D orientation and position of full and partial 3D point clouds. We build on top of Tensor Field Networks (TFNs), a class of permutation…

Canonical Fields: Self-Supervised Learning of Pose-Canonicalized Neural Fields

Canonical Field Network (CaFi-Net) is a self-supervised method that canonicalizes the 3D pose of instances from an object category represented as neural fields, specifically neural radiance fields (NeRFs). It uses a Siamese network architecture designed to extract equivariant features for category-level canonicalization.

SCARP: 3D Shape Completion in ARbitrary Poses for Improved Grasping

SCARP is a model that performs canonicalization, pose estimation, and shape completion in a single network. It improves performance by 45% over existing baselines and is used to improve grasp proposals on tabletop objects.

CMD-Net: Self-Supervised Category-Level 3D Shape Denoising through Canonicalization

This paper presents the Canonical Mapping and Denoising Network (CMD-Net), a self-supervised learning-based method that addresses category-level 3D shape denoising through canonicalization and can canonicalize noise-corrupted point clouds under arbitrary rotations.

DRACO: Weakly Supervised Dense Reconstruction And Canonicalization of Objects

DRACO performs dense canonicalization using only weak supervision in the form of camera poses and semantic keypoints at train time, solely using one or more RGB images of an object.

Zero-Shot Category-Level Object Pose Estimation

This paper proposes a novel method based on semantic correspondences from a self-supervised vision transformer to solve the pose estimation problem, and extends much of the existing literature by removing the need for pose-labelled datasets or category-specific CAD models for training or inference.

Shape-Pose Disentanglement using SE(3)-equivariant Vector Neurons

This work introduces an unsupervised technique for encoding point clouds into a canonical shape representation by disentangling shape and pose, enabling the approach to learn a consistent canonical pose for a class of objects.

Vitruvio: 3D Building Meshes via Single Perspective Sketches

Vitruvio outputs a 3D-printable building mesh with arbitrary topology and genus from a single perspective sketch, a step toward letting owners and designers communicate 3D information via an effective, intuitive, and universal 2D medium: the sketch.

Category-Level Global Camera Pose Estimation with Multi-Hypothesis Point Cloud Correspondences

This paper proposes an optimization method that retains all possible correspondences for each keypoint when matching a partial point cloud to a complete one, gradually updating them with the estimated rigid transformation by considering the matching cost.

A Simple Strategy to Provable Invariance via Orbit Mapping

This work proposes a method to make network architectures provably invariant with respect to group actions by choosing one element from a (possibly continuous) orbit based on a fixed criterion.
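The criterion-based orbit selection described above can be illustrated with a toy sketch. The code below is a hypothetical example, not the paper's construction: it uses PCA alignment with a fixed sign rule as the criterion for picking one representative from each rotation orbit of a point cloud, so any network applied afterwards is invariant (up to degenerate cases such as repeated eigenvalues). The names `canonicalize` and `invariant_net` are illustrative only.

```python
import numpy as np

def canonicalize(points):
    """Map a point cloud to a fixed representative of its rotation orbit.

    Criterion (an illustrative choice): center the cloud, align its
    principal axes with the coordinate axes via PCA, and fix each axis
    sign so the entry with the largest absolute value is positive.
    """
    centered = points - points.mean(axis=0)
    # Eigenvectors of the covariance define the alignment up to axis signs.
    _, eigvecs = np.linalg.eigh(np.cov(centered.T))
    aligned = centered @ eigvecs
    # Resolve the per-axis sign ambiguity with a fixed rule.
    for k in range(aligned.shape[1]):
        col = aligned[:, k]
        if col[np.argmax(np.abs(col))] < 0:
            aligned[:, k] = -col
    return aligned

def invariant_net(points):
    # Any function applied after canonicalization is invariant to
    # rotation and translation of the input (stand-in for a network).
    return canonicalize(points).sum()
```

Because every rotated copy of the cloud maps to the same canonical representative, invariance holds by construction rather than by architectural constraints.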

Neural Fields in Visual Computing

A review of the literature on neural fields shows the breadth of topics already covered in visual computing, both historically and in current incarnations, and highlights the improved quality, flexibility, and capability brought by neural field methods.

C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion

This work proposes C3DPO, a method for extracting 3D models of deformable objects from 2D keypoint annotations in unconstrained images by learning a deep network that reconstructs a 3D object from a single view at a time, and introduces a novel regularization technique.

Weakly-supervised 3D Shape Completion in the Wild

This work jointly optimizes canonical shapes and poses with multi-view geometry constraints during training; experiments show that learning 3D shape completion from large-scale data without shape and pose supervision is feasible and promising.

Learning to Orient Surfaces by Self-supervised Spherical CNNs

This work realizes the first end-to-end learning approach to define and extract the canonical orientation of 3D shapes, yielding a robust canonical orientation for surfaces represented as point clouds.

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction

The 3D-R2N2 reconstruction framework outperforms the state-of-the-art methods for single view reconstruction, and enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).

3D ShapeNets: A deep representation for volumetric shapes

This work proposes to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network, and shows that this 3D deep representation enables significant performance improvement over the state of the art in a variety of tasks.

Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

This work contributes Common Objects in 3D, a large-scale dataset of real multi-view images of object categories annotated with camera poses and ground-truth 3D point clouds, along with a novel neural rendering method that leverages the Transformer to reconstruct an object from a small number of its views.

Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation

The proposed method is able to robustly estimate the pose and size of unseen object instances in real environments while also achieving state-of-the-art performance on standard 6D pose estimation benchmarks.

A functional approach to rotation equivariant non-linearities for Tensor Field Networks

  A. Poulenard and L. Guibas. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

This work transposes the idea of Hann et al. to 3D by interpreting TFN features as spherical harmonics coefficients of functions on the sphere, and introduces a new equivariant nonlinearity and pooling for TFNs.

PointFlow: 3D Point Cloud Generation With Continuous Normalizing Flows

A principled probabilistic framework generates 3D point clouds by modeling them as a distribution of distributions; the invertibility of normalizing flows enables likelihood computation during training and allows the model to train in the variational inference framework.
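The role invertibility plays in exact likelihood computation can be sketched with a toy flow. The example below is an assumption-laden illustration, not PointFlow's continuous normalizing flow: it uses a simple elementwise affine flow, where the change-of-variables formula gives the exact log-likelihood as the base density of the latent plus the log-determinant of the inverse map's Jacobian. The name `affine_flow_loglik` is hypothetical.

```python
import numpy as np

def affine_flow_loglik(x, scale, shift):
    """Exact log-likelihood under an elementwise affine flow.

    The flow maps data x to latent z = (x - shift) / scale. By the
    change-of-variables formula,
        log p(x) = log N(z; 0, I) - sum(log|scale|),
    where -sum(log|scale|) is the log-determinant of the Jacobian of
    the inverse map. Invertibility is what makes this exact.
    """
    z = (x - shift) / scale
    d = x.shape[-1]
    log_base = -0.5 * np.sum(z**2, axis=-1) - 0.5 * d * np.log(2 * np.pi)
    log_det = -np.sum(np.log(np.abs(scale)))
    return log_base + log_det
```

In a real normalizing flow the affine map is replaced by a stack of learned invertible transforms (or, in PointFlow's case, an ODE-defined continuous flow), but the likelihood decomposes in the same way.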