Articulation-Aware Canonical Surface Mapping

@article{Kulkarni2020ArticulationAwareCS,
  title={Articulation-Aware Canonical Surface Mapping},
  author={Nilesh Kulkarni and Abhinav Kumar Gupta and David F. Fouhey and Shubham Tulsiani},
  journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={449-458}
}
We tackle the tasks of: 1) predicting a Canonical Surface Mapping (CSM) that indicates the mapping from 2D pixels to corresponding points on a canonical template shape, and 2) inferring the articulation and pose of the template corresponding to the input image. While previous approaches rely on keypoint supervision for learning, we present an approach that can learn without such annotations. Our key insight is that these tasks are geometrically related, and we can obtain supervisory signal via…
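
As a concrete illustration of the geometric relationship, here is a minimal PyTorch sketch (not the authors' code; all names and the pinhole camera parameters are illustrative) of the reprojection cycle: pixels are mapped to points on the articulated, posed template, and projecting those points back through a predicted camera should land on the original pixels.

import torch

def cycle_consistency_loss(pixels, surface_pts, cam_rot, cam_trans, focal):
    """pixels: (N, 2) source pixel coordinates; surface_pts: (N, 3) their
    predicted template points, already articulated and posed; cam_rot (3, 3),
    cam_trans (3,), focal: hypothetical camera parameters."""
    cam_pts = surface_pts @ cam_rot.T + cam_trans        # world -> camera frame
    reproj = focal * cam_pts[:, :2] / cam_pts[:, 2:3]    # pinhole projection
    return ((reproj - pixels) ** 2).sum(dim=1).mean()    # round-trip pixel error

Minimizing such a loss couples the CSM, articulation, and pose predictions, which is what allows learning without keypoint annotations.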
Implicit Mesh Reconstruction from Unannotated Image Collections
TLDR
An approach to infer the 3D shape, texture, and camera pose for an object from a single RGB image, using only category-level image collections with foreground masks as supervision; its applicability is demonstrated qualitatively over a set of about 30 object categories.
Shelf-Supervised Mesh Prediction in the Wild
TLDR
This work proposes a learning-based approach that can train from unstructured image collections, supervised by only segmentation outputs from off-the-shelf recognition systems (i.e. ‘shelf-supervised’).
Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction
TLDR
The Canonical 3D Deformer Map is proposed, a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects, and that achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
Unsupervised Learning of 3D Object Categories from Videos in the Wild
TLDR
A new neural network design is proposed, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction while obtaining a detailed implicit representation of the object surface and texture, also compensating for the noise in the initial SfM reconstruction that bootstrapped the learning process.
Continuous Surface Embeddings
TLDR
This work proposes a new, learnable image-based representation of dense correspondences and demonstrates that the proposed approach performs on par or better than the state-of-the-art methods for dense pose estimation for humans, while being conceptually simpler.
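
A small sketch, under our own naming, of how such a learnable correspondence representation can be used: each pixel and each template vertex gets an embedding, and a pixel's corresponding vertex is its soft nearest neighbor in embedding space. The function name and temperature are assumptions, not the paper's API.

import torch

def pixel_to_vertex(pixel_emb, vertex_emb, temperature=0.1):
    """pixel_emb: (N, D) per-pixel embeddings; vertex_emb: (V, D) per-vertex
    embeddings. Returns an (N, V) soft assignment over template vertices."""
    sim = pixel_emb @ vertex_emb.T / temperature   # dot-product similarity logits
    return sim.softmax(dim=-1)                     # distribution over vertices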
Cycle-Consistent Generative Rendering for 2D-3D Modality Translation
TLDR
The utility of the learned representation, which infers an explicit 3D mesh, is demonstrated, along with its performance on image generation and unpaired 3D shape inference tasks.
A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation
TLDR
This work introduces Articulated Signed Distance Functions (A-SDF) to represent articulated shapes with a disentangled latent space, in which separate codes encode shape and articulation, and proposes a Test-Time Adaptation inference algorithm to adjust the model during inference.
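
A toy decoder in the spirit of that disentanglement is sketched below: a shape code and an articulation code are kept separate and concatenated with a query point before decoding to a signed distance. Layer sizes and names are illustrative, not taken from the A-SDF release.

import torch
import torch.nn as nn

class ASDFDecoder(nn.Module):
    """Maps (shape code, articulation code, query point) -> signed distance."""
    def __init__(self, shape_dim=256, art_dim=8, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(shape_dim + art_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),   # scalar signed distance at the query point
        )

    def forward(self, shape_code, art_code, xyz):
        return self.net(torch.cat([shape_code, art_code, xyz], dim=-1))

Because the codes are separate, articulation can be varied (or optimized at test time) while the shape code stays fixed.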
DRACO: Weakly Supervised Dense Reconstruction And Canonicalization of Objects
TLDR
DRACO performs dense canonicalization using only weak supervision in the form of camera poses and semantic keypoints at train time, solely using one or more RGB images of an object.
Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction
  • Feng Liu, Luan Tran, Xiaoming Liu
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2021
TLDR
This work shows that complete shape and albedo modeling enables leveraging real 2D images in both modeling and model fitting; the effectiveness of the approach is demonstrated through superior 3D reconstruction from a single image, whether synthetic or real, and through shape segmentation.
Online Adaptation for Consistent Mesh Reconstruction in the Wild
TLDR
It is demonstrated that the algorithm recovers temporally consistent and reliable 3D structures from videos of non-rigid objects, including animals captured in the wild, an extremely challenging task rarely addressed before.

References

SHOWING 1-10 OF 51 REFERENCES
Canonical Surface Mapping via Geometric Cycle Consistency
TLDR
This work explores the task of Canonical Surface Mapping and shows that the CSM task (pixel to 3D), when combined with 3D projection (3D to pixel), completes a cycle, thereby allowing the model to forgo dense manual supervision.
Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction
TLDR
This work presents a framework for learning single-view shape and pose prediction without using direct supervision for either, and demonstrates the applicability of the framework in a realistic setting which is beyond the scope of existing techniques.
Learning Category-Specific Mesh Reconstruction from Image Collections
TLDR
A learning framework for recovering the 3D shape, camera, and texture of an object from a single image, which incorporates texture inference as prediction of an image in a canonical appearance space and shows that semantic keypoints can be easily associated with the predicted shapes.
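
The "canonical appearance space" idea can be sketched as a texture flow: the network predicts, for every texel of the template's UV map, where in the input image to copy a color from. A minimal version with hypothetical names, using PyTorch's grid sampler:

import torch.nn.functional as F

def sample_texture(img, texture_flow):
    """img: (B, 3, H, W) input image; texture_flow: (B, Ht, Wt, 2) predicted
    sampling coordinates in [-1, 1]. Returns a (B, 3, Ht, Wt) UV texture
    whose colors are copied from the input image."""
    return F.grid_sample(img, texture_flow, align_corners=False)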
Learning Dense Correspondence via 3D-Guided Cycle Consistency
TLDR
It is demonstrated that the end-to-end trained ConvNet supervised by cycle-consistency outperforms state-of-the-art pairwise matching methods in correspondence-related tasks.
Unsupervised learning of object frames by dense equivariant image labelling
TLDR
A new approach is proposed that, given a large number of images of an object and no other supervision, can extract a dense object-centric coordinate frame that is invariant to deformations of the images and comes with a dense equivariant labelling neural network that can map image pixels to their corresponding object coordinates.
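
One way to read the equivariance constraint, as a hedged sketch: if an image is warped by a known transform g, the dense labelling network should produce, at the warped pixel locations, the same object coordinates it produced at the original pixels. The names below are ours, and the warp is represented as a grid_sample grid.

import torch
import torch.nn.functional as F

def equivariance_loss(label_net, img, grid):
    """img: (1, 3, H, W); grid: (1, H, W, 2) sampling grid encoding a warp g,
    so that warped(u) = img(g(u)). label_net maps images to dense object
    coordinates of shape (1, C, H, W)."""
    labels = label_net(img)                                     # labels of original image
    warped_img = F.grid_sample(img, grid, align_corners=False)  # apply the warp g
    warped_labels = label_net(warped_img)                       # labels of warped image
    labels_at_g = F.grid_sample(labels, grid, align_corners=False)  # original labels pulled through g
    return ((warped_labels - labels_at_g) ** 2).mean()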
Viewpoints and keypoints
TLDR
The problem of pose estimation for rigid objects is characterized in terms of viewpoint estimation to explain coarse pose and keypoint prediction to capture the finer details, and it is demonstrated that leveraging viewpoint estimates can substantially improve local-appearance-based keypoint predictions.
Slim DensePose: Thrifty Learning From Sparse Annotations and Motion Cues
TLDR
It is demonstrated that if annotations are collected in video frames, their efficacy can be multiplied for free by using motion cues, and that motion cues help much more when they are extracted from videos.
Unsupervised Learning of Shape and Pose with Differentiable Point Clouds
TLDR
This work trains a convolutional network to predict both the shape and the pose from a single image by minimizing the reprojection error, and introduces an ensemble of pose predictors which are distilled into a single "student" model.
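
A hedged sketch of the distillation step as the summary describes it: for each training image, the ensemble member with the lowest reprojection error acts as the teacher, and the student regresses to that member's pose. The names and the selection rule shown are our assumptions about the mechanism, not the paper's code.

import torch

def distill_pose_loss(student_pose, ensemble_poses, reproj_errors):
    """student_pose: (B, D); ensemble_poses: (K, B, D) candidate poses from K
    predictors; reproj_errors: (K, B) reprojection error of each candidate."""
    best = reproj_errors.argmin(dim=0)                            # best predictor per image
    teacher = ensemble_poses[best, torch.arange(best.shape[0])]   # (B, D) selected poses
    return ((student_pose - teacher.detach()) ** 2).mean()        # regress student to teacher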
Learning a Predictable and Generative Vector Representation for Objects
TLDR
A novel architecture, called the TL-embedding network, is proposed, to learn an embedding space with generative and predictable properties, which enables tackling a number of tasks including voxel prediction from 2D images and 3D model retrieval.
Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image
TLDR
The first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image is described, showing superior pose accuracy with respect to the state of the art.
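
The fitting loop the summary describes can be sketched as a direct optimization of SMPL parameters against detected 2D keypoints. Here joints_fn and project_fn are hypothetical stand-ins for a SMPL joint regressor and a camera model; the method's pose and shape priors are omitted for brevity.

import torch

def fit_smpl(keypoints_2d, pose, betas, joints_fn, project_fn, steps=200, lr=0.01):
    """Minimize 2D joint reprojection error over SMPL pose/shape parameters."""
    pose = pose.clone().requires_grad_(True)
    betas = betas.clone().requires_grad_(True)
    opt = torch.optim.Adam([pose, betas], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        joints_2d = project_fn(joints_fn(pose, betas))   # 3D joints -> image plane
        loss = ((joints_2d - keypoints_2d) ** 2).mean()  # keypoint reprojection error
        loss.backward()
        opt.step()
    return pose.detach(), betas.detach()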