Corpus ID: 227276306

Online Adaptation for Consistent Mesh Reconstruction in the Wild

@article{Li2020OnlineAF,
  title={Online Adaptation for Consistent Mesh Reconstruction in the Wild},
  author={Xueting Li and Sifei Liu and Shalini De Mello and Kihwan Kim and X. Wang and Ming-Hsuan Yang and Jan Kautz},
  journal={ArXiv},
  year={2020},
  volume={abs/2012.03196}
}
This paper presents an algorithm to reconstruct temporally consistent 3D meshes of deformable object instances from videos in the wild. Without requiring annotations of 3D meshes, 2D keypoints, or camera poses for each video frame, we pose video-based reconstruction as a self-supervised online adaptation problem applied to any incoming test video. We first learn a category-specific 3D reconstruction model from a collection of single-view images of the same category that jointly predicts the shape…
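The abstract describes test-time, self-supervised fine-tuning on each incoming video. Below is a minimal PyTorch sketch of that general idea, assuming a hypothetical `model` that maps a frame to a predicted silhouette (as probabilities) and mesh vertices; the mask and temporal-smoothness losses are illustrative stand-ins, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def adapt_to_video(model, frames, masks, steps=20, lr=1e-4):
    """Fine-tune a pretrained category-specific reconstructor on one test video."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        loss, prev_verts = 0.0, None
        for frame, mask in zip(frames, masks):
            # assumed to return {'silhouette': (H, W) in [0, 1], 'verts': (V, 3)}
            pred = model(frame)
            # 2D mask re-projection loss (self-supervised; no 3D labels needed)
            loss = loss + F.binary_cross_entropy(pred["silhouette"], mask)
            if prev_verts is not None:  # encourage temporally consistent meshes
                loss = loss + 0.1 * (pred["verts"] - prev_verts).pow(2).mean()
            prev_verts = pred["verts"].detach()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```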
DOVE: Learning Deformable 3D Objects by Watching Videos
TLDR
DOVE is a method that learns textured 3D models of deformable object categories from monocular videos available online, without keypoint, viewpoint, or template shape supervision, and produces temporally consistent 3D models, which can be animated and rendered from arbitrary viewpoints.
ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction
TLDR
ViSER is introduced, a method for recovering articulated 3D shapes and dense 3D trajectories from monocular videos, making use of only 2D object masks and two-frame optical flow as inputs to establish dense long-range correspondences across pixels.
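A simplified reading of the surface-embedding idea: soft-match each pixel feature to mesh surface points by embedding similarity. The dimensions and the temperature `tau` below are assumptions, not ViSER's exact formulation.

```python
import torch

def match_pixels_to_surface(pixel_feats, surf_embeds, surf_xyz, tau=0.1):
    """pixel_feats: (P, D); surf_embeds: (S, D); surf_xyz: (S, 3)."""
    logits = pixel_feats @ surf_embeds.t() / tau  # (P, S) similarity scores
    prob = logits.softmax(dim=1)                  # soft correspondence per pixel
    return prob @ surf_xyz                        # expected 3D match for each pixel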
Multi-Category Mesh Reconstruction From Image Collections
TLDR
An alternative approach that infers the textured mesh of objects by combining a series of deformable 3D models with instance-specific deformation, pose, and texture; experiments show that the proposed framework can distinguish between different object categories and learn category-specific shape priors in an unsupervised manner.
Unbiased 4D: Monocular 4D Reconstruction with a Neural Deformation Model
TLDR
The Ub4D technique includes two components that are new in the context of non-rigid 3D reconstruction: a coordinate-based, implicit neural representation for non-rigid scenes, which enables an unbiased reconstruction of dynamic scenes, and a novel dynamic scene flow loss, which enables the reconstruction of larger deformations.
Black-Box Test-Time Shape REFINEment for Single View 3D Reconstruction
TLDR
The novel REFINE paradigm and 3D-ODDS, a new hierarchical multi-view, multi-domain image dataset with 3D meshes, are proposed as a uniquely challenging benchmark; the authors believe these are important steps toward truly robust, accurate 3D reconstruction.
LASR: Learning Articulated Shape Reconstruction from a Monocular Video
TLDR
This work introduces a template-free approach to learn 3D shapes from a single video with an analysis-by-synthesis strategy that forward-renders object silhouette, optical flow, and pixel values to compare with video observations; the comparison generates gradients that adjust the camera, shape, and motion parameters.
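A minimal sketch of one such analysis-by-synthesis loop, under assumptions: `render_silhouette` is a hypothetical stand-in for any differentiable renderer (e.g., a soft rasterizer), and only the silhouette term is shown.

```python
import torch

def fit_shape(render_silhouette, observed_sil, verts, faces, cam, iters=200):
    """Optimize shape and camera so the rendering matches the observation."""
    verts = verts.clone().requires_grad_(True)
    cam = cam.clone().requires_grad_(True)
    opt = torch.optim.Adam([verts, cam], lr=1e-2)
    for _ in range(iters):
        sil = render_silhouette(verts, faces, cam)  # differentiable forward render
        loss = (sil - observed_sil).pow(2).mean()   # silhouette reprojection error
        opt.zero_grad()
        loss.backward()       # gradients flow back through the renderer
        opt.step()
    return verts.detach(), cam.detach()
```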
Fine Detailed Texture Learning for 3D Meshes with Generative Models
TLDR
This paper presents a method to reconstruct high-quality textured 3D models from both multi-view and single-view images, and proposes an attention mechanism that relies on the learnable positions of pixels.
BANMo: Building Animatable 3D Neural Models from Many Casual Videos
TLDR
This work aims to create high-fidelity, articulated 3D models from many casual RGB videos in a differentiable rendering framework, and introduces neural blend skinning models that allow for differentiable and invertible articulated deformations.
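The building block behind "neural blend skinning" is linear blend skinning; a short sketch follows, with the neural part (a network predicting the skinning weights) omitted.

```python
import torch

def blend_skinning(verts, weights, bone_transforms):
    """verts: (V, 3); weights: (V, B), rows sum to 1; bone_transforms: (B, 4, 4)."""
    homo = torch.cat([verts, torch.ones_like(verts[:, :1])], dim=1)   # (V, 4)
    per_bone = torch.einsum("bij,vj->vbi", bone_transforms, homo)     # (V, B, 4)
    blended = (weights.unsqueeze(-1) * per_bone).sum(dim=1)           # (V, 4)
    return blended[:, :3]  # each vertex is a weight-blended mix of bone motions
```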
To The Point: Correspondence-driven monocular 3D category reconstruction
TLDR
To The Point (TTP), a method for reconstructing 3D objects from a single image using 2D to 3D correspondences learned from weak supervision, uses a simple per-sample optimization problem to replace CNN-based regression of camera pose and non-rigid deformation and thereby obtain substantially more accurate 3D reconstructions.
...

References

SHOWING 1-10 OF 68 REFERENCES
Self-supervised Single-view 3D Reconstruction via Semantic Consistency
TLDR
This work is the first to attempt to solve the single-view reconstruction problem without a category-specific template mesh or semantic keypoints, and demonstrates that the unsupervised method performs comparably to, if not better than, existing category-specific reconstruction methods learned with supervision.
Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction
TLDR
Photometric mesh optimization demonstrates 3D object mesh reconstruction from both synthetic and real-world videos, a result unachievable with either naive mesh-generation networks or traditional surface-reconstruction pipelines without heavy manual post-processing.
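A sketch of a photometric objective under simplifying assumptions: `project` is a hypothetical pinhole projection returning normalized [-1, 1] image coordinates, and frames are (1, 3, H, W) tensors; this is one plausible form, not the paper's exact loss.

```python
import torch.nn.functional as F

def photometric_loss(verts, cams, frames, project):
    """The same 3D vertex should have the same color in every frame it is seen."""
    uv0 = project(verts, cams[0]).view(1, -1, 1, 2)  # (1, V, 1, 2) sample grid
    uv1 = project(verts, cams[1]).view(1, -1, 1, 2)
    c0 = F.grid_sample(frames[0], uv0, align_corners=True)  # colors at projections
    c1 = F.grid_sample(frames[1], uv1, align_corners=True)
    return (c0 - c1).abs().mean()  # penalize cross-frame color disagreement
```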
Learning Category-Specific Mesh Reconstruction from Image Collections
TLDR
A learning framework for recovering the 3D shape, camera, and texture of an object from a single image, incorporating texture inference as prediction of an image in a canonical appearance space; it also shows that semantic keypoints can be easily associated with the predicted shapes.
Learning to Generate and Reconstruct 3D Meshes with only 2D Supervision
TLDR
A unified framework tackling two problems, class-specific 3D reconstruction from a single image and generation of new 3D shape samples, that is comparable or superior to state-of-the-art voxel-based approaches on quantitative metrics while producing visually more pleasing results.
Consistent video depth estimation
TLDR
An algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video by using a learning-based prior, i.e., a convolutional neural network trained for single-image depth estimation.
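A sketch of the test-time fine-tuning idea under assumptions: `net` is any single-image depth CNN, and `warp_depth` is a hypothetical helper that reprojects frame-0 depth into frame 1 using a known relative camera pose.

```python
import torch

def finetune_for_consistency(net, frame0, frame1, warp_depth, steps=50):
    """Adapt a single-image depth network so its depths agree across frames."""
    opt = torch.optim.Adam(net.parameters(), lr=1e-5)
    for _ in range(steps):
        d0, d1 = net(frame0), net(frame1)
        d0_in_1 = warp_depth(d0, frame0, frame1)  # geometric reprojection
        loss = (d0_in_1 - d1).abs().mean()        # consistency across frames
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net
```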
Deep Mesh Reconstruction From Single RGB Images via Topology Modification Networks
TLDR
This paper presents an end-to-end single-view mesh reconstruction framework that is able to generate high-quality meshes with complex topologies from a single genus-0 template mesh and outperforms the current state-of-the-art methods both qualitatively and quantitatively.
Pixel2Mesh++: Multi-View 3D Mesh Generation via Deformation
TLDR
This model learns to predict a series of deformations to improve a coarse shape iteratively, and exhibits generalization capability across different semantic categories, numbers of input images, and qualities of mesh initialization.
Exploiting Temporal Context for 3D Human Pose Estimation in the Wild
TLDR
A bundle-adjustment-based algorithm recovers accurate 3D human pose and meshes from monocular videos; retraining a single-frame 3D pose estimator on this data improves accuracy on both real-world and mocap data, as evaluated on the 3DPW and HumanEVA datasets.
Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild
We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint, and illumination.
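A toy PyTorch sketch of that factoring, with heavily simplified network sizes (the real encoders are much deeper, and the re-rendering/shading step is omitted); all layer choices here are assumptions.

```python
import torch
import torch.nn as nn

class FactoredAutoencoder(nn.Module):
    """Factor an image into depth, albedo, viewpoint, and light (simplified)."""
    def __init__(self):
        super().__init__()
        def head(out_ch):  # tiny conv encoder standing in for a deep network
            return nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, out_ch, 3, padding=1))
        self.depth_net = head(1)    # per-pixel depth map
        self.albedo_net = head(3)   # per-pixel reflectance
        self.global_net = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(32, 10))  # 6-D viewpoint + 4-D light

    def forward(self, img):
        code = self.global_net(img)
        return (self.depth_net(img), self.albedo_net(img),
                code[:, :6], code[:, 6:])  # depth, albedo, viewpoint, light
```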
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
TLDR
The 3D-R2N2 reconstruction framework outperforms the state-of-the-art methods for single-view reconstruction, and enables the 3D reconstruction of objects in situations where traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).
...