SAIL-VOS 3D: A Synthetic Dataset and Baselines for Object Detection and 3D Mesh Reconstruction from Video Data

@inproceedings{Hu2021SAILVOS3A,
  title={{SAIL-VOS 3D}: A Synthetic Dataset and Baselines for Object Detection and {3D} Mesh Reconstruction from Video Data},
  author={Yuan-Ting Hu and Jiahong Wang and Raymond A. Yeh and Alexander G. Schwing},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2021},
  pages={3359--3369}
}
Extracting detailed 3D information of objects from video data is an important goal for holistic scene understanding. While recent methods have shown impressive results when reconstructing meshes of objects from a single image, results often remain ambiguous because part of the object is unobserved. Moreover, existing image-based datasets for mesh reconstruction do not permit the study of models that integrate temporal information. To alleviate both concerns we present SAIL-VOS 3D: a synthetic video…

Citations

Playing for 3D Human Recovery
  • Zhongang Cai, Mingyuan Zhang, +9 authors Ziwei Liu
  • Computer Science
  • 2021
TLDR
This work contributes GTA-Human, a mega-scale and highly diverse 3D human dataset generated with the GTA-V game engine, and systematically investigates the performance of various methods under a wide spectrum of real-world variations, e.g., camera angles, poses, and occlusions.

References

Showing 1-10 of 96 references
ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes
TLDR
This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
TLDR
A novel model is designed that simultaneously performs 3D reconstruction and pose estimation; this multi-task learning approach achieves state-of-the-art performance on both tasks.
MarrNet: 3D Shape Reconstruction via 2.5D Sketches
3D object reconstruction from a single image is a highly under-determined problem, requiring strong prior knowledge of plausible 3D shapes. This introduces challenges for learning-based approaches, …
3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare
TLDR
A differentiable Render-and-Compare loss is proposed that allows 3D shape and pose to be learned with 2D supervision and produces a compact 3D representation of the scene, which can be readily used for applications like autonomous driving.
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
TLDR
The 3D-R2N2 reconstruction framework outperforms state-of-the-art methods for single-view reconstruction and enables 3D reconstruction of objects in situations where traditional SfM/SLAM methods fail (because of lack of texture and/or a wide baseline).
SAIL-VOS: Semantic Amodal Instance Level Video Object Segmentation – A Synthetic Dataset and Baselines
TLDR
This work introduces SAIL-VOS (Semantic Amodal Instance Level Video Object Segmentation), a new synthetic dataset extracted from the photo-realistic game GTA-V that aims to stimulate semantic amodal segmentation research.
Boundary-Aware 3D Building Reconstruction From a Single Overhead Image
TLDR
A boundary-aware, multi-task deep-learning framework for fast 3D building modeling from a single overhead image, which jointly learns a modified signed distance function (SDF) from the building boundaries, a dense heightmap of the scene, and scene semantics.
Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images
TLDR
Experimental results on the ShapeNet and Pix3D benchmarks indicate that Pix2Vox outperforms the state of the art by a large margin and is 24 times faster than 3D-R2N2 in backward inference time.
Pixel2Mesh++: Multi-View 3D Mesh Generation via Deformation
TLDR
This model learns to predict a series of deformations that iteratively improve a coarse shape, and it generalizes across semantic categories, numbers of input images, and qualities of mesh initialization.
SceneNN: A Scene Meshes Dataset with aNNotations
TLDR
This paper introduces SceneNN, an RGB-D scene dataset consisting of 100 scenes that is used as a benchmark to evaluate the state-of-the-art methods on relevant research problems such as intrinsic decomposition and shape completion.