Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
@article{Roberts2021HypersimAP,
  title={Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding},
  author={Mike Roberts and Nathan Paczan},
  journal={2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021},
  pages={10892-10902}
}
For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. We address this challenge by introducing Hypersim, a photorealistic synthetic dataset for holistic indoor scene understanding. To create our dataset, we leverage a large repository of synthetic scenes created by professional artists, and we generate 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry. Our…
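The released data is organized per scene as HDF5 files. As a rough illustration, the sketch below reads one frame's color image, depth, and semantic labels with h5py; the directory layout, file names, and dataset key are assumptions based on memory of the public apple/ml-hypersim release and should be verified against the repo.

```python
# Minimal sketch of reading one Hypersim frame, assuming the per-scene
# HDF5 layout of the public release (paths and the "dataset" key below
# are from memory of apple/ml-hypersim; verify against the repo).
import h5py
import numpy as np

def read_hdf5(path, key="dataset"):
    with h5py.File(path, "r") as f:
        return np.array(f[key])

scene, cam, frame = "ai_001_001", "cam_00", "0000"   # illustrative names
base = f"{scene}/images"

color = read_hdf5(f"{base}/scene_{cam}_final_hdf5/frame.{frame}.color.hdf5")           # HDR RGB image
depth = read_hdf5(f"{base}/scene_{cam}_geometry_hdf5/frame.{frame}.depth_meters.hdf5")  # per-pixel depth
labels = read_hdf5(f"{base}/scene_{cam}_geometry_hdf5/frame.{frame}.semantic.hdf5")     # semantic label IDs
```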
40 Citations
OpenRooms: An Open Framework for Photorealistic Indoor Scene Datasets
- Computer Science · 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
This work proposes a novel framework for creating large-scale photorealistic datasets of indoor scenes, with ground truth geometry, material, lighting and semantics, and shows that deep networks trained on the proposed dataset achieve competitive performance for shape, material and lighting estimation on real images.
UnityShip: A Large-Scale Synthetic Dataset for Ship Recognition in Aerial Images
- Computer Science · Remote Sensing
- 2021
The experimental results show that for small and medium-sized real-world datasets, synthetic data improves model pre-training and data augmentation, demonstrating the value and potential of synthetic data for aerial image recognition and understanding tasks.
ABO: Dataset and Benchmarks for Real-World 3D Object Understanding
- Computer Science · ArXiv
- 2021
We introduce Amazon-Berkeley Objects (ABO), a new large-scale dataset of product images and 3D models corresponding to real household objects. We use this realistic, object-centric 3D dataset to…
Recognizing Scenes from Novel Viewpoints
- Computer Science · ArXiv
- 2021
This work proposes a model which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoints by segmenting it into semantic categories, and demonstrates its ability to jointly capture semantics and geometry of novel scenes with diverse layouts, object types and shapes.
Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular Video Depth
- Computer Science
- 2022
This work proposes a locally weighted linear regression method that recovers scale and shift from very sparse anchor points, ensuring scale consistency across consecutive frames, and boosts the performance of existing state-of-the-art approaches by up to 50% on several zero-shot benchmarks.
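As a concrete illustration of the scale-and-shift recovery this entry describes, here is a minimal weighted least-squares sketch. The function name and uniform-weight default are illustrative assumptions, not the paper's implementation (which weights anchors locally).

```python
# Hedged sketch: fit s, t so that s * d_pred + t matches sparse metric
# anchors in a weighted least-squares sense (not the paper's code).
import numpy as np

def recover_scale_shift(pred_depth, anchor_pred, anchor_gt, weights=None):
    """Closed-form weighted fit of min sum_i w_i * (s*x_i + t - y_i)^2."""
    x = np.asarray(anchor_pred, dtype=float)
    y = np.asarray(anchor_gt, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    A = np.stack([x, np.ones_like(x)], axis=1)   # design matrix rows [x_i, 1]
    sw = np.sqrt(w)[:, None]                     # sqrt-weights for weighted LS
    (s, t), *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return s * pred_depth + t, (s, t)
```

In practice the anchors might come from sparse SfM points or a few depth-sensor samples; the locality in the paper's "locally weighted" variant could be introduced by computing `weights` from image-space distance to each anchor.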
MINERVAS: Massive INterior EnviRonments VirtuAl Synthesis
- Computer Science · ArXiv
- 2021
This work presents MINERVAS, a Massive INterior EnviRonments VirtuAl Synthesis system, which facilitates 3D scene modification and 2D image synthesis for various vision tasks, empowers users to access commercial scene databases with millions of indoor scenes, and protects the copyright of core data assets.
Colored Point Cloud to Image Alignment
- Computer Science · ArXiv
- 2021
This work introduces a differential optimization method that aligns a colored point cloud to a given color image via iterative geometric and color matching, enabling the construction of RGB-D datasets for specific camera systems such as shape from stereo.
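To make the alignment idea concrete, here is a minimal PyTorch sketch of the general technique: optimize a 6-DoF pose by gradient descent so that each point's color matches the image color sampled at its projection. All names are illustrative assumptions and this is not the paper's code.

```python
# Hedged sketch: photometric alignment of a colored point cloud to an image.
import torch
import torch.nn.functional as F

def se3_exp(xi):
    """Map a 6-vector (rotation omega, translation v) to a 4x4 rigid transform."""
    w, v = xi[:3], xi[3:]
    W = torch.zeros(4, 4, dtype=xi.dtype)
    W[0, 1], W[0, 2], W[1, 2] = -w[2], w[1], -w[0]   # skew-symmetric omega-hat
    W[1, 0], W[2, 0], W[2, 1] = w[2], -w[1], w[0]
    W[:3, 3] = v
    return torch.matrix_exp(W)

def photometric_loss(xi, pts, cols, img, K):
    """Mean squared error between point colors and image colors at projections."""
    T = se3_exp(xi)
    cam = pts @ T[:3, :3].T + T[:3, 3]                           # camera-frame points
    uv = (cam[:, :2] / cam[:, 2:3].clamp(min=1e-3)) @ K[:2, :2].T + K[:2, 2]
    H, W_ = img.shape[-2:]
    grid = torch.stack([uv[:, 0] / (W_ - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    sampled = F.grid_sample(img[None], grid[None, None], align_corners=True)
    return ((sampled[0, :, 0].T - cols) ** 2).mean()             # (N,3) color residuals

# Usage sketch: pts (N,3), cols (N,3) in [0,1], img (3,H,W), K (3,3) intrinsics.
# xi = torch.zeros(6, requires_grad=True)
# opt = torch.optim.Adam([xi], lr=1e-2)
# for _ in range(200):
#     opt.zero_grad()
#     photometric_loss(xi, pts, cols, img, K).backward()
#     opt.step()
```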
Delve into balanced and accurate approaches for ship detection in aerial images
- Computer Science · Neural Computing and Applications
- 2021
This paper uses a virtual 3D engine to create scenes containing ship objects and to automatically annotate the collected images with bounding boxes, producing a synthetic ship detection dataset called unreal-ship, and designs an efficient anchor generation structure, Guided Anchor, which utilizes semantic information to generate high-quality anchors.
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
- Computer Science · ArXiv
- 2021
This work proposes a structure distillation approach that learns knacks from a pretrained depth estimator, which produces structured but metric-agnostic depth due to its in-the-wild mixed-dataset training, laying a solid basis for practical indoor depth estimation via self-supervision.
NViSII: A Scriptable Tool for Photorealistic Image Generation
- Computer Science · ArXiv
- 2021
This work demonstrates the use of data generated by path tracing for training an object detector and pose estimator, showing improved performance in sim-to-real transfer in situations that are difficult for traditional raster-based renderers.
References
Showing 10 of 122 references.
SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation?
- Computer Science · 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
Analysis of SceneNet RGB-D suggests that large-scale high-quality synthetic RGB datasets with task-specific labels can be more useful for pretraining than real-world generic pre-training such as ImageNet.
Neural Inverse Rendering of an Indoor Scene From a Single Image
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
This work proposes the first learning based approach that jointly estimates albedo, normals, and lighting of an indoor scene from a single image, and uses physically-based rendering to create a large-scale synthetic dataset, named SUNCG-PBR, which is a significant improvement over prior datasets.
SUN RGB-D: A RGB-D scene understanding benchmark suite
- Computer Science · 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
This paper introduces an RGB-D benchmark suite for advancing the state of the art in all major scene understanding tasks, and presents a dataset that enables training data-hungry algorithms for scene-understanding tasks, evaluating them with meaningful 3D metrics, avoiding overfitting to a small test set, and studying cross-sensor bias.
Intrinsic images in the wild
- Computer Science · ACM Transactions on Graphics
- 2014
This paper introduces Intrinsic Images in the Wild, a large-scale, public dataset for evaluating intrinsic image decompositions of indoor scenes, and develops a dense CRF-based intrinsic image algorithm for images in the wild that outperforms a range of state-of-the-art intrinsic image algorithms.
Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks
- Computer Science · 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
This work introduces a large-scale synthetic dataset with 500K physically-based rendered images from 45K realistic 3D indoor scenes and shows that pretraining with this new synthetic dataset can improve results beyond the current state of the art on all three computer vision tasks.
ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes
- Computer Science · 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.
Inverse Rendering for Complex Indoor Scenes: Shape, Spatially-Varying Lighting and SVBRDF From a Single Image
- Computer Science · 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
A deep inverse rendering framework for indoor scenes, which combines novel methods to map complex materials to existing indoor scene datasets and a new physically-based GPU renderer to create a large-scale, photorealistic indoor dataset.
Understanding Real World Indoor Scenes with Synthetic Data
- Computer Science · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work focuses on depth-based semantic per-pixel labelling as a scene understanding problem and shows the potential of computer graphics to generate virtually unlimited labelled data from synthetic 3D scenes.
Shading Annotations in the Wild
- Computer Science · 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
This work introduces Shading Annotations in the Wild (SAW), a new large-scale, public dataset of shading annotations in indoor scenes, comprised of multiple forms of shading judgments obtained via crowdsourcing, along with shading annotations automatically generated from RGB-D imagery.
Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars
- Computer Science · International Journal of Computer Vision
- 2018
The value of the synthesized dataset is demonstrated by improved performance on several machine-learning-based scene understanding tasks (depth and surface normal prediction, semantic segmentation, reconstruction, etc.) and by providing benchmarks for and diagnostics of trained models, obtained by modifying object attributes and scene properties in a controllable manner.