Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding

@article{Roberts2021HypersimAP,
  title={Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding},
  author={Mike Roberts and Nathan Paczan},
  journal={2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021},
  pages={10892-10902}
}
  • Mike Roberts, Nathan Paczan
  • Published 4 November 2020
  • Computer Science, Environmental Science
  • 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. We address this challenge by introducing Hypersim, a photorealistic synthetic dataset for holistic indoor scene understanding. To create our dataset, we leverage a large repository of synthetic scenes created by professional artists, and we generate 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry. Our… 
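Hypersim distributes its images and per-pixel labels as HDF5 files. Below is a minimal sketch of reading one frame with h5py; the scene/camera/frame path layout and the "dataset" key reflect my understanding of the public release, so treat the exact names as illustrative placeholders rather than a guaranteed API.

# Minimal sketch: reading one Hypersim color image and its semantic labels with h5py.
# Paths and the "dataset" key are assumptions based on the public release layout.
import h5py
import numpy as np

scene_dir = "ai_001_001"  # placeholder scene name
color_path = f"{scene_dir}/images/scene_cam_00_final_hdf5/frame.0000.color.hdf5"
semantic_path = f"{scene_dir}/images/scene_cam_00_geometry_hdf5/frame.0000.semantic.hdf5"

with h5py.File(color_path, "r") as f:
    rgb = np.array(f["dataset"])       # H x W x 3 float image

with h5py.File(semantic_path, "r") as f:
    semantic = np.array(f["dataset"])  # H x W integer per-pixel semantic labels

print(rgb.shape, rgb.dtype, semantic.shape)

Note that the color data is stored in a high-dynamic-range format, so a tonemapping step is typically needed before displaying it as an 8-bit image.
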
OpenRooms: An Open Framework for Photorealistic Indoor Scene Datasets
TLDR
This work proposes a novel framework for creating large-scale photorealistic datasets of indoor scenes, with ground truth geometry, material, lighting and semantics, and shows that deep networks trained on the proposed dataset achieve competitive performance for shape, material and lighting estimation on real images.
UnityShip: A Large-Scale Synthetic Dataset for Ship Recognition in Aerial Images
TLDR
The experimental results show that for small-sized and medium-sized real-world datasets, the synthetic data achieve an improvement in model pre-training and data augmentation, showing the value and potential of synthetic data in aerial image recognition and understanding tasks.
ABO: Dataset and Benchmarks for Real-World 3D Object Understanding
We introduce Amazon-Berkeley Objects (ABO), a new large-scale dataset of product images and 3D models corresponding to real household objects. We use this realistic, object-centric 3D dataset to …
Recognizing Scenes from Novel Viewpoints
TLDR
This work proposes a model which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoints by segmenting it into semantic categories, and demonstrates its ability to jointly capture semantics and geometry of novel scenes with diverse layouts, object types and shapes.
Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular Video Depth
TLDR
A locally weighted linear regression method is proposed to recover scale and shift from very sparse anchor points while ensuring scale consistency across consecutive frames; it boosts the performance of existing state-of-the-art approaches by up to 50% on several zero-shot benchmarks.
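The core step described above, fitting a scale and shift that map relative depth onto sparse metric anchors, reduces to a small weighted least-squares problem. The NumPy sketch below illustrates that idea; the Gaussian spatial weighting and all variable names are my assumptions, not the paper's exact formulation.

# Hedged sketch: locally weighted linear regression for scale/shift recovery.
import numpy as np

def local_scale_shift(pred_at_anchors, metric_at_anchors, anchor_xy, query_xy, sigma=32.0):
    """Solve argmin_{s,t} sum_i w_i * (s * d_i + t - z_i)^2 for one query location."""
    d = np.asarray(pred_at_anchors, dtype=np.float64)    # relative depths at anchors
    z = np.asarray(metric_at_anchors, dtype=np.float64)  # metric depths at anchors
    dist2 = np.sum((np.asarray(anchor_xy, dtype=np.float64) - np.asarray(query_xy)) ** 2, axis=1)
    w = np.exp(-dist2 / (2.0 * sigma ** 2))               # closer anchors count more

    # Weighted normal equations for the 2-parameter linear model [s, t].
    A = np.stack([d, np.ones_like(d)], axis=1)            # N x 2 design matrix
    W = np.diag(w)
    s, t = np.linalg.solve(A.T @ W @ A, A.T @ W @ z)
    return s, t

# Example: three anchors, query location near the first one.
s, t = local_scale_shift([0.5, 1.0, 2.0], [1.1, 2.0, 4.1],
                         [(10, 10), (100, 40), (200, 200)], query_xy=(12, 14))
print(s, t)  # metric depth at the query is then approximately s * prediction + t
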
MINERVAS: Massive INterior EnviRonments VirtuAl Synthesis
TLDR
MINERVAS, a Massive INterior EnviRonments VirtuAl Synthesis system, facilitates 3D scene modification and 2D image synthesis for various vision tasks; it lets users access commercial scene databases containing millions of indoor scenes while protecting the copyright of the core data assets.
Colored Point Cloud to Image Alignment
TLDR
A differential optimization method is introduced that aligns a colored point cloud to a given color image via iterative geometric and color matching, enabling the construction of RGB-D datasets for specific camera systems, such as shape from stereo.
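To make the iterative geometric-and-color-matching idea concrete, the sketch below projects a colored point cloud with a candidate pose, compares the projected colors against the image, and nudges the pose to reduce the photometric error. The nearest-neighbor color lookup and the finite-difference, translation-only update are simplifications of mine, not the paper's method.

# Hedged sketch: align a colored point cloud to an image by minimizing color residuals.
import numpy as np

def photometric_error(points, colors, image, K, R, t):
    """Mean color residual of the point cloud projected into the image."""
    cam = points @ R.T + t                                # world -> camera coordinates
    valid = cam[:, 2] > 1e-6                              # keep points in front of camera
    cam = cam[valid]
    px = cam @ K.T
    uv = np.round(px[:, :2] / px[:, 2:3]).astype(int)     # pixel coordinates
    h, w, _ = image.shape
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    if not np.any(inside):
        return np.inf
    sampled = image[uv[inside, 1], uv[inside, 0]]         # nearest-neighbor color lookup
    return float(np.mean(np.abs(sampled - colors[valid][inside])))

def refine_translation(points, colors, image, K, R, t, step=0.01, iters=20):
    """Crude coordinate-descent refinement of the translation only."""
    t = np.asarray(t, dtype=np.float64).copy()
    for _ in range(iters):
        for axis in range(3):
            for delta in (+step, -step):
                cand = t.copy()
                cand[axis] += delta
                if photometric_error(points, colors, image, K, R, cand) < \
                   photometric_error(points, colors, image, K, R, t):
                    t = cand
    return t

A full alignment would also update rotation and use differentiable (e.g. bilinear) sampling, but the structure of the objective is the same.
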
Delve into balanced and accurate approaches for ship detection in aerial images
TLDR
This paper uses a virtual 3D engine to create scenes containing ship objects and to annotate the collected images with bounding boxes automatically, producing a synthetic ship detection dataset called unreal-ship, and designs an efficient anchor generation structure, Guided Anchor, which uses semantic information to guide the generation of high-quality anchors.
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
TLDR
A structure distillation approach is proposed to learn knacks from a pretrained depth estimator that produces structured but metric-agnostic depth due to its in-the-wild, mixed-dataset training, laying a solid basis for practical indoor depth estimation via self-supervision.
NViSII: A Scriptable Tool for Photorealistic Image Generation
TLDR
This work demonstrates the use of data generated by path tracing for training an object detector and pose estimator, showing improved performance in sim-to-real transfer in situations that are difficult for traditional raster-based renderers.

References

SHOWING 1-10 OF 122 REFERENCES
SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation?
TLDR
Analysis of SceneNet RGB-D suggests that large-scale high-quality synthetic RGB datasets with task-specific labels can be more useful for pretraining than real-world generic pre-training such as ImageNet.
Neural Inverse Rendering of an Indoor Scene From a Single Image
TLDR
This work proposes the first learning based approach that jointly estimates albedo, normals, and lighting of an indoor scene from a single image, and uses physically-based rendering to create a large-scale synthetic dataset, named SUNCG-PBR, which is a significant improvement over prior datasets.
SUN RGB-D: A RGB-D scene understanding benchmark suite
TLDR
This paper introduces an RGB-D benchmark suite aimed at advancing the state of the art in all major scene understanding tasks, and presents a dataset that enables training data-hungry algorithms for scene-understanding tasks, evaluating them with meaningful 3D metrics, avoiding overfitting to a small test set, and studying cross-sensor bias.
Intrinsic images in the wild
TLDR
This paper introduces Intrinsic Images in the Wild, a large-scale, public dataset for evaluating intrinsic image decompositions of indoor scenes, and develops a dense CRF-based intrinsic image algorithm for images in the wild that outperforms a range of state-of-the-art intrinsic image algorithms.
Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks
TLDR
This work introduces a large-scale synthetic dataset with 500K physically-based rendered images from 45K realistic 3D indoor scenes and shows that pretraining with this new synthetic dataset can improve results beyond the current state of the art on all three computer vision tasks.
ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes
TLDR
This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.
Inverse Rendering for Complex Indoor Scenes: Shape, Spatially-Varying Lighting and SVBRDF From a Single Image
TLDR
A deep inverse rendering framework for indoor scenes is presented, which combines novel methods for mapping complex materials onto existing indoor scene datasets with a new physically-based GPU renderer to create a large-scale, photorealistic indoor dataset.
Understanding Real World Indoor Scenes with Synthetic Data
TLDR
This work focuses on depth-based semantic per-pixel labelling as a scene understanding problem and shows the potential of computer graphics to generate virtually unlimited labelled data from synthetic 3D scenes.
Shading Annotations in the Wild
TLDR
This work introduces Shading Annotations in the Wild (SAW), a new large-scale, public dataset of shading annotations in indoor scenes, comprised of multiple forms of shading judgments obtained via crowdsourcing, along with shading annotations automatically generated from RGB-D imagery.
Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars
TLDR
The value of the synthesized dataset is demonstrated by improving performance on certain machine-learning-based scene understanding tasks (depth and surface normal prediction, semantic segmentation, reconstruction, etc.) and by providing benchmarks for, and diagnostics of, trained models by modifying object attributes and scene properties in a controllable manner.