• Corpus ID: 15659494

Learning Where to Look: Data-Driven Viewpoint Set Selection for 3D Scenes

Kyle Genova, Manolis Savva, Angel X. Chang, Thomas A. Funkhouser
The use of rendered images, whether from completely synthetic datasets or from 3D reconstructions, is increasingly prevalent in vision tasks. However, little attention has been given to how the selection of viewpoints affects the performance of rendered training sets. In this paper, we propose a data-driven approach to view set selection. Given a set of example images, we extract statistics describing their contents and generate a set of views matching the distribution of those statistics… 

Learning from THEODORE: A Synthetic Omnidirectional Top-View Indoor Dataset for Deep Transfer Learning

This paper introduces THEODORE: a novel, large-scale indoor dataset containing 100,000 high-resolution diversified fisheye images with 16 classes, and shows that the dataset is well suited for fine-tuning CNNs for object detection and semantic segmentation.

Imitating Popular Photos to Select Views for an Indoor Scene

This work imitates popular photos on the Internet: it selects the views that optimize the contour similarity of corresponding objects to a photo, then clusters the selected views by weighted average to exhibit the scene.

Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding

  • Mike Roberts, Nathan Paczan
  • Computer Science, Environmental Science
    2021 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2021
This work introduces Hypersim, a photorealistic synthetic dataset for holistic indoor scene understanding, and finds that it is possible to generate the entire dataset from scratch, for roughly half the cost of training a popular open-source natural language processing model.

SLAM and deep learning for 3D indoor scene understanding

We build upon research in the fields of Simultaneous Localisation and Mapping (SLAM) and Deep Learning to develop 3D maps of indoor scenes that not only describe where things are but what they are.

Simultaneous View and Feature Selection for Collaborative Multi-Robot Recognition

This paper proposes a novel approach to collaborative multi-robot perception that simultaneously integrates view selection, feature selection, and object recognition into a unified regularized optimization formulation, which uses sparsity-inducing norms to identify the robots with the most representative views and the modalities with the most discriminative features.

Learning Reconstructability for Drone Aerial Path Planning

A neural network is trained to predict reconstructability for drone path planning during 3D urban scene acquisition and guides the iterative view planner to execute the onsite drone view acquisition for 3D reconstruction.

Multi-modal sensor fusion and selection for enhanced situational awareness

The proposed system displays observations based on the physical locations of the sensors, enabling a human operator to better understand where observations are located in the environment, and uses the optimal sensor fusion weights to scale the display of observations.

User-Weighted Viewpoint/Lighting Control for Multi-Object Scene



Perceptual models of viewpoint preference

The results of a large user study are leveraged to optimize the parameters of a general model for viewpoint goodness, such that the fitted model can predict people's preferred views for a broad range of objects.

SceneNet: Understanding Real World Indoor Scenes With Synthetic Data

This work focuses its attention on depth based semantic per-pixel labelling as a scene understanding problem and shows the potential of computer graphics to generate virtually unlimited labelled data from synthetic 3D scenes by carefully synthesizing training data with appropriate noise models.

The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes

This paper generates a synthetic collection of diverse urban images, named SYNTHIA, with automatically generated class annotations, and conducts experiments with DCNNs that show how the inclusion of SYNTHIA in the training stage significantly improves performance on the semantic segmentation task.

SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

We introduce SceneNet RGB-D, expanding the previous work of SceneNet to enable large scale photorealistic rendering of indoor scene trajectories. It provides pixel-perfect ground truth for scene

Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd

This work proposes an unsupervised feature learnt from depth-invariant patches using a Sparse Autoencoder and offers an extensive evaluation of various state of the art features, and learns to estimate the reduction of uncertainty in other views, formulating the problem of selecting the next-best-view.

Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks

This work introduces a large-scale synthetic dataset with 500K physically-based rendered images from 45K realistic 3D indoor scenes and shows that pretraining with this new synthetic dataset can improve results beyond the current state of the art on all three computer vision tasks.

Semantic Scene Completion from a Single Depth Image

The semantic scene completion network (SSCNet) is introduced, an end-to-end 3D convolutional network that takes a single depth image as input and simultaneously outputs occupancy and semantic labels for all voxels in the camera view frustum.

Semantic Pose Using Deep Networks Trained on Synthetic RGB-D

This work proposes to find instances of common furniture classes, their spatial extent, and their pose with respect to generalized class models, and uses a deep, wide, multi-output convolutional neural network that predicts class, pose, and location of possible objects simultaneously.

Viewpoint Selection using Viewpoint Entropy

This paper uses the theoretical basis provided by Information Theory to define a new measure, viewpoint entropy, that allows us to compute good viewing positions automatically and designs an algorithm that uses this measure to explore automatically objects or scenes.
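
As a rough illustration of the viewpoint entropy idea summarized above, the sketch below scores a viewpoint by the Shannon entropy of the relative projected areas visible from it, and picks the candidate with the highest score. It assumes the projected areas (one entry per visible face, optionally including the background) have already been computed by a renderer; the function names `viewpoint_entropy` and `best_viewpoint` and the input layout are hypothetical and not taken from the paper.

```python
import math


def viewpoint_entropy(projected_areas):
    """Shannon entropy (in bits) of relative projected areas for one viewpoint.

    projected_areas: non-negative numbers, one per face (assumed precomputed).
    """
    total = sum(projected_areas)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for area in projected_areas:
        if area > 0:  # faces with zero projected area contribute nothing
            p = area / total
            entropy -= p * math.log2(p)
    return entropy


def best_viewpoint(candidates):
    """Return the key of the candidate whose projected areas have maximum entropy.

    candidates: dict mapping a viewpoint label to its list of projected areas.
    """
    return max(candidates, key=lambda name: viewpoint_entropy(candidates[name]))
```

Under this assumed formulation, a view in which a single face fills the image scores zero, while a view that shows many faces with similar projected areas scores highest.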

PatchMatch Based Joint View Selection and Depthmap Estimation

This paper presents a multi-view depthmap estimation approach that adaptively ascertains the pixel-level data associations between a reference image and all the elements of a source image set. The linear computational and storage requirements of the formulation, together with its inherent parallelism, enable an efficient and scalable GPU-based implementation.