Deep Stereo: Learning to Predict New Views from the World's Imagery

@article{Flynn2016DeepSL,
  title={Deep Stereo: Learning to Predict New Views from the World's Imagery},
  author={John Flynn and Ivan Neulander and James Philbin and Noah Snavely},
  journal={2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016},
  pages={5515--5524}
}
Deep networks have recently enjoyed enormous success when applied to recognition and classification problems in computer vision [22, 33], but their use in graphics problems has been limited ([23, 7] are notable recent exceptions). In this work, we present a novel deep architecture that performs new view synthesis directly from pixels, trained from a large number of posed image sets. In contrast to traditional approaches, which consist of multiple complex stages of processing, each of which… 
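The architecture operates on plane-sweep volumes: each posed input image is reprojected into the target camera at a discrete set of fronto-parallel depth planes, and the network then reasons over this stack. A minimal NumPy sketch of that reprojection step, assuming a pinhole camera model and a nearest-neighbour warp (the function names are illustrative, not from the paper's code):

```python
import numpy as np

def homography_for_depth(K, R, t, d):
    """Homography mapping target pixels to source pixels for a
    fronto-parallel plane at depth d (plane normal n = [0, 0, 1])."""
    n = np.array([0.0, 0.0, 1.0])
    return K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)

def build_plane_sweep(src, K, R, t, depths):
    """Warp `src` (H x W) into the target view once per depth plane,
    producing a (D, H, W) plane-sweep volume (nearest-neighbour)."""
    h, w = src.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous target pixels
    volume = []
    for d in depths:
        H = homography_for_depth(K, R, t, d)
        p = H @ pix
        u = np.round(p[0] / p[2]).astype(int)
        v = np.round(p[1] / p[2]).astype(int)
        valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        plane = np.zeros_like(src)
        tv, tu = ys.ravel()[valid], xs.ravel()[valid]
        plane[tv, tu] = src[v[valid], u[valid]]
        volume.append(plane)
    return np.stack(volume)
```

With identity rotation and zero translation every plane reproduces the source image; a pure x-translation of the camera shifts each plane by f·tx/d pixels, which is the depth-dependent parallax cue the network can exploit.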

Citations

Geometry and uncertainty in deep learning for computer vision
TLDR
This thesis presents end-to-end deep learning architectures for a number of core computer vision problems: scene understanding, camera pose estimation, stereo vision, and video semantic segmentation. It also introduces ideas from probabilistic modelling and Bayesian deep learning to understand uncertainty in computer vision models.
Deep Learning based Novel View Synthesis
TLDR
A deep convolutional neural network (CNN) that learns to predict novel views of a scene from a given collection of images, estimating at each pixel the probability distribution over possible depth levels in the scene.
Fast View Synthesis with Deep Stereo Vision
TLDR
A novel view synthesis approach based on stereo vision and CNNs is presented; it decomposes the problem into two sub-tasks, view-dependent geometry estimation and texture inpainting, both of which can be learned effectively with CNNs.
Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes
TLDR
Stereo Radiance Fields is introduced, a neural view synthesis approach that is trained end-to-end, generalizes to new scenes, and requires only sparse views at test time. Experiments show that SRF learns structure instead of over-fitting on a scene, achieving significantly sharper, more detailed results than scene-specific models.
A Lightweight Neural Network for Monocular View Generation With Occlusion Handling
TLDR
A very lightweight neural network architecture, trained on stereo image pairs, performs view synthesis from a single image; it outperforms state-of-the-art approaches both visually and metric-wise on the challenging KITTI dataset, while reducing the required number of parameters by an order of magnitude.
Convolutions With Global and Local Adaptive Dilations
TLDR
The proposed monster-net reconstructs more reliable image structures with coherent geometry in the synthesized images, and outperforms the state-of-the-art (SOTA) method by a large margin on all metrics: RMSE, PSNR, and SSIM.
Vision Transformer for NeRF-Based View Synthesis from a Single Input Image
TLDR
This work substantially reduces the required input to a single unposed image and synthesizes a novel view by training a multilayer perceptron (MLP) network, conditioned on the learned 3D representation, to perform volume rendering.
Geometry-Aware Deep Network for Single-Image Novel View Synthesis
TLDR
A new region-aware geometric transform network is developed that performs multiple estimation tasks in a common framework; experiments demonstrate its effectiveness in generating high-quality synthetic views that respect the scene geometry, outperforming state-of-the-art methods.
View Synthesis by Appearance Flow
TLDR
This work addresses the problem of novel view synthesis: given an input image, synthesize new images of the same object or scene observed from arbitrary viewpoints. For both objects and scenes, the approach synthesizes novel views of higher perceptual quality than previous CNN-based techniques.
DeepMVS: Learning Multi-view Stereopsis
TLDR
The results show that DeepMVS compares favorably against state-of-the-art conventional MVS algorithms and other ConvNet based methods, particularly for near-textureless regions and thin structures.

References

Showing 1-10 of 54 references
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
TLDR
This paper employs two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. It also applies a scale-invariant error to help measure depth relations rather than scale.
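The scale-invariant error mentioned above has a compact form: with d_i = log y_i − log y*_i, one common variant is mean(d²) − mean(d)², which is unchanged when the prediction is multiplied by a global scale. A minimal sketch (the helper name is illustrative, and constant factors in the published loss are omitted):

```python
import numpy as np

def scale_invariant_log_error(pred, target):
    """Scale-invariant error: squared log-depth differences with the
    best global log-scale offset factored out (a variance of log-ratios)."""
    d = np.log(pred) - np.log(target)
    return float(np.mean(d ** 2) - np.mean(d) ** 2)
```

Multiplying `pred` by any positive constant shifts every log-difference by the same amount, which the mean-subtraction cancels, so the error depends only on depth relations.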
Designing deep networks for surface normal estimation
TLDR
This paper proposes to build upon the decades of hard work in 3D scene understanding to design a new CNN architecture for the task of surface normal estimation, and shows that incorporating several constraints and meaningful intermediate representations in the architecture leads to state-of-the-art performance on surface normal estimation.
Stereopsis via deep learning
TLDR
A probabilistic, deep learning approach to modeling disparity and a methodology for generating binocular training data to estimate model parameters are described, and it is demonstrated that a three-layer network can learn to infer depth entirely from training data.
Fast cost-volume filtering for visual correspondence and beyond
TLDR
This paper proposes a generic and simple framework comprising three steps: constructing a cost volume, fast cost-volume filtering, and winner-take-all label selection. It achieves state-of-the-art results, producing disparity maps in real time and optical flow fields with very fine structures as well as large displacements.
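The three steps named in this summary can be sketched end-to-end for stereo matching. Below is a toy NumPy version with absolute-difference costs and a plain box filter standing in for the paper's edge-aware (guided) filtering; all names are illustrative:

```python
import numpy as np

def box_filter(img, radius):
    """Mean filter with edge padding (a stand-in for guided filtering)."""
    k = 2 * radius + 1
    p = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def disparity_wta(left, right, max_disp, radius=2, max_cost=1.0):
    """Step 1: per-disparity absolute-difference cost volume.
    Step 2: filter each cost slice. Step 3: winner-take-all argmin."""
    h, w = left.shape
    cost = np.full((max_disp, h, w), max_cost)  # invalid regions get max cost
    for d in range(max_disp):
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
    filtered = np.stack([box_filter(cost[d], radius) for d in range(max_disp)])
    return filtered.argmin(axis=0)
```

Swapping the box filter for an edge-preserving filter is what lets the published method keep fine structures while still running in real time; the winner-take-all step itself needs no global optimization.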
On New View Synthesis Using Multiview Stereo
TLDR
It is shown that applying modern multiview stereo techniques to the new view synthesis (NVS) problem introduces a number of non-trivial complexities, and a synthesis of the two approaches is proposed that yields good results on difficult image sequences.
Deep Convolutional Inverse Graphics Network
This paper presents the Deep Convolutional Inverse Graphics Network (DC-IGN), a model that aims to learn an interpretable representation of images, disentangled with respect to three-dimensional scene…
Learning 3-D Scene Structure from a Single Still Image
TLDR
This work considers the problem of estimating detailed 3D structure from a single still image of an unstructured environment and uses a Markov random field (MRF) to infer a set of "plane parameters" that capture both the 3D location and 3D orientation of the patch.
Image-Based Rendering Using Image-Based Priors
TLDR
The paper’s second contribution is to constrain the generated views to lie in the space of images whose texture statistics match those of the input images; this amounts to an image-based prior on the reconstruction that regularizes the solution, yielding realistic synthetic views.
View Synthesis for Recognizing Unseen Poses of Object Classes
TLDR
This work proposes a novel representation for modeling 3D object classes that allows synthesizing novel views of an object class at recognition time, and incorporates it into a novel two-step algorithm that can classify objects under arbitrary and/or unseen poses.
Depth synthesis and local warps for plausible image-based navigation
TLDR
This work introduces a new IBR algorithm that is robust to missing or unreliable geometry, providing plausible novel views even in regions quite far from the input camera positions, and demonstrates novel view synthesis in real time for multiple challenging scenes with significant depth complexity.