pixelNeRF: Neural Radiance Fields from One or Few Images

@article{Yu2021pixelNeRFNR,
  title={pixelNeRF: Neural Radiance Fields from One or Few Images},
  author={Alex Yu and Vickie Ye and Matthew Tancik and Angjoo Kanazawa},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={4576-4585}
}
  • Alex Yu, Vickie Ye, Matthew Tancik, Angjoo Kanazawa
  • Published 3 December 2020
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
We propose pixelNeRF, a learning framework that predicts a continuous neural scene representation conditioned on one or few input images. The existing approach for constructing neural radiance fields [27] involves optimizing the representation to every scene independently, requiring many calibrated views and significant compute time. We take a step towards resolving these shortcomings by introducing an architecture that conditions a NeRF on image inputs in a fully convolutional manner. This… 
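The abstract's key idea, conditioning a NeRF on pixel-aligned image features in a fully convolutional manner, can be illustrated with a schematic sketch: a convolutional encoder produces a feature map, each queried 3D point is projected into the input view, the feature at that pixel is sampled bilinearly, and an MLP maps the point, viewing direction, and sampled feature to color and density. This is only a rough reconstruction under assumed conventions (layer sizes, a world-to-camera [R|t] pose, pinhole intrinsics K), not the authors' implementation.

# Schematic sketch of an image-conditioned radiance field.
# Layer sizes and the camera convention below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageConditionedNeRF(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        # Fully convolutional image encoder (a stand-in for a ResNet backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # MLP mapping (point, view direction, pixel-aligned feature) -> (RGB, density).
        self.mlp = nn.Sequential(
            nn.Linear(3 + 3 + feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 4),
        )

    def forward(self, image, pts, dirs, K, pose):
        # image: (1, 3, H, W); pts, dirs: (N, 3) in world space;
        # K: (3, 3) intrinsics; pose: (3, 4) world-to-camera [R|t].
        feats = self.encoder(image)                           # (1, C, H, W)
        cam = pts @ pose[:, :3].T + pose[:, 3]                # camera-space points (assumed in front of the camera)
        uv = cam @ K.T
        uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)           # pixel coordinates
        H, W = image.shape[-2:]
        grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,       # normalize to [-1, 1] for grid_sample
                            2 * uv[:, 1] / (H - 1) - 1], dim=-1).view(1, -1, 1, 2)
        f = F.grid_sample(feats, grid, align_corners=True)    # (1, C, N, 1) bilinearly sampled features
        f = f[0, :, :, 0].T                                   # (N, C) pixel-aligned features
        out = self.mlp(torch.cat([pts, dirs, f], dim=-1))
        return torch.sigmoid(out[:, :3]), F.relu(out[:, 3])   # RGB, density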
Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes
TLDR
Stereo Radiance Fields (SRF) is introduced, a neural view synthesis approach that is trained end-to-end, generalizes to new scenes, and requires only sparse views at test time; experiments show that SRF learns structure instead of over-fitting on a scene, achieving significantly sharper, more detailed results than scene-specific models.
RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
TLDR
This work observes that most artifacts in sparse-input scenarios are caused by errors in the estimated scene geometry and by divergent behavior at the start of training, and addresses both by regularizing the geometry and appearance of patches rendered from unobserved viewpoints and by annealing the ray sampling space during training.
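The "annealing the ray sampling space" mentioned in the RegNeRF summary above can be sketched as a schedule that starts sampling ray depths in a narrow interval around the scene midpoint and widens it to the full near/far range as training proceeds. The linear schedule and the parameter names here are assumptions for illustration, not the paper's exact formulation.

import torch

def annealed_bounds(near, far, step, max_step, start_frac=0.5):
    # Shrunken [near, far] interval that widens with the training step;
    # start_frac (assumed) controls how narrow the initial interval is.
    mid = 0.5 * (near + far)
    frac = min(1.0, start_frac + (1.0 - start_frac) * step / max_step)
    return mid + (near - mid) * frac, mid + (far - mid) * frac

def sample_depths(near, far, step, max_step, n_samples=64):
    n, f = annealed_bounds(near, far, step, max_step)
    t = torch.linspace(0.0, 1.0, n_samples)
    return n + (f - n) * t                       # stratified jitter omitted for brevity

# Early samples cluster around the scene center; by max_step they span [near, far].
print(sample_depths(2.0, 6.0, step=0, max_step=10_000)[[0, -1]])        # tensor([3., 5.])
print(sample_depths(2.0, 6.0, step=10_000, max_step=10_000)[[0, -1]])   # tensor([2., 6.])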
Efficient Neural Radiance Fields with Learned Depth-Guided Sampling
TLDR
A hybrid scene representation is proposed that combines the best of implicit radiance fields and explicit depth maps for efficient rendering, and the method's capability to synthesize free-viewpoint videos of dynamic human performers in real time is demonstrated.
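The depth-guided sampling mentioned in the summary above can be sketched as concentrating a few samples in a narrow band around a per-ray depth estimate instead of distributing many samples along the whole ray; the band width and sample count below are illustrative assumptions.

import torch

def depth_guided_samples(depth, band=0.1, n_samples=8):
    # depth: (R,) per-ray depth estimates from a coarse depth map.
    # Returns (R, n_samples) sample depths concentrated near the estimated surface.
    offsets = torch.linspace(-band, band, n_samples)
    return depth.unsqueeze(-1) + offsets

# A handful of samples per ray near the surface replaces e.g. 128 uniform samples,
# which is where the rendering speed-up comes from.
print(depth_guided_samples(torch.tensor([2.3, 4.1, 3.7])).shape)   # torch.Size([3, 8])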
LOLNeRF: Learn from One Look
TLDR
It is shown that by reconstructing many images aligned to an approximate canonical pose with a single network conditioned on a shared latent space, one can learn a space of radiance fields that models shape and appearance for a class of objects.
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
TLDR
DietNeRF improves the perceptual quality of few-shot view synthesis when learned from scratch, can render novel views with as few as one observed image when pre-trained on a multi-view dataset, and produces plausible completions of completely unobserved regions.
Light Field Implicit Representation for Flexible Resolution Reconstruction
TLDR
This work proposes a novel implicit model for 4D light fields conditioned on convolutional features of a sparse set of input views; it outperforms current state-of-the-art baselines on these tasks while using only a fraction of the baselines' run time.
NeuralMVS: Bridging Multi-View Stereo and Novel View Synthesis
TLDR
This work proposes to bridge the gap between these two methodologies with a novel network that recovers 3D scene geometry as a distance function together with high-resolution color images, uses only a sparse set of images as input, and generalizes well to novel scenes.
Unconstrained Scene Generation with Locally Conditioned Radiance Fields
TLDR
Generative Scene Networks is introduced, which learns to decompose scenes into a collection of many local radiance fields that can be rendered from a freely moving camera, and which produces quantitatively higher-quality scene renderings across several different scene datasets.
Unsupervised Discovery of Object Radiance Fields
TLDR
UORF, trained on multi-view RGB images without annotations, learns to decompose complex scenes with diverse, textured backgrounds from a single image and performs well on unsupervised 3D scene segmentation, novel view synthesis, and scene editing on three datasets.
MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo
TLDR
This work proposes a generic deep neural network that can reconstruct radiance fields from only three nearby input views via fast network inference; it leverages plane-swept cost volumes for geometry-aware scene reasoning and combines them with physically based volume rendering for neural radiance field reconstruction.
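The "physically based volume rendering" named in the MVSNeRF summary (and used by most of the radiance-field methods above) is the standard alpha-compositing quadrature of densities and colors along each ray. The following is a generic sketch of that quadrature, not any single paper's implementation; given per-sample colors and densities from any of the networks above, it reduces a ray to one pixel color and one expected depth.

import torch

def composite(rgb, sigma, depths):
    # Standard volume-rendering quadrature along one ray.
    # rgb: (S, 3), sigma: (S,), depths: (S,) sorted sample depths.
    deltas = torch.cat([depths[1:] - depths[:-1],
                        torch.full((1,), 1e10, dtype=depths.dtype)])      # last interval ~ infinite
    alpha = 1.0 - torch.exp(-sigma * deltas)                              # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1, dtype=alpha.dtype),
                                     1.0 - alpha + 1e-10]), dim=0)[:-1]   # transmittance up to each sample
    weights = alpha * trans                                               # contribution of each sample
    color = (weights.unsqueeze(-1) * rgb).sum(dim=0)                      # expected ray color
    depth = (weights * depths).sum(dim=0)                                 # expected termination depth
    return color, depth, weights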

References

SHOWING 1-10 OF 65 REFERENCES
Deep Stereo: Learning to Predict New Views from the World's Imagery
TLDR
This work presents a novel deep architecture that performs new view synthesis directly from pixels, trained from a large number of posed image sets, and is the first to apply deep learning to the problem of new view synthesis from sets of real-world, natural imagery.
SynSin: End-to-End View Synthesis From a Single Image
TLDR
This work proposes a novel differentiable point cloud renderer that is used to transform a latent 3D point cloud of features into the target view and outperforms baselines and prior work on the Matterport, Replica, and RealEstate10K datasets.
View Synthesis by Appearance Flow
TLDR
This work addresses the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints. It shows that for both objects and scenes, this approach is able to synthesize novel views of higher perceptual quality than previous CNN-based techniques.
Free View Synthesis
TLDR
This work presents a method for novel view synthesis from input images that are freely distributed around a scene; it can synthesize images for free camera movement through the scene and works for general scenes with unconstrained geometric layouts.
Single-view to Multi-view: Reconstructing Unseen Views with a Convolutional Network
TLDR
A convolutional network capable of generating images of a previously unseen object from arbitrary viewpoints given a single image of this object and an implicit 3D representation of the object class is presented.
Occupancy Networks: Learning 3D Reconstruction in Function Space
TLDR
This paper proposes Occupancy Networks, a new representation for learning-based 3D reconstruction methods that encodes a description of the 3D output at infinite resolution without an excessive memory footprint, and validates that the representation can efficiently encode 3D structure and can be inferred from various kinds of input.
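The "description of the 3D output at infinite resolution" in the Occupancy Networks summary amounts to a network that maps any continuous query point, together with a shape code, to an occupancy probability. A minimal sketch with assumed layer sizes:

import torch
import torch.nn as nn

class OccupancyNetwork(nn.Module):
    # Maps a 3D query point and a shape latent code to occupancy in [0, 1].
    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pts, z):
        # pts: (N, 3) query points; z: (latent_dim,) code from an image or point-cloud encoder.
        z = z.expand(pts.shape[0], -1)
        return torch.sigmoid(self.net(torch.cat([pts, z], dim=-1))).squeeze(-1)

# Training uses binary cross-entropy on sampled (point, inside/outside) pairs; a mesh can be
# extracted afterwards by thresholding the continuous field (e.g. with marching cubes).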
Neural scene representation and rendering
TLDR
The Generative Query Network (GQN) is introduced, a framework within which machines learn to represent scenes using only their own sensors, demonstrating representation learning without human labels or domain knowledge.
MVSNet: Depth Inference for Unstructured Multi-view Stereo
TLDR
This work presents an end-to-end deep learning architecture for depth map inference from multi-view images that flexibly adapts to arbitrary N-view inputs using a variance-based cost metric that maps multiple features into one cost feature.
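The variance-based cost metric in the MVSNet summary reduces N per-view feature volumes, already warped onto the fronto-parallel sweep planes of the reference view, to a single cost volume by taking the element-wise variance across views. A minimal sketch, assuming the differentiable warping has already been applied:

import torch

def variance_cost_volume(warped_feats):
    # warped_feats: (N_views, C, D, H, W) feature volumes warped onto D sweep planes.
    # Returns one (C, D, H, W) cost volume: the per-element variance across views.
    mean = warped_feats.mean(dim=0, keepdim=True)
    return ((warped_feats - mean) ** 2).mean(dim=0)

# Low variance = photo-consistent features across views (likely the true surface depth);
# high variance = the views disagree at that sweep depth.
print(variance_cost_volume(torch.randn(3, 32, 48, 64, 80)).shape)   # torch.Size([32, 48, 64, 80])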
DeepVoxels: Learning Persistent 3D Feature Embeddings
TLDR
This work proposes DeepVoxels, a learned representation that encodes the view-dependent appearance of a 3D scene without having to explicitly model its geometry, based on a Cartesian 3D grid of persistent embedded features that learn to make use of the underlying 3D scene structure.
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
TLDR
The proposed Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance, are demonstrated by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.