Extreme Rotation Estimation using Dense Correlation Volumes

@article{Cai2021ExtremeRE,
  title={Extreme Rotation Estimation using Dense Correlation Volumes},
  author={Ruojin Cai and Bharath Hariharan and Noah Snavely and Hadar Averbuch-Elor},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={14561-14570}
}
We present a technique for estimating the relative 3D rotation of an RGB image pair in an extreme setting, where the images have little or no overlap. We observe that, even when images do not overlap, there may be rich hidden cues as to their geometric relationship, such as light source directions, vanishing points, and symmetries present in the scene. We propose a network design that can automatically learn such implicit cues by comparing all pairs of points between the two input images. Our… 
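
As a rough illustration of the all-pairs comparison described in the abstract, the sketch below builds a 4D correlation volume from two feature maps in plain NumPy. The feature extractor, channel count, and scaling are assumptions for illustration, not the paper's exact architecture.

```python
import numpy as np

def dense_correlation_volume(feat1, feat2):
    """All-pairs dot-product similarity between two feature maps.

    feat1, feat2: arrays of shape (C, H, W) from some feature extractor
    (e.g. a CNN backbone, not shown here). Returns a 4D volume of shape
    (H, W, H, W) where entry [i, j, k, l] correlates pixel (i, j) of
    image 1 with pixel (k, l) of image 2.
    """
    c, h, w = feat1.shape
    f1 = feat1.reshape(c, h * w)          # (C, H*W)
    f2 = feat2.reshape(c, h * w)          # (C, H*W)
    corr = f1.T @ f2 / np.sqrt(c)         # scaled dot products, (H*W, H*W)
    return corr.reshape(h, w, h, w)

# Toy usage with random "features" standing in for network activations.
rng = np.random.default_rng(0)
fa = rng.standard_normal((64, 32, 32)).astype(np.float32)
fb = rng.standard_normal((64, 32, 32)).astype(np.float32)
volume = dense_correlation_volume(fa, fb)
print(volume.shape)  # (32, 32, 32, 32)
```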

Virtual Correspondence: Humans as a Cue for Extreme-View Geometry

TLDR
A method to establish and exploit virtual correspondences based on humans in the scene; it outperforms state-of-the-art camera pose estimation methods in challenging scenarios and is comparable to them in the traditional densely captured setup.

PoserNet: Refining Relative Camera Poses Exploiting Object Detections

TLDR
This work proposes the Pose Refiner Network (PoserNet), a lightweight Graph Neural Network that refines approximate pairwise relative camera poses, using objectness regions rather than explicit semantic object detections to guide the pose estimation problem.

FvOR: Robust Joint Shape and Pose Optimization for Few-view Object Reconstruction

TLDR
FvOR is a learning-based object reconstruction method that predicts accurate 3D models from a few images with noisy input poses, using learnable neural network modules, and achieves best-in-class results.

Visual Correspondence Hallucination

TLDR
This paper trains a network to output a peaked probability distribution over the correspondent’s location, regardless of this correspondent being visible, occluded, or outside the field of view, and demonstrates that this network is indeed able to hallucinate correspondences on pairs of images captured in scenes that were not seen at training-time.

Visual Correspondence Hallucination: Towards Geometric Reasoning

TLDR
This paper trains a network to output a peaked probability distribution over the correspondent’s location, regardless of this correspondent being visible, occluded, or outside the field of view, and demonstrates that this network is indeed able to hallucinate correspondences on unseen pairs of images.
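
Both hallucination entries above describe a network that outputs a peaked probability distribution over a correspondent's location. As a generic, hedged sketch (not the authors' architecture), such an output head can be read as a softmax over dense per-pixel logits followed by a soft-argmax:

```python
import numpy as np

def correspondence_heatmap(logits):
    """Turn dense per-pixel logits (H, W) into a probability map and a
    soft-argmax location, i.e. a 'peaked distribution over the
    correspondent's location'. Purely illustrative."""
    h, w = logits.shape
    p = np.exp(logits - logits.max())
    p /= p.sum()                                      # (H, W) probability map
    ys, xs = np.mgrid[0:h, 0:w]
    loc = np.array([(p * xs).sum(), (p * ys).sum()])  # expected (x, y)
    return p, loc
```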

HM3D-ABO: A Photo-realistic Dataset for Object-centric Multi-view 3D Reconstruction

TLDR
This report presents HM3D-ABO, a photo-realistic object-centric dataset constructed by composing realistic indoor scenes with realistic objects; it provides multi-view RGB observations, a watertight mesh model for each object, ground-truth depth maps, and object masks.

The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs

TLDR
It is shown that a handful of modifications can be applied to a Vision Transformer (ViT) to bring its computations close to the Eight-Point Algorithm.
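
For context on the inductive bias referenced here, the classical (normalized) Eight-Point Algorithm estimates the fundamental matrix from point correspondences via a linear least-squares fit solved with an SVD. The sketch below is a standard textbook implementation in NumPy, not the ViT modifications the paper proposes.

```python
import numpy as np

def normalize_points(pts):
    """Hartley normalization: center at the centroid, scale so the mean
    distance from the origin is sqrt(2)."""
    centroid = pts.mean(axis=0)
    d = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2) / d
    T = np.array([[s, 0, -s * centroid[0]],
                  [0, s, -s * centroid[1]],
                  [0, 0, 1]])
    pts_h = np.c_[pts, np.ones(len(pts))] @ T.T   # homogeneous, normalized
    return pts_h, T

def eight_point(pts1, pts2):
    """Fundamental matrix from >= 8 correspondences (Nx2 pixel arrays)."""
    x1, T1 = normalize_points(pts1)
    x2, T2 = normalize_points(pts2)
    # Each correspondence contributes one row of A f = 0,
    # from x2^T F x1 = 0 expanded as a Kronecker product.
    A = np.stack([np.kron(p2, p1) for p1, p2 in zip(x1, x2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization.
    return T2.T @ F @ T1
```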

PlaneFormers: From Sparse View Planes to 3D Reconstruction

TLDR
A simpler approach, the PlaneFormer, is introduced that applies a transformer to 3D-aware plane tokens to perform 3D reasoning and is substantially more effective than prior work.

Structure from Silence: Learning Scene Structure from Ambient Sound

TLDR
It is suggested that ambient sound conveys a surprising amount of information about scene structure, and that it is a useful signal for learning multimodal features.

References

Showing 1–10 of 60 references

Extreme Relative Pose Network Under Hybrid Representations

TLDR
A novel RGB-D relative pose estimation approach that is suitable for small-overlap or non-overlapping scans, can output multiple relative poses, and considerably boosts the performance of multi-scan reconstruction in few-view settings.

Wide-Baseline Relative Camera Pose Estimation with Directional Learning

TLDR
DirectionNet is introduced, which estimates discrete distributions over the 5D relative pose space using a novel parameterization to make the estimation problem tractable, and shows a near 50% reduction in error over direct regression methods.

Extreme Relative Pose Estimation for RGB-D Scans via Scene Completion

TLDR
This work introduces a novel approach that extends the scope to extreme relative poses, with little or even no overlap between the input scans, by inferring more complete scene information about the underlying environment and matching on the completed scans.

Learning to Detect 3D Reflection Symmetry for Single-View Reconstruction

TLDR
This work presents a geometry-based end-to-end deep learning framework that first detects the mirror plane of reflection symmetry that commonly exists in man-made objects and then predicts depth maps by finding the intra-image pixel-wise correspondence of the symmetry.

Universal Correspondence Network

TLDR
A convolutional spatial transformer to mimic patch normalization in traditional features like SIFT is proposed, which is shown to dramatically boost accuracy for semantic correspondences across intra-class shape variations.

Camera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Network

TLDR
The results show that the proposed approach generalizes well to previously unseen scenes and compares favourably to other recent CNN-based methods.

InLoc: Indoor Visual Localization with Dense Matching and View Synthesis

TLDR
A new large-scale visual localization method targeted at indoor environments that significantly outperforms current state-of-the-art indoor localization approaches on a new, challenging dataset.

An Analysis of SVD for Deep Rotation Estimation

TLDR
An extensive quantitative analysis shows that simply replacing existing representations with the SVD orthogonalization procedure obtains state-of-the-art performance in many deep learning applications covering both supervised and unsupervised training.
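
The SVD orthogonalization procedure referred to here is the standard projection of an arbitrary 3x3 matrix onto the nearest rotation in SO(3). A minimal NumPy sketch of that projection (not the paper's training code) is:

```python
import numpy as np

def svd_orthogonalize(m):
    """Project an arbitrary 3x3 matrix onto the nearest rotation in SO(3).

    Uses the SVD M = U S V^T and returns U diag(1, 1, det(U V^T)) V^T,
    which guarantees an orthogonal matrix with determinant +1.
    """
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

# Example: orthogonalize a noisy 9-D network output reshaped to 3x3.
rng = np.random.default_rng(0)
raw = rng.standard_normal((3, 3))
R = svd_orthogonalize(raw)
print(np.allclose(R @ R.T, np.eye(3)), np.linalg.det(R))  # True, ~1.0
```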

InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset

TLDR
This dataset leverages the availability of millions of professional interior designs and millions of production-level furniture and object assets to provide a higher degree of photo-realism, larger scale, and more variability, while serving a wider range of purposes than existing datasets.

Free View Synthesis

TLDR
This work presents a method for novel view synthesis from input images that are freely distributed around a scene that can synthesize images for free camera movement through the scene, and works for general scenes with unconstrained geometric layouts.
...