Surface Normal Estimation of Tilted Images via Spatial Rectifier

Tien Do, Khiem Vuong, Stergios I. Roumeliotis, and Hyun Soo Park
In this paper, we present a spatial rectifier to estimate surface normals of tilted images. Tilted images are of particular interest as more visual data are captured by arbitrarily oriented sensors such as body-/robot-mounted cameras. Existing approaches exhibit bounded performance on predicting surface normals because they were trained using gravity-aligned images. Our two main hypotheses are: (1) visual scene layout is indicative of the gravity direction; and (2) not all surfaces are equally… 
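As a rough illustration of what rectifying a tilted image toward a gravity-aligned orientation can involve, the sketch below builds the rotation that carries a gravity direction (expressed in the camera frame) onto a canonical axis, and the homography that such a pure rotation induces on pixels. This is not the authors' implementation: the function names, the choice of [0, 1, 0] as the canonical direction, and the intrinsic matrix K are all assumptions made for the example.

```python
import numpy as np

def rotation_aligning(g, e):
    """Smallest rotation R with R @ g == e for unit vectors (Rodrigues form)."""
    g = np.asarray(g, dtype=float) / np.linalg.norm(g)
    e = np.asarray(e, dtype=float) / np.linalg.norm(e)
    v = np.cross(g, e)        # rotation axis scaled by sin(angle)
    c = float(np.dot(g, e))   # cos(angle)
    if np.isclose(c, -1.0):
        # Antiparallel case: rotate by pi about any axis perpendicular to g.
        axis = np.cross(g, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(g, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

def rectifying_homography(K, R):
    """Pixel mapping x' ~ K R K^-1 x induced by rotating the camera by R."""
    return K @ R @ np.linalg.inv(K)

def normals_to_tilted_frame(normals_rectified, R):
    """Rotate per-pixel normals (N x 3) from the rectified frame back to the tilted one."""
    return normals_rectified @ R  # row-vector form of R.T @ n

# Hypothetical inputs: an estimated gravity direction and camera intrinsics.
gravity = np.array([0.3, 0.9, 0.1])
canonical = np.array([0.0, 1.0, 0.0])
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

R = rotation_aligning(gravity, canonical)
H = rectifying_homography(K, R)
```

The matrix H could then be passed to an image-warping routine to produce the rectified input, and normals predicted on that rectified image rotated back with normals_to_tilted_frame.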
Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation
The proposed uncertainty-guided sampling prevents the bias in training towards large planar surfaces and improves the quality of prediction, especially near object boundaries and on small structures.
Deep Depth Estimation from Visual-Inertial SLAM
This paper uses the available gravity estimate from the VI-SLAM to warp the input image to the orientation prevailing in the training dataset, which results in a significant performance gain for the surface normal estimate, and thus the dense depth estimates.
Learning to Detect Scene Landmarks for Camera Localization
This work presents a new learned camera localization technique that eliminates the need to store features or a detailed 3D point cloud and demonstrates that this method outperforms DSAC*, the state-of-the-art in learned localization.
Auto-Rectify Network for Unsupervised Indoor Depth Estimation
This work establishes that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth, and proposes an Auto-Rectify Network with novel loss functions, which can automatically learn to rectify images during training.
Shape from Polarization for Complex Scenes in the Wild
This work proposes a learning-based framework with a multi-head self-attention module and viewing encoding, which is designed to handle increasing polarization ambiguities caused by complex materials and non-orthographic projection in scene-level SfP.
Multiple Cylinder Extraction from Organized Point Clouds
Quantitative and qualitative results show that the proposed algorithm outperforms the baseline algorithms in each of the following areas: normal estimation, cylinder detection, and cylinder extraction.
Transformers Solve the Limited Receptive Field for Monocular Depth Prediction
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels (i.e., monocular depth prediction and surface normal estimation), and it achieves state-of-the-art performance on three challenging datasets.
A Simple Approach to Image Tilt Correction with Self-Attention MobileNet for Smartphones
A Self-Attention MobileNet, called SA-MobileNet, is presented that can model long-range dependencies between image features instead of processing only local regions, as standard convolutional kernels do.
Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction
This paper proposes TransDepth, an architecture that benefits from both convolutional neural networks and transformers. It applies transformers to pixel-wise prediction problems involving continuous labels and achieves state-of-the-art performance on three challenging datasets.
NeuRIS: Neural Reconstruction of Indoor Scenes Using Normal Priors
The key idea of NeuRIS is to integrate estimated normals of indoor scenes as a prior in a neural rendering framework for reconstructing large texture-less shapes and, importantly, to do this in an adaptive manner so that irregular shapes with fine details can also be reconstructed.


UprightNet: Geometry-Aware Camera Orientation Estimation From Single Images
This work designs a network that predicts two representations of scene geometry, in both the local camera and global reference coordinate systems, and solves for the camera orientation as the rotation that best aligns these two predictions via a differentiable least squares module.
SURGE: Surface Regularized Geometry Estimation from a Single Image
This work presents an approach to regularize 2.5D surface normal and depth predictions at each pixel given a single input image, and proposes new planar-wise metrics to evaluate geometry consistency within planar surfaces, which are more tightly related to dependent 3D editing applications.
Spherical Regression: Learning Viewpoints, Surface Normals and 3D Rotations on N-Spheres
By introducing a spherical exponential mapping on n-spheres at the regression output, this work obtains well-behaved gradients, leading to stable training, and shows how spherical regression can be used for several computer vision challenges, specifically viewpoint estimation, surface normal estimation and 3D rotation estimation.
Dense monocular reconstruction using surface normals
This paper presents an efficient framework for dense 3D scene reconstruction using input from a moving monocular camera and shows that using the surface normal prior leads to better reconstructions than the weaker smoothness prior.
FrameNet: Learning Local Canonical Frames of 3D Surfaces From a Single RGB Image
The novel problem of identifying dense canonical 3D coordinate frames from a single RGB image is introduced, and an algorithm is proposed to predict these frames from RGB, with applications ranging from surface normal estimation and feature matching to augmented reality.
In-Plane Rotation-Aware Monocular Depth Estimation Using SLAM
This work proposes a simple but effective refining method that incorporates in-plane roll alignment using camera poses of monocular Simultaneous Localization and Mapping (SLAM) and results show the effectiveness of this approach.
Real-time joint estimation of camera orientation and vanishing points
This work proposes a novel method that jointly estimates the VPs and camera orientation based on sequential Bayesian filtering, which does not require the Manhattan world assumption, and can perform a highly accurate estimation of camera orientation in real time.
Geo-Supervised Visual Depth Prediction
We propose using global orientation from inertial measurements, and the bias it induces on the shape of objects populating the scene, to inform visual three-dimensional reconstruction. We test the
Marr Revisited: 2D-3D Alignment via Surface Normal Prediction
A skip-network model built on the pre-trained Oxford VGG convolutional neural network (CNN) for surface normal prediction achieves state-of-the-art accuracy on the NYUv2 RGB-D dataset, and recovers fine object detail compared to previous methods.
Image Orientation Estimation with Convolutional Networks
It is demonstrated that a convolutional network can learn subtle features to predict the canonical orientation of images, and this approach runs in real-time and can be applied also to live video streams.