Movement-induced Priors for Deep Stereo

Authors: Yuxin Hou, Muhammad Kamran Janjua, Juho Kannala, A. Solin
Published in: 2020 25th International Conference on Pattern Recognition (ICPR)
We propose a method for fusing stereo disparity estimation with movement-induced prior information. Instead of independent inference frame-by-frame, we formulate the problem as a non-parametric learning task in terms of a temporal Gaussian process prior with a movement-driven kernel for inter-frame reasoning. We present a hierarchy of three Gaussian process kernels depending on the availability of motion information, where our main focus is on a new gyroscope-driven kernel for handheld devices… 
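The idea of a temporal Gaussian process prior whose kernel is driven by movement can be illustrated with a small sketch. This is not the authors' kernel or code: the function names, the squared-exponential form, the use of a single integrated gyroscope angle as the kernel input, and all hyperparameter values are assumptions chosen for illustration only. The sketch shows how frames with similar orientation end up sharing information, while frames far apart in orientation are treated nearly independently.

```python
import numpy as np

def integrate_gyro(rates, dt):
    """Hypothetical helper: integrate 1-D angular rates (rad/s) into a per-frame angle."""
    return np.cumsum(np.asarray(rates, dtype=float)) * dt

def movement_kernel(angles, magnitude=1.0, lengthscale=0.5):
    """Squared-exponential kernel on the angle difference between frames (assumed form)."""
    angles = np.asarray(angles, dtype=float)
    diff = angles[:, None] - angles[None, :]
    return magnitude * np.exp(-0.5 * (diff / lengthscale) ** 2)

def gp_posterior_mean(K, y, noise_var=0.1):
    """Standard GP regression posterior mean at the training inputs."""
    n = K.shape[0]
    return K @ np.linalg.solve(K + noise_var * np.eye(n), np.asarray(y, dtype=float))

# Per-frame values standing in for one element of a latent encoding.
y = np.array([1.0, 2.0, 3.0])

# No rotation between frames: the prior couples all frames strongly.
still = gp_posterior_mean(movement_kernel(integrate_gyro([0.0, 0.0, 0.0], dt=0.1)), y)

# Large rotation between frames: the prior treats frames as nearly independent.
moving = gp_posterior_mean(movement_kernel([0.0, 100.0, 200.0]), y)
```

With no motion the three frames are averaged toward a common value; with large motion each frame keeps its own value, shrunk only by the noise term.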


Gaussian Process Priors for View-Aware Inference
A principled framework for encoding prior knowledge of information coupling between views or camera poses of a single scene is derived and it is shown how this soft-prior knowledge can be applied to improve performance on several real vision tasks, such as feature tracking, human face encoding, and view synthesis.


Multi-View Stereo by Temporal Nonparametric Fusion
A pose-kernel structure is proposed that encourages similar poses to have resembling latent spaces, circumvents standard pitfalls in scaling Gaussian process inference, and can run in real time on smart devices.
Real-Time Self-Adaptive Deep Stereo
This work proposes to perform unsupervised and continuous online adaptation of a deep stereo network, which allows for preserving its accuracy in any environment, and introduces the first real-time self-adaptive deep stereo system enabling competitive performance on heterogeneous datasets.
End-to-End Learning of Geometry and Context for Deep Stereo Regression
We propose a novel deep learning architecture for regressing disparity from a rectified pair of stereo images. We leverage knowledge of the problem’s geometry to form a cost volume using deep feature
Temporally Consistent Depth Estimation in Videos with Recurrent Architectures
This paper introduces for the first time an approach that yields temporally consistent depth estimates over multiple frames of a video by a dedicated architecture based on convolutional LSTM units and layer normalization.
Cascade Residual Learning: A Two-Stage Convolutional Neural Network for Stereo Matching
A novel cascade CNN architecture composed of two stages advances the recently proposed DispNet by equipping it with extra up-convolution modules, leading to disparity images with more detail, and shows that residual learning provides more effective refinement.
Group-Wise Correlation Stereo Network
Group-wise correlation provides efficient representations for measuring feature similarities without losing as much information as full correlation, and preserves better performance than previous methods when the number of parameters is reduced.
Unsupervised Learning of Stereo Matching
This paper presents a framework for learning stereo matching costs without human supervision by iteratively updating network parameters, and performs comparably with supervised methods.
Stereo Matching Using Belief Propagation
This paper formulates the stereo matching problem as a Markov network consisting of three coupled Markov random fields, and obtains the maximum a posteriori (MAP) estimate in the Markov network by applying a Bayesian belief propagation (BP) algorithm.
GA-Net: Guided Aggregation Net for End-To-End Stereo Matching
Two novel neural net layers are proposed, aimed at capturing local and whole-image cost dependencies respectively; they can replace the widely used 3D convolutional layer, which is computationally costly and memory-consuming because it has cubic computational/memory complexity.
A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation
  • N. Mayer, Eddy Ilg, T. Brox
  • Computer Science
    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
This paper proposes three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks and presents a convolutional network for real-time disparity estimation that provides state-of-the-art results.