Learned Multi-patch Similarity

@inproceedings{Hartmann2017LearnedMS,
  title={Learned Multi-patch Similarity},
  author={Wilfried Hartmann and S. Galliani and Michal Havlena and Luc Van Gool and Konrad Schindler},
  booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={1595--1603}
}
Estimating a depth map from multiple views of a scene is a fundamental task in computer vision. [...] Experiments on several multi-view datasets demonstrate that this approach has advantages over methods based on pairwise patch similarity.

Learning Descriptor, Confidence, and Depth Estimation in Multi-view Stereo
TLDR
This paper proposes a machine learning technique based on deep convolutional neural networks for multi-view stereo matching and presents a confidence estimation network that incorporates the cost volumes along the depth hypotheses in multi-view stereo.
A Global-Matching Framework for Multi-View Stereopsis
TLDR
A robust framework is presented that is trained directly on high-resolution stereo images and is capable of learning global information and enforcing smoothness constraints across the whole image; it generates results comparable to existing state-of-the-art approaches.
A robust framework for multi-view stereopsis
TLDR
A matching network that learns comprehensive information from stereo images is constructed to enforce smoothness constraints globally, and it is shown that the network can directly handle the DTU multi-view stereo dataset.
Multi-View Optimization of Local Feature Geometry
TLDR
This work addresses the problem of refining the geometry of local image features from multiple views without known scene or camera geometry by first estimating local geometric transformations between tentative matches and then optimizing the keypoint locations over multiple views jointly according to a non-linear least squares formulation.
Learning a Multi-View Stereo Machine
TLDR
End-to-end learning allows us to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from far fewer images than required by classical approaches as well as completion of unseen surfaces.
A Survey on Deep Learning Techniques for Stereo-Based Depth Estimation
TLDR
A comprehensive survey of this new and continuously growing field of research is provided, summarizing the most commonly used pipelines and discussing their benefits and limitations.
Volume Sweeping: Learning Photoconsistency for Multi-View Shape Reconstruction
TLDR
It is shown that learning based on a volumetric receptive field around a 3D depth candidate improves over using per-view 2D windows, giving the photoconsistency inference more visibility over local 3D correlations in viewpoint color aggregation.
A Novel Recurrent Encoder-Decoder Structure for Large-Scale Multi-View Stereo Reconstruction From an Open Aerial Dataset
  • Jin Liu, Shunping Ji
  • Computer Science
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
TLDR
A novel network, called RED-Net, is proposed for wide-range depth inference; it was developed from a recurrent encoder-decoder structure to regularize cost maps across depths, with a 2D fully convolutional network as its framework. It is shown that the RED-Net model pre-trained on the synthetic WHU dataset can be efficiently transferred to very different multi-view aerial image datasets without any fine-tuning.
Learning to Adapt Multi-View Stereo by Self-Supervision
TLDR
This work uses model-agnostic meta-learning (MAML) to train base parameters that are adapted for multi-view stereo on new domains through self-supervised training, yielding a deep neural network with improved adaptability to new target domains.

References

SHOWING 1-10 OF 31 REFERENCES
Learning to compare image patches via convolutional neural networks
TLDR
This paper shows how to learn directly from image data a general similarity function for comparing image patches, which is a task of fundamental importance for many computer vision problems, and opts for a CNN-based model that is trained to account for a wide variety of changes in image appearance.
Just Look at the Image: Viewpoint-Specific Surface Normal Prediction for Improved Multi-View Reconstruction
  • S. Galliani, K. Schindler
  • Computer Science
    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
We present a multi-view reconstruction method that combines conventional multi-view stereo (MVS) with appearance-based normal prediction, to obtain dense and accurate 3D surface models. Reliable [...]
Discriminative Learning of Deep Convolutional Feature Point Descriptors
TLDR
This paper uses Convolutional Neural Networks to learn discriminant patch representations and in particular trains a Siamese network with pairs of (non-)corresponding patches to develop 128-D descriptors whose Euclidean distances reflect patch similarity and can be used as a drop-in replacement for any task involving SIFT.
Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches
TLDR
This work presents a method for extracting depth information from a rectified image pair by learning a similarity measure on small image patches using a convolutional neural network and examines two network architectures for this task: one tuned for speed, the other for accuracy.
Computing the stereo matching cost with a convolutional neural network
  • J. Zbontar, Yann LeCun
  • Computer Science
    2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2015
TLDR
This work trains a convolutional neural network to predict how well two image patches match and uses it to compute the stereo matching cost, which achieves an error rate of 2.61% on the KITTI stereo dataset.
MatchNet: Unifying feature and metric learning for patch-based matching
TLDR
A unified approach to combining feature computation and similarity networks for training a patch matching system is proposed, and it is confirmed that the approach improves accuracy over previous state-of-the-art results on patch matching datasets while reducing the storage requirement for descriptors.
Massively Parallel Multiview Stereopsis by Surface Normal Diffusion
TLDR
This work builds on the PatchMatch idea: starting from randomly generated 3D planes in scene space, the best-fitting planes are iteratively propagated and refined to obtain a 3D depth and normal field per view, such that a robust photo-consistency measure over all images is maximized.
A space-sweep approach to true multi-image matching
  • R. Collins
  • Mathematics
    Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition
  • 1996
TLDR
A new space-sweep approach to true multi-image matching is presented that simultaneously determines 2D feature correspondences and the 3D positions of feature points in the scene.
A Multi-view Stereo Benchmark with High-Resolution Images and Multi-camera Videos
TLDR
This benchmark is the first to cover the important use case of hand-held mobile devices while also providing high-resolution DSLR camera images, and it provides data at significantly higher temporal and spatial resolution.
Computational Stereo
TLDR
The criteria that are important for evaluating the effectiveness of various computational stereo techniques are presented and a representative sampling of computational stereo research is provided.