Deep Matching Prior: Test-Time Optimization for Dense Correspondence

@inproceedings{Hong2021DeepMP,
  title={Deep Matching Prior: Test-Time Optimization for Dense Correspondence},
  author={Sunghwan Hong and Seungryong Kim},
  booktitle={2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021},
  pages={9887--9897}
}
Conventional techniques to establish dense correspondences across visually or semantically similar images focused on designing a task-specific matching prior, which is difficult to model in general. To overcome this, recent learning-based methods have attempted to learn a good matching prior within a model itself on large training data. The performance improvement was apparent, but the need for sufficient training data and intensive learning hinders their applicability. Moreover, using the… 
CATs++: Boosting Cost Aggregation with Convolutions and Transformers
TLDR
The proposed CATs++, an extension of CATs, introduces early convolutions prior to cost aggregation with a transformer to control the number of tokens as well as to inject some convolutional inductive bias, and proposes a novel transformer architecture for both efficient and effective cost aggregation, which results in apparent performance boost and cost reduction.
CATs: Cost Aggregation Transformers for Visual Correspondence
TLDR
This work proposes a novel cost aggregation network, called Cost Aggregation Transformers (CATs), to find dense correspondences between semantically similar images despite the additional challenges posed by large intra-class appearance and geometric variations, and introduces multi-level aggregation to efficiently capture different semantics from hierarchical feature representations.
Cost Aggregation Is All You Need for Few-Shot Segmentation
We introduce a novel cost aggregation network, dubbed Volumetric Aggregation with Transformers (VAT), to tackle the few-shot segmentation task by using both convolutions and transformers to

References

A Multi-view Stereo Benchmark with High-Resolution Images and Multi-camera Videos
TLDR
This benchmark is the first to cover the important use case of hand-held mobile devices while also providing high-resolution DSLR camera images and provides data at significantly higher temporal and spatial resolution.
HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors
TLDR
A novel benchmark for evaluating local image descriptors is proposed, and it is shown that a simple normalisation of traditional hand-crafted descriptors can boost their performance to the level of deep learning based descriptors within a realistic benchmark evaluation.
Proposal Flow
TLDR
This work introduces a novel approach to semantic flow, dubbed proposal flow, that establishes reliable correspondences using object proposals, and demonstrates that proposal flow significantly outperforms existing semantic flow methods in various settings.
GLU-Net: Global-Local Universal Network for Dense Flow and Correspondences
TLDR
This work proposes a universal network architecture that is directly applicable to all the aforementioned dense correspondence problems, and achieves both high accuracy and robustness to large displacements by investigating the combined use of global and local correlation layers.
GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network
TLDR
The proposed GOCor module is a fully differentiable dense matching module, acting as a direct replacement to the feature correlation layer, capable of effectively learning spatial matching priors to resolve further matching ambiguities.
RANSAC-Flow: generic two-stage image alignment
TLDR
This paper considers the generic problem of dense alignment between two images and proposes a two-stage process: first, a feature-based parametric coarse alignment using one or more homographies, followed by non-parametric fine pixel-wise alignment.
DGC-Net: Dense Geometric Correspondence Network
TLDR
This paper proposes a coarse-to-fine CNN-based framework that can leverage the advantages of optical flow approaches and extend them to the case of large transformations providing dense and subpixel accurate estimates and proves that the model outperforms existing dense approaches.
Self-Supervised Learning of Depth and Motion Under Photometric Inconsistency
TLDR
This paper addresses the issue when previous assumptions of the self-supervised approaches are violated due to the dynamic nature of real-world scenes, and substantially improves the state-of-the-art methods on both depth and relative pose estimation for monocular image sequences, without adding inference overhead.
Volumetric Correspondence Networks for Optical Flow
TLDR
Several simple modifications are introduced that dramatically simplify the use of volumetric layers and significantly improve accuracy over SOTA on standard benchmarks while being significantly easier to work with: training converges in 10X fewer iterations and, most importantly, the networks generalize across correspondence tasks.
Neighbourhood Consensus Networks
TLDR
An end-to-end trainable convolutional neural network architecture that identifies sets of spatially consistent matches by analyzing neighbourhood consensus patterns in the 4D space of all possible correspondences between a pair of images without the need for a global geometric model is developed.