CATs: Cost Aggregation Transformers for Visual Correspondence
@inproceedings{Cho2021CATsCA,
  title={CATs: Cost Aggregation Transformers for Visual Correspondence},
  author={Seokju Cho and Sunghwan Hong and Sangryul Jeon and Yunsung Lee and Kwanghoon Sohn and Seungryong Kim},
  booktitle={NeurIPS},
  year={2021}
}
We propose a novel cost aggregation network, called Cost Aggregation Transformers (CATs), to find dense correspondences between semantically similar images with additional challenges posed by large intra-class appearance and geometric variations. Cost aggregation is a highly important process in matching tasks, as the matching accuracy depends on the quality of its output. Compared to handcrafted or CNN-based methods addressing the cost aggregation, which either lack robustness to severe…
10 Citations
CATs++: Boosting Cost Aggregation with Convolutions and Transformers
- Computer Science, ArXiv
- 2022
The proposed CATs++, an extension of CATs, introduces early convolutions prior to cost aggregation with a transformer to control the number of tokens and to inject some convolutional inductive bias, and proposes a novel transformer architecture for both efficient and effective cost aggregation, which results in an apparent performance boost and cost reduction.
Cost Aggregation Is All You Need for Few-Shot Segmentation
- Computer Science, ArXiv
- 2021
We introduce a novel cost aggregation network, dubbed Volumetric Aggregation with Transformers (VAT), to tackle the few-shot segmentation task by using both convolutions and transformers to…
Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation
- Computer Science, ArXiv
- 2022
A novel cost aggregation network, called Volumetric Aggregation with Transformers (VAT), for few-shot segmentation, where a high-dimensional Swin Transformer is preceded by a series of small-kernel convolutions that impart local context to all pixels and introduce convolutional inductive bias.
GAN-Supervised Dense Visual Alignment
- Computer Science, ArXiv
- 2021
GANgealing significantly outperforms past self-supervised correspondence algorithms and performs on-par with state-of-the-art supervised correspondence algorithms on several datasets—without making use of any correspondence supervision or data augmentation and despite being trained exclusively on GAN-generated data.
AiATrack: Attention in Attention for Transformer Visual Tracking
- Computer Science, ArXiv
- 2022
This work proposes an attention in attention (AiA) module, which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors, and proposes a streamlined Transformer tracking framework, dubbed AiATrack, by introducing efficient feature reuse and target-background embeddings to make full use of temporal references.
HMFS: Hybrid Masking for Few-Shot Segmentation
- Computer Science, ArXiv
- 2022
This work compensates for the loss of fine-grained spatial details in the FM technique by investigating and leveraging a complementary basic input masking method, which shows improved performance against the current state-of-the-art methods by visible margins across different benchmarks.
Rewriting geometric rules of a GAN
- Computer Science, ACM Transactions on Graphics
- 2022
This work enables a user to "warp" a given model by editing just a handful of original model outputs with desired geometric changes, enabling the creation of a new generative model without the burden of curating a large-scale dataset.
FlowFormer: A Transformer Architecture for Optical Flow
- Computer Science, ArXiv
- 2022
We introduce Optical Flow TransFormer (FlowFormer), a transformer-based neural network architecture for learning optical flow. FlowFormer tokenizes the 4D cost volume built from an image pair,…
Demystifying Unsupervised Semantic Correspondence Estimation
- Computer Science, ArXiv
- 2022
A new unsupervised correspondence approach is introduced which utilizes the strength of pre-trained features while encouraging better matches during training, which results in significantly better matching performance compared to current state-of-the-art methods.
Examining Responsibility and Deliberation in AI Impact Statements and Ethics Reviews
- Psychology, AIES
- 2022
The artificial intelligence research community is continuing to grapple with the ethics of its work by encouraging researchers to discuss potential positive and negative consequences. Neural…
References
Universal Correspondence Network
- Computer Science, NIPS
- 2016
A convolutional spatial transformer to mimic patch normalization in traditional features like SIFT is proposed, which is shown to dramatically boost accuracy for semantic correspondences across intra-class shape variations.
Correspondence Networks With Adaptive Neighbourhood Consensus
- Computer Science, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This paper proposes a convolutional neural network architecture, called adaptive neighbourhood consensus network (ANC-Net), that can be trained end-to-end with sparse key-point annotations, to handle the task of establishing dense visual correspondences between images containing objects of the same category.
SFNet: Learning Object-Aware Semantic Correspondence
- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
A new CNN architecture is proposed, dubbed SFNet, which leverages a new and differentiable version of the argmax function for end-to-end training, with a loss that combines mask and flow consistency with smoothness terms.
Learning to Compose Hypercolumns for Visual Correspondence
- Computer Science, ECCV
- 2020
A novel approach to visual correspondence is introduced that dynamically composes effective features by selecting a small number of relevant layers from a deep convolutional neural network, conditioned on the images to match.
PARN: Pyramidal Affine Regression Networks for Dense Semantic Correspondence
- Computer Science, ECCV
- 2018
A deep architecture for dense semantic correspondence, called pyramidal affine regression networks (PARN), that estimates locally-varying affine transformation fields across images and proposes a novel weakly-supervised training scheme that generates progressive supervisions by leveraging a correspondence consistency across image pairs.
FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence
- Computer Science, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
To robustly match points among different instances within the same object class, FCSS is formulated using local self-similarity (LSS) within a fully convolutional network, which is inherently insensitive to intra-class appearance variations because of its LSS-based structure.
Dynamic Context Correspondence Network for Semantic Alignment
- Computer Science, 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
This paper proposes a context-aware semantic representation that incorporates spatial layout for robust matching against local ambiguities and develops a novel dynamic fusion strategy based on attention mechanism to weave the advantages of both local and context features by integrating semantic cues from multiple scales.
FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence
- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2019
This work proposes to leverage object candidate priors provided in most existing datasets and also correspondence consistency between object pairs to enable weakly-supervised learning and significantly outperforms conventional handcrafted descriptors and CNN-based descriptors on various benchmarks.
Neighbourhood Consensus Networks
- Computer Science, NeurIPS
- 2018
An end-to-end trainable convolutional neural network architecture that identifies sets of spatially consistent matches by analyzing neighbourhood consensus patterns in the 4D space of all possible correspondences between a pair of images without the need for a global geometric model is developed.
Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions
- Computer Science, ECCV
- 2020
The proposed Sparse-NCNet method obtains state-of-the-art results on the HPatches Sequences and InLoc visual localisation benchmarks, and competitive results in the Aachen Day-Night benchmark.