TDAF: Top-Down Attention Framework for Vision Tasks

@inproceedings{Pang2021TDAFTA,
  title={TDAF: Top-Down Attention Framework for Vision Tasks},
  author={Bo Pang and Yizhuo Li and Jiefeng Li and Muchen Li and Hanwen Cao and Cewu Lu},
  booktitle={AAAI},
  year={2021}
}
Human attention mechanisms often work in a top-down manner, yet this is not well explored in vision research. Here, we propose the Top-Down Attention Framework (TDAF) to capture top-down attention, which can be easily adopted in most existing models. Its Recursive Dual-Directional Nested Structure forms two sets of orthogonal paths, recursive and structural ones, where bottom-up spatial features and top-down attention features are extracted respectively. Such spatial and attention…

Unsupervised Representation for Semantic Segmentation by Implicit Cycle-Attention Contrastive Learning

TLDR
Cycle-attention contrastive learning (CACL) makes use of the semantic continuity of video frames, adopting an unsupervised cycle-consistent attention mechanism to implicitly conduct contrastive learning with difficult, global-local-balanced positive pixel pairs.

PGT: A Progressive Method for Training Models on Long Videos

TLDR
This work proposes to treat videos as serial fragments satisfying the Markov property and to train on them as a whole by progressively propagating information through the temporal dimension in multiple steps; this makes it possible to train on long videos end-to-end with limited resources while ensuring the effective transmission of information.

References

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

TLDR
A combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions is proposed, demonstrating the broad applicability of this approach to VQA.

A2-Nets: Double Attention Networks

TLDR
This work proposes the "double attention block", a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently.

Stand-Alone Self-Attention in Vision Models

TLDR
The results establish that stand-alone self-attention is an important addition to the vision practitioner's toolbox and is especially impactful when used in later layers.

Residual Attention Network for Image Classification

TLDR
The proposed Residual Attention Network is a convolutional neural network using an attention mechanism that can be incorporated into state-of-the-art feed-forward network architectures in an end-to-end training fashion and can be easily scaled up to hundreds of layers.

Deep RNN Framework for Visual Sequential Applications

TLDR
This work proposes a new recurrent neural framework that can be stacked deep effectively and provides empirical evidence to show that the deep RNN framework is easy to optimize and can gain accuracy from the increased depth on several visual sequence problems.

Global-and-local attention networks for visual recognition

TLDR
This work extends the SE module with a novel global-and-local attention (GALA) module which combines both forms of attention -- resulting in state-of-the-art accuracy on ILSVRC.

Learning what and where to attend

TLDR
A state-of-the-art attention network is extended and it is demonstrated that adding ClickMe supervision significantly improves its accuracy and yields visual features that are more interpretable and more similar to those used by human observers.

An Empirical Study of Spatial Attention Mechanisms in Deep Networks

TLDR
An empirical study that ablates various spatial attention elements within a generalized attention formulation, encompassing the dominant Transformer attention as well as the prevalent deformable convolution and dynamic convolution modules, yields significant findings about spatial attention in deep networks, some of which run counter to conventional understanding.

Attention Augmented Convolutional Networks

TLDR
It is found that Attention Augmentation leads to consistent improvements in image classification on ImageNet and object detection on COCO across many different models and scales, including ResNets and a state-of-the-art mobile constrained network, while keeping the number of parameters similar.

Attention Branch Network: Learning of Attention Mechanism for Visual Explanation

TLDR
Attention Branch Network (ABN) is proposed, which extends a response-based visual explanation model by introducing a branch structure with an attention mechanism and is trainable for visual explanation and image recognition in an end-to-end manner.
...