Dynamic Computational Time for Visual Attention

@inproceedings{Li2017DynamicCT,
  title={Dynamic Computational Time for Visual Attention},
  author={Zhichao Li and Yang Yang and Xiao Liu and Feng Zhou and Shilei Wen and Wei Xu},
  booktitle={2017 IEEE International Conference on Computer Vision Workshops (ICCVW)},
  year={2017},
  pages={1199--1209}
}
  • Published 30 March 2017
We propose a dynamic computational time model to accelerate the average processing time of the recurrent attention model (RAM). Rather than attending with a fixed number of steps for each input image, the model learns to decide when to stop on the fly. To achieve this, we add a continue/stop action per time step to RAM and use reinforcement learning to learn both the optimal attention policy and the stopping policy. The modification is simple but can dramatically save the average processing time.
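The continue/stop mechanism from the abstract can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the names (`run_episode`, `stop_logits`, `step_cost`) are invented, and greedy thresholding stands in for the stochastic stopping policy the paper trains with REINFORCE.

```python
import math

def run_episode(stop_logits, max_steps=8, step_cost=0.05):
    """Toy rollout of a per-step continue/stop action: at each glimpse
    the agent emits a stop logit; here we decode greedily and halt at
    the first step whose stop probability exceeds 0.5 (the paper
    instead samples from a learned stochastic stopping policy).
    Returns (glimpses taken, computation penalty incurred)."""
    for t, logit in enumerate(stop_logits[:max_steps], start=1):
        p_stop = 1.0 / (1.0 + math.exp(-logit))  # sigmoid stop probability
        if p_stop > 0.5:
            return t, step_cost * t
    return max_steps, step_cost * max_steps

# An input the model finds easy triggers an early stop, saving computation:
steps, penalty = run_episode([-2.0, -1.0, 0.5, 2.0])
```

Including a per-step cost like `step_cost` in the reward is what pushes the learned policy toward stopping early on easy inputs while spending more glimpses on hard ones.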
Citations

Looking Fast and Slow: Memory-Guided Mobile Video Object Detection
This paper addresses the analogous question of whether using memory in computer vision systems can not only improve the accuracy of object detection in video streams, but also reduce the computation time by interleaving conventional feature extractors with extremely lightweight ones which only need to recognize the gist of the scene.
Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition
TASN consists of a trilinear attention module, which generates attention maps by modeling the inter-channel relationships, an attention-based sampler which highlights attended parts with high resolution, and a feature distiller, which distills part features into an object-level feature by weight sharing and feature preserving strategies.
Deep Reinforcement Learning of Region Proposal Networks for Object Detection
We propose drl-RPN, a deep reinforcement learning-based visual recognition model consisting of a sequential region proposal network (RPN) and an object detector. In contrast to typical RPNs, where …
Differentiable Patch Selection for Image Recognition
A method based on a differentiable Top-K operator to select the most relevant parts of the input to efficiently process high resolution images and shows results for traffic sign recognition, inter-patch relationship reasoning, and fine-grained recognition without using object/part bounding box annotations during training.
Layer Flexible Adaptive Computational Time for Recurrent Neural Networks
A layer-flexible recurrent neural network with adaptive computation time is proposed, in which the number of transmitted states varies by step and sequence, and the model is extended to the sequence-to-sequence setting.
Probabilistic Adaptive Computation Time
A probabilistic model with discrete latent variables that control the computation time in deep learning models such as ResNets and LSTMs and matches the speed-accuracy trade-off of Adaptive Computation Time, while allowing for evaluation with a simple deterministic procedure that has a lower memory footprint.
Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks
A saliency-based distortion layer for convolutional neural networks that helps to improve the spatial sampling of input data for a given task and applies to improve existing networks for the tasks of human gaze estimation and fine-grained object classification.
Human Attention in Fine-grained Classification
This work collects human gaze data for the fine-grained classification dataset CUB and builds a dataset named CUB-GHA (Gaze-based Human Attention), and proposes the Gaze Augmentation Training (GAT) and Knowledge Fusion Network (KFN) to integrate human gaze knowledge into classification models.
A Probabilistic Hard Attention Model For Sequentially Observed Scenes
This paper designs an efficient hard attention model for classifying such sequentially observed scenes and uses normalizing flows in Partial VAE to handle multi-modality in the feature-synthesis problem.
Sharpen Focus: Learning With Attention Separability and Consistency
This paper proposes a new framework that makes class-discriminative attention a principled part of the learning process and introduces new learning objectives for attention separability and cross-layer consistency, which result in improved attention discriminability and reduced visual confusion.

References

Showing 1-10 of 86 references
Recurrent Models of Visual Attention
A novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution is presented.
Spatially Adaptive Computation Time for Residual Networks
Experimental results are presented showing that this model improves the computational efficiency of Residual Networks on the challenging ImageNet classification and COCO object detection datasets and the computation time maps on the visual saliency dataset cat2000 correlate surprisingly well with human eye fixation positions.
Attention for Fine-Grained Categorization
This paper presents experiments extending the work of Ba et al. (2014) on recurrent neural models for attention into less constrained visual environments, specifically fine-grained categorization on …
Variable Computation in Recurrent Neural Networks
A modification to existing recurrent units is explored which allows them to learn to vary the amount of computation they perform at each step, without prior knowledge of the sequence's time structure, which leads to better performance overall on evaluation tasks.
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
An attention based model that automatically learns to describe the content of images is introduced that can be trained in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound.
End-to-End Learning of Action Detection from Frame Glimpses in Videos
A fully end-to-end approach for action detection in videos that learns to directly predict the temporal bounds of actions and uses REINFORCE to learn the agent's decision policy.
Multiple Object Recognition with Visual Attention
The model is a deep recurrent neural network trained with reinforcement learning to attend to the most relevant regions of the input image and it is shown that the model learns to both localize and recognize multiple objects despite being given only class labels during training.
Adaptive Computation Time for Recurrent Neural Networks
Performance is dramatically improved and insight is provided into the structure of the data, with more computation allocated to harder-to-predict transitions, such as spaces between words and ends of sentences, which suggests that ACT or other adaptive computation methods could provide a generic method for inferring segment boundaries in sequence data.
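The halting rule at the core of Adaptive Computation Time is concrete enough to sketch. This is a minimal toy illustration, assuming sigmoidal halting units; the function name and inputs are hypothetical, not from the paper:

```python
import math

def act_halt(halting_logits, eps=0.01):
    """Adaptive Computation Time halting rule (after Graves, 2016):
    accumulate per-step halting probabilities until the running total
    would exceed 1 - eps; the final step receives the leftover
    "remainder" weight so the step weights form a proper distribution."""
    weights, total = [], 0.0
    for logit in halting_logits:
        h = 1.0 / (1.0 + math.exp(-logit))  # sigmoid halting unit
        if total + h > 1.0 - eps:
            weights.append(1.0 - total)  # remainder assigned to last step
            break
        weights.append(h)
        total += h
    else:
        weights.append(1.0 - total)  # step cap reached: assign leftover mass
    return weights  # len(weights) = steps actually used; weights sum to 1
```

The intermediate hidden states would then be averaged with these weights, so the number of recurrent steps adapts per input while the whole computation stays differentiable.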
Deep Networks with Internal Selective Attention through Feedback Connections
DasNet harnesses the power of sequential processing to improve classification performance, by allowing the network to iteratively focus its internal attention on some of its convolutional filters.
Fully Convolutional Attention Localization Networks: Efficient Attention Localization for Fine-Grained Recognition
It is shown that zooming in on the selected attention regions significantly improves the performance of fine-grained recognition, and the proposed approach is noticeably more computationally efficient during both training and testing because of its fully-convolutional architecture.