• Corpus ID: 235670093

Probabilistic Attention for Interactive Segmentation

Prasad Gabbur, Manjot Bilkhu, Javier R. Movellan
Neural Information Processing Systems
We provide a probabilistic interpretation of attention and show that the standard dot-product attention in transformers is a special case of Maximum A Posteriori (MAP) inference. The proposed approach suggests the use of Expectation Maximization algorithms for online adaptation of key and value model parameters. This approach is useful for cases in which external agents, e.g., annotators, provide inference-time information about the correct values of some tokens, e.g., the semantic category of…
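
As a rough illustration of the probabilistic reading of attention described in the abstract, the sketch below checks one standard identity: if each key is treated as the centre of an isotropic Gaussian with uniform mixing weights, and all keys have the same norm, the posterior responsibilities for a query coincide with scaled dot-product softmax weights. This is an assumption-laden toy, not the paper's derivation (which is not reproduced in this excerpt); the function names, the toy data, and the choice of variance `sqrt(d)` are all illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_weights(query, keys):
    """Scaled dot-product attention weights: softmax(q . k_j / sqrt(d))."""
    scale = math.sqrt(len(query))
    return softmax([dot(query, k) / scale for k in keys])


def mixture_posterior(query, keys, variance):
    """Posterior p(j | q) for isotropic Gaussians centred on the keys.

    Assumes uniform mixing weights and a shared variance (illustrative
    assumptions, not taken from the paper).
    """
    log_lik = [
        -sum((q - c) ** 2 for q, c in zip(query, k)) / (2.0 * variance)
        for k in keys
    ]
    return softmax(log_lik)


# Toy data: every key has norm 1, so the ||k_j||^2 term is constant in j.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query = [0.8, 0.3]

w_attn = attention_weights(query, keys)
# With variance sqrt(d), the log-likelihood equals q . k_j / sqrt(d)
# plus a term that does not depend on j, which softmax ignores.
w_post = mixture_posterior(query, keys, variance=math.sqrt(len(query)))

max_gap = max(abs(a - b) for a, b in zip(w_attn, w_post))
print("largest difference between the two weightings:", max_gap)
```

The equal-norm condition matters: if the keys had different norms, the term `||k_j||^2` would no longer cancel inside the softmax and the two weightings would generally differ.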


Transformer with Fourier Integral Attentions

This work proposes the FourierFormer, a new class of transformers in which the dot-product kernels are replaced by novel generalized Fourier integral kernels, and empirically corroborates its advantages over baseline transformers in a variety of practical applications, including language modeling and image classification.

FourierFormer: Transformer Meets Generalized Fourier Integral Theorem

The FourierFormer is proposed, a new class of transformers in which the dot-product kernels are replaced by the novel generalized Fourier integral kernels, and the advantages of FourierFormers over the baseline transformers are empirically corroborated in a variety of practical applications.


A principled framework for constructing attention layers in transformers is provided and it is shown that the self-attention corresponds to the support vector expansion derived from a support vector regression problem, whose primal formulation has the form of a neural network layer.

Robustify Transformers with Robust Kernel Density Estimation

This work leverages robust kernel density estimation (RKDE) in the self-attention mechanism to alleviate data contamination by down-weighting bad samples in the estimation process.



Latent Alignment and Variational Attention

Variational attention networks are considered, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference, and methods for reducing the variance of gradients are proposed to make these approaches computationally feasible.

Content-Aware Multi-Level Guidance for Interactive Instance Segmentation

This work proposes a novel transformation of user clicks to generate content-aware guidance maps that leverage the hierarchical structural information present in an image to outperform existing approaches that require state-of-the-art segmentation networks pre-trained on large scale segmentation datasets.

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

An attention-based model that automatically learns to describe the content of images is introduced; it can be trained deterministically using standard backpropagation techniques, or stochastically by maximizing a variational lower bound.

Fully convolutional networks for semantic segmentation

The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.

CCNet: Criss-Cross Attention for Semantic Segmentation

  • Zilong Huang, Xinggang Wang, Wenyu Liu
  • Computer Science
  • 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2019
This work proposes a Criss-Cross Network (CCNet) for obtaining contextual information more effectively and efficiently, and achieves mIoU scores of 81.4 on the Cityscapes test set and 45.22 on the ADE20K validation set, which are new state-of-the-art results.

F-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation

Deep neural networks have become a mainstream approach to interactive segmentation. As we show in our experiments, while for some images a trained network provides accurate segmentation result with

Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++

This work follows the idea of Polygon-RNN to produce polygonal annotations of objects interactively using humans-in-the-loop and achieves a high reduction in annotation time for new datasets, moving a step closer towards an interactive annotation tool to be used in practice.

Deep Interactive Object Selection

This paper presents a novel deep-learning-based algorithm which has much better understanding of objectness and can reduce user interactions to just a few clicks and is superior to all existing interactive object selection approaches.

Interactive Full Image Segmentation by Considering All Regions Jointly

This work proposes an interactive, scribble-based annotation framework that operates on the whole image to produce segmentations for all regions, adapts Mask-RCNN into a fast interactive segmentation framework, and introduces an instance-aware loss measured at the pixel level in the full image canvas, which lets predictions for nearby regions properly compete for space.

Self-Attention with Relative Position Representations

This work presents an alternative approach that extends the self-attention mechanism to efficiently consider representations of the relative positions, or distances, between sequence elements, evaluated on the WMT 2014 English-to-German and English-to-French translation tasks.