Attention-based Conditioning Methods for External Knowledge Integration

Katerina Margatina, Christos Baziotis, Alexandros Potamianos
Annual Meeting of the Association for Computational Linguistics
In this paper, we present a novel approach for incorporating external knowledge in Recurrent Neural Networks (RNNs). We propose the integration of lexicon features into the self-attention mechanism of RNN-based architectures. This form of conditioning on the attention distribution enforces the contribution of the most salient words for the task at hand. We introduce three methods, namely attentional concatenation, feature-based gating and affine transformation. Experiments on six benchmark…
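The three conditioning methods named in the abstract can be illustrated with a minimal sketch. The abstract is truncated here and gives no equations, so everything below is an assumption rather than the authors' formulation: the parameter names (`W_cat`, `W_gate`, `W_gamma`, `W_beta`, `v`), the omission of bias terms, the random inputs standing in for RNN hidden states and lexicon features, and the choice to sum the unconditioned hidden states under the conditioned attention weights.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def condition(h, c, params, mode):
    """Combine one hidden state h with its lexicon feature vector c."""
    if mode == "concat":
        # Attentional concatenation: project the joined vector [h ; c].
        return [math.tanh(z) for z in matvec(params["W_cat"], h + c)]
    if mode == "gate":
        # Feature-based gating: lexicon features produce a gate in (0, 1).
        gate = [sigmoid(z) for z in matvec(params["W_gate"], c)]
        return [g * hi for g, hi in zip(gate, h)]
    if mode == "affine":
        # Affine transformation: lexicon features produce a scale and a shift.
        gamma = matvec(params["W_gamma"], c)
        beta = matvec(params["W_beta"], c)
        return [g * hi + b for g, hi, b in zip(gamma, h, beta)]
    raise ValueError(f"unknown mode: {mode}")

def attend(hidden, lexicon, params, mode):
    """Return attention weights and the weighted sum of hidden states."""
    conditioned = [condition(h, c, params, mode) for h, c in zip(hidden, lexicon)]
    scores = [sum(v * f for v, f in zip(params["v"], f_i)) for f_i in conditioned]
    weights = softmax(scores)
    dim = len(hidden[0])
    summary = [sum(a * h[k] for a, h in zip(weights, hidden)) for k in range(dim)]
    return weights, summary

# Hypothetical sizes: 5 words, 4-dim hidden states, 3-dim lexicon features.
rng = random.Random(0)
d_h, d_c, T = 4, 3, 5
params = {
    "W_cat": rand_matrix(d_h, d_h + d_c, rng),
    "W_gate": rand_matrix(d_h, d_c, rng),
    "W_gamma": rand_matrix(d_h, d_c, rng),
    "W_beta": rand_matrix(d_h, d_c, rng),
    "v": [rng.uniform(-0.5, 0.5) for _ in range(d_h)],
}
hidden = rand_matrix(T, d_h, rng)    # stand-in for RNN outputs
lexicon = rand_matrix(T, d_c, rng)   # stand-in for per-word lexicon features

results = {m: attend(hidden, lexicon, params, m) for m in ("concat", "gate", "affine")}
for mode, (weights, _) in results.items():
    print(mode, [round(a, 3) for a in weights])
```

In this sketch the lexicon features only change how the attention scores are computed, so the same hidden states are re-weighted differently under each mode. A trained model would learn the parameters jointly with the RNN instead of drawing them at random.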

Efficient Strategies for Hierarchical Text Classification: External Knowledge and Auxiliary Tasks

The combination of the auxiliary task and the additional input of class definitions significantly enhances classification accuracy and outperforms previous studies on two well-known English datasets, while using a drastically reduced number of parameters.

KW-ATTN: Knowledge Infused Attention for Accurate and Interpretable Text Classification

It is shown that KW-ATTN outperforms, in classification accuracy, both baseline models that use only words and other approaches that use concepts, which indicates that high-level concepts help model prediction.

Using Knowledge-Embedded Attention to Augment Pre-trained Language Models for Fine-Grained Emotion Recognition

  • Varsha Suresh, Desmond C. Ong
  • Computer Science
    2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII)
  • 2021
This work proposes Knowledge-Embedded Attention (KEA) to use knowledge from emotion lexicons to augment the contextual representations from pre-trained ELECTRA and BERT models, which is better able to differentiate closely-confusable emotions, such as afraid and terrified.

Attention-based conditioning methods using variable frame rate for style-robust speaker verification

An entropy-based variable frame rate vector is proposed as an external conditioning vector for the self-attention layer to provide the network with information that can address style effects.

Joint Aspect Extraction and Sentiment Analysis with Directional Graph Convolutional Networks

Directional graph convolutional networks (D-GCN) are proposed to jointly perform aspect extraction and sentiment analysis by encoding syntactic information, where dependencies among words are integrated in the authors' model to enhance its ability to represent input sentences and help EASA accordingly.

Less Is More: Attention Supervision with Counterfactuals for Text Classification

It is shown that human annotation cost can be kept reasonably low, while its quality can be enhanced by machine self-supervision, in text classification tasks, including sentiment analysis and news categorization.

Contextual Modulation for Relation-Level Metaphor Identification

This work introduces a novel architecture for identifying relation-level metaphoric expressions of certain grammatical relations through contextual modulation, which conditions the neural network computation on the deep contextualised features of the candidate expressions using feature-wise linear modulation.

Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-way Attentions of Auto-analyzed Knowledge

A neural model named TwASP is proposed for joint CWS and POS tagging following the character-based sequence labeling paradigm, where a two-way attention mechanism is used to incorporate both context features and their corresponding syntactic knowledge for each input character.

Hopper: Multi-hop Transformer for Spatiotemporal Reasoning

Hopper is proposed, which uses a Multi-hop Transformer for reasoning object permanence in videos, and can perform long-term reasoning by building a CATER-h dataset that requires multi-step reasoning to localize objects of interest correctly.

Named Entity Recognition for Social Media Texts with Semantic Augmentation

A neural-based approach to NER for social media texts where both local and augmented semantics are taken into account, and an attentive semantic augmentation module and a gate module to encode and aggregate such information are proposed.



Language Modeling with Gated Convolutional Networks

A finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens, is developed; this is the first time a non-recurrent approach is competitive with strong recurrent models on these large scale language tasks.

Attention is All you Need

A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed, which generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

DiSAN: Directional Self-Attention Network for RNN/CNN-free Language Understanding

A novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise) and a light-weight neural net is proposed, based solely on the proposed attention without any RNN/CNN structure, which outperforms complicated RNN models on both prediction quality and time efficiency.

FiLM: Visual Reasoning with a General Conditioning Layer

It is shown that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning.

Gated-Attention Readers for Text Comprehension

The Gated-Attention (GA) Reader, a model that integrates a multi-hop architecture with a novel attention mechanism, which is based on multiplicative interactions between the query embedding and the intermediate states of a recurrent neural network document reader, enables the reader to build query-specific representations of tokens in the document for accurate answer selection.

Robust Lexical Features for Improved Neural Network Named-Entity Recognition

This work proposes to embed words and entity types into a low-dimensional vector space the authors train from annotated data produced by distant supervision thanks to Wikipedia, and compute a feature vector representing each word that establishes a new state-of-the-art F1 score.

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

An attention based model that automatically learns to describe the content of images is introduced that can be trained in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound.

Visual Reasoning with Multi-hop Feature Modulation

It is demonstrated that multi-hop FiLM generation significantly outperforms prior state-of-the-art on the GuessWhat?! visual dialogue task and matches state-of-the-art on the ReferIt object retrieval task, and additional qualitative analysis is provided.

Employing External Rich Knowledge for Machine Comprehension

An attention-based recurrent neural network model is built, trained with the help of external knowledge which is semantically relevant to machine comprehension, and achieves a new state-of-the-art result.

NTUA-SLP at SemEval-2018 Task 3: Tracking Ironic Tweets using Ensembles of Word and Character Level Attentive RNNs

Two deep-learning systems that competed at SemEval-2018 Task 3 “Irony detection in English tweets”, based on recurrent neural networks, are presented, which operate at the word and character level, in order to capture both the semantic and syntactic information in tweets.