Sequential Attention: A Context-Aware Alignment Function for Machine Reading

Authors: Sebastian Brarda, Philip Yeres, and Samuel R. Bowman
In this paper we propose a neural network model with a novel Sequential Attention layer that extends soft attention by weighting the words of an input sequence according to not only how well each word matches a query, but also how well its surrounding words match. We evaluate this approach on reading comprehension (on the Who did What and CNN datasets) and show that it dramatically improves a strong baseline, the Stanford Reader, and is competitive with the state of… 
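The abstract describes the mechanism only at a high level. The minimal plain-Python sketch below contrasts ordinary soft attention, where a word's weight depends only on its own match with the query, with a context-aware variant in which a word's score also reflects how well its neighbors match. It is not the paper's Sequential Attention layer: the paper's layer is part of a trained neural network, whereas this sketch assumes a fixed window average over neighboring dot-product match scores as a stand-in. The function names and the `window` parameter are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def soft_attention(query, tokens):
    """Plain soft attention: each token is scored only by its own match with the query."""
    return softmax([dot(query, t) for t in tokens])


def sequential_attention(query, tokens, window=1):
    """Hypothetical context-aware variant (assumption, not the paper's layer):
    a token's score is the average match of itself and its neighbors
    within `window` positions, so runs of matching words reinforce each other."""
    match = [dot(query, t) for t in tokens]
    scores = []
    for i in range(len(match)):
        lo, hi = max(0, i - window), min(len(match), i + window + 1)
        neighborhood = match[lo:hi]  # shrinks at the sequence boundaries
        scores.append(sum(neighborhood) / len(neighborhood))
    return softmax(scores)
```

With `query = [1.0, 0.0]` and token vectors `[[1, 0], [0, 1], [0, 1], [1, 0], [1, 0]]`, plain soft attention gives the three matching tokens (positions 0, 3, and 4) equal weight, while the windowed variant ranks the adjacent matches at positions 4 and 3 above the isolated match at position 0. Setting `window=0` recovers plain soft attention.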


Papers citing this work

Multihop Attention Networks for Question Answer Matching
This paper proposes Multihop Attention Networks (MAN), which use multiple vectors, each focusing on a different part of the question, to build the question's overall semantic representation, and apply multiple steps of attention to learn representations of the candidate answers.
Hybrid Encoding-based Model for Chinese Reading Comprehension
This paper proposes a novel hybrid encoding-based model for cloze-style Chinese machine reading comprehension, designed to resemble the human reading and reasoning process; it substantially outperforms the comparison models published at CMRC2017.
Read and Comprehend by Gated-Attention Reader with More Belief
This paper proposes Collaborative Gating (CG) and Self-Belief Aggregation (SBA) to address, respectively, two assumptions made by the Gated-Attention Reader, and applies self-attention to link the cloze token with the other tokens in a query so that query tokens are weighted by their importance with respect to the cloze position.
On Committee Representations of Adversarial Learning Models for Question-Answer Ranking
This paper proposes a new representation procedure, based on committee learning, for adversarial learning models in question-answer ranking. It consistently improves all baseline algorithms and outperforms the previous state-of-the-art algorithm by as much as 6% in NDCG.
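The multi-hop attention in the first entry and the Gated-Attention Reader extended by the third entry both apply attention repeatedly. The minimal plain-Python sketch below shows one such scheme: the multiplicative gating step of the Gated-Attention (GA) Reader (which also appears further down this page), repeated for a fixed number of hops. It is an illustration under stated assumptions, not any of these papers' implementations: there are no learned parameters, and token states are not re-encoded between hops, whereas the GA Reader runs a recurrent encoder at every hop. The function names `gated_attention` and `multi_hop` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention(doc, query):
    """One GA-style gating step: each document token d_i attends over the
    query token vectors and is then gated elementwise by the attended query."""
    dim = len(query[0])
    gated = []
    for d in doc:
        # alpha_i = softmax(Q^T d_i): match of d_i against every query token
        alpha = softmax([sum(qk * dk for qk, dk in zip(q_vec, d)) for q_vec in query])
        # q~_i = Q alpha_i: attention-weighted summary of the query
        q_tilde = [sum(a * q_vec[j] for a, q_vec in zip(alpha, query)) for j in range(dim)]
        # x_i = d_i (elementwise product) q~_i
        gated.append([dj * qj for dj, qj in zip(d, q_tilde)])
    return gated


def multi_hop(doc, query, hops=2):
    """Apply the gating step `hops` times (assumption: no re-encoding between hops)."""
    for _ in range(hops):
        doc = gated_attention(doc, query)
    return doc
```

With a single query token the attention weight is 1, so each hop simply multiplies every document vector elementwise by that query vector; with several query tokens, each document token is gated by its own weighted mix of them.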

References

Gated-Attention Readers for Text Comprehension
The Gated-Attention (GA) Reader integrates a multi-hop architecture with a novel attention mechanism based on multiplicative interactions between the query embedding and the intermediate states of a recurrent neural network document reader. This enables the reader to build query-specific representations of document tokens for accurate answer selection.
Bidirectional Attention Flow for Machine Comprehension
The BiDAF network is introduced: a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization.
A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task
A thorough examination of the CNN/Daily Mail reading comprehension task, for which over a million training examples were created by pairing CNN and Daily Mail news articles with their summarized bullet points, and on which a neural network had been shown to perform well.
Effective Approaches to Attention-based Neural Machine Translation
Examines a global approach, which always attends to all source words, and a local one, which looks at only a subset of source words at a time, and demonstrates the effectiveness of both on the WMT translation tasks between English and German in both directions.
Teaching Machines to Read and Comprehend
Defines a new methodology that resolves the lack of large-scale training and test data for reading comprehension by providing large-scale supervised reading comprehension data. This enables the development of a class of attention-based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.
Structured Attention Networks
This work shows that structured attention networks are simple extensions of the basic attention procedure, and that they allow for extending attention beyond the standard soft-selection approach, such as attending to partial segmentations or to subtrees.
Who did What: A Large-Scale Person-Centered Cloze Dataset
Constructs a new "Who-did-What" dataset of over 200,000 fill-in-the-gap (cloze) multiple-choice reading comprehension problems from the LDC English Gigaword newswire corpus and proposes it as a challenge task for the community.
Neural Machine Translation by Jointly Learning to Align and Translate
Conjectures that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and proposes extending it by allowing the model to automatically (soft-)search for the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering
This work casts neural QA as a sequence labeling problem and proposes an end-to-end sequence labeling model that addresses the challenges the paper identifies and significantly outperforms the baselines on WebQA.
GloVe: Global Vectors for Word Representation
A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
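Several of the references above (Bahdanau et al.; Luong et al.) concern the alignment score that soft attention normalizes with a softmax. The minimal plain-Python sketch below implements the three score functions named in Luong et al. ("dot", "general", and "concat", the last an additive form in the spirit of Bahdanau et al.) together with the softmax-and-average step that turns scores into a context vector. To our understanding, the bilinear "general" form is the one the Stanford Reader baseline uses. Parameters such as `W_a` and `v_a` are passed in by hand rather than learned, and all helper names are hypothetical.

```python
import math


def dot_score(h_t, h_s):
    """Luong 'dot': score = h_t^T h_s."""
    return sum(a * b for a, b in zip(h_t, h_s))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def general_score(h_t, h_s, W_a):
    """Luong 'general' (bilinear): score = h_t^T W_a h_s."""
    return dot_score(h_t, matvec(W_a, h_s))


def concat_score(h_t, h_s, W_a, v_a):
    """Luong 'concat' (additive): score = v_a^T tanh(W_a [h_t; h_s]).
    W_a must have 2 * len(h_t) columns."""
    hidden = [math.tanh(z) for z in matvec(W_a, h_t + h_s)]  # list concatenation
    return dot_score(v_a, hidden)


def attend(h_t, sources, score_fn):
    """Softmax-normalize scores over all source positions and return
    (attention weights, weighted context vector)."""
    scores = [score_fn(h_t, h_s) for h_s in sources]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(sources[0])
    context = [sum(w * h_s[j] for w, h_s in zip(weights, sources)) for j in range(dim)]
    return weights, context
```

Passing an identity matrix as `W_a` to `general_score` reproduces the `dot_score` weights, a quick way to see that "dot" is a special case of "general"; `concat_score` adds a nonlinearity between the two vectors instead of comparing them directly.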