Contextualized Non-local Neural Networks for Sequence Learning

Pengfei Liu, Shuaichen Chang, Xuanjing Huang, Jian Tang, Jackie Chi Kit Cheung
AAAI Conference on Artificial Intelligence
Recently, a large number of neural mechanisms and models have been proposed for sequence learning, of which self-attention, as exemplified by the Transformer model, and graph neural networks (GNNs) have attracted much attention. In this paper, we propose an approach that combines and draws on the complementary strengths of these two methods. Specifically, we propose contextualized non-local neural networks (CN3), which can both dynamically construct a task-specific structure of a sentence and…

Neural Extractive Summarization with Hierarchical Attentive Heterogeneous Graph Network

This paper proposes HAHSum (shorthand for Hierarchical Attentive Heterogeneous Graph for Text Summarization), which effectively models different levels of information, including words and sentences, and spotlights redundancy dependencies between sentences.

Graph Contextualized Self-Attention Network for Session-based Recommendation

A graph contextualized self-attention model (GC-SAN) is proposed, which utilizes both a graph neural network and a self-attention mechanism for session-based recommendation and consistently outperforms state-of-the-art methods.

Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation

This model consists of a Graph2Seq generator with a novel Bidirectional Gated Graph Neural Network based encoder to embed the passage, and a hybrid evaluator with a mixed objective combining both cross-entropy and RL losses to ensure the generation of syntactically and semantically valid text.

Deep Iterative and Adaptive Learning for Graph Neural Networks

An end-to-end graph learning framework, namely Deep Iterative and Adaptive Learning for Graph Neural Networks (DIAL-GNN), is proposed for jointly learning the graph structure and graph embeddings, together with a novel iterative method for searching for a hidden graph structure that augments the initial graph structure.

Parallel Connected LSTM for Matrix Sequence Prediction with Elusive Correlations

This article proposes a novel architecture called Parallel Connected LSTM (PcLSTM), which integrates two new mechanisms, Multi-channel Linearized Connection (McLC) and Adaptive Parallel Unit (APU), into the framework of LSTM, and is able to handle both the elusive correlations within each timestamp and the temporal dependencies across different timestamps.


Hierarchical Contextualized Representation for Named Entity Recognition

This paper proposes a model augmented with hierarchical contextualized representation: sentence-level representation and document-level representation that takes different contributions of words in a single sentence into consideration to enhance the sentence representation learned from an independent BiLSTM via a label embedding attention mechanism.

Deep Unsupervised Active Learning on Learnable Graphs

A novel deep unsupervised active learning model via learnable graphs, named ALLGs, which benefits from learning optimal graph structures to acquire better sample representation and select representative samples and incorporates shortcut connections among different layers.


This work proposes a novel graph neural network (GNN) based model, namely GRAPHFLOW, which captures conversational flow in the dialog, and presents a novel flow mechanism to model the temporal dependencies in the sequence of context graphs.

Using Cognitive Interest Graph and Knowledge-activated Attention for Learning Resource Recommendation

A new method for learning session-based recommendation which uses a graph neural network (GNN) and a cognitive interest graph, with a novel knowledge-activated attention mechanism that makes use of the knowledge mastery of learners and their past behavior to actively adapt to their learning interests.



Deep Fusion LSTMs for Text Semantic Matching

This paper proposes a model of deep fusion LSTMs (DF-LSTMs) to model the strong interaction of a text pair in a recursive matching way and uses external memory to increase the capacity of LSTMs, thereby possibly capturing more complicated matching patterns.

A Convolutional Neural Network for Modelling Sentences

A convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) is described that is adopted for the semantic modelling of sentences and induces a feature graph over the sentence that is capable of explicitly capturing short and long-range relations.

A unified architecture for natural language processing: deep neural networks with multitask learning

We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic

Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling

A version of graph convolutional networks (GCNs), a recent class of neural networks operating on graphs, suited to model syntactic dependency graphs, is proposed, observing that GCN layers are complementary to LSTM ones.

Multi-Task Cross-Lingual Sequence Tagging from Scratch

A deep hierarchical recurrent neural network for sequence tagging that employs deep gated recurrent units on both character and word levels to encode morphology and context information, and applies a conditional random field layer to predict the tags.

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

The Tree-LSTM is introduced, a generalization of LSTMs to tree-structured network topologies that outperforms all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences and sentiment classification.

Dynamic Compositional Neural Networks over Tree Structure

This paper introduces the dynamic compositional neural networks over tree structure (DC-TreeNN), in which the compositional function is dynamically generated by a meta network, which captures the meta-knowledge across the different compositional rules and formulates them.

Constituency Parsing with a Self-Attentive Encoder

It is demonstrated that replacing an LSTM encoder with a self-attentive architecture can lead to improvements to a state-of-the-art discriminative constituency parser, and it is found that separating positional and content information in the encoder can lead to improved parsing accuracy.

Graph Attention Networks

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior