Enhancing Generalization in Natural Language Inference by Syntax

Qi He, Han Wang, Yue Zhang
Pre-trained language models such as BERT have achieved state-of-the-art performance on natural language inference (NLI). However, it has been shown that such models can be tricked by variations in surface patterns such as syntax. We investigate the use of dependency trees to enhance the generalization of BERT on the NLI task, leveraging a graph convolutional network to represent a syntax-based matching graph with heterogeneous matching patterns. Experimental results show that our syntax… 
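The abstract describes running a graph convolutional network over a dependency-based graph of token representations. As a minimal illustrative sketch (not the paper's actual model — the dimensions, weights, and example sentence are assumptions), one GCN layer over a dependency tree can be written as H' = ReLU(D⁻¹(A + I) H W):

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: H' = ReLU(D^-1 (A + I) H W)."""
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)
    a_norm = a_hat / deg                   # row-normalize by node degree
    return np.maximum(a_norm @ feats @ weight, 0.0)

# Dependency tree for "the cat sat": det(cat, the), nsubj(sat, cat),
# treated as undirected edges for message passing.
adj = np.zeros((3, 3))
for head, dep in [(1, 0), (2, 1)]:
    adj[head, dep] = adj[dep, head] = 1.0

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))            # per-token vectors (e.g. from BERT)
weight = rng.normal(size=(8, 4))           # layer parameters
out = gcn_layer(adj, feats, weight)
print(out.shape)                           # (3, 4)
```

After this step each token's representation mixes in its syntactic neighbors, which is what lets the matching graph expose dependency structure to the classifier.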

The Limitations of Limited Context for Constituency Parsing
This work grounds the question in the sandbox of probabilistic context-free grammars (PCFGs) and identifies a key aspect of the representational power of these approaches: the amount and directionality of context the predictor has access to when forced to make a parsing decision.
BertGCN: Transductive Text Classification by Combining GNN and BERT
By jointly training the BERT and GCN modules within BertGCN, the proposed model leverages the advantages of both worlds: large-scale pretraining, which takes advantage of a massive amount of raw data, and transductive learning.
Syntax-Aware Sentence Matching with Graph Convolutional Networks
A new method that incorporates syntactic structure into the “matching-aggregation” framework for sentence matching tasks, using a gating mechanism to dynamically combine the raw contextual representation of a sentence with its syntactic representation in order to mitigate noise from potentially incorrect dependency parses.
Enhanced LSTM for Natural Language Inference
This paper presents a new state-of-the-art result, achieving an accuracy of 88.6% on the Stanford Natural Language Inference dataset, and demonstrates that carefully designed sequential inference models based on chain LSTMs can outperform all previous models.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Improving Natural Language Inference with a Pretrained Parser
A novel approach to incorporate syntax into natural language inference (NLI) models using contextual token-level vector representations from a pretrained dependency parser that is broadly applicable to any neural model.
A large annotated corpus for learning natural language inference
The Stanford Natural Language Inference corpus is introduced, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning, which allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling
A version of graph convolutional networks (GCNs), a recent class of neural networks operating on graphs, suited to modeling syntactic dependency graphs is proposed, with the observation that GCN layers are complementary to LSTM ones.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
A Decomposable Attention Model for Natural Language Inference
We propose a simple neural architecture for natural language inference. Our approach uses attention to decompose the problem into subproblems that can be solved separately, thus making it trivially parallelizable.
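The attend step of this decomposable model aligns each premise token with a soft mixture of hypothesis tokens (and vice versa) from pairwise scores alone, which is why the subproblems parallelize. A minimal sketch, with illustrative sizes and random vectors standing in for learned embeddings:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
a = rng.normal(size=(4, 8))   # premise: 4 token vectors
b = rng.normal(size=(5, 8))   # hypothesis: 5 token vectors

e = a @ b.T                       # pairwise attention scores, shape (4, 5)
beta = softmax(e, axis=1) @ b     # hypothesis content aligned to each premise token
alpha = softmax(e, axis=0).T @ a  # premise content aligned to each hypothesis token
print(beta.shape, alpha.shape)    # (4, 8) (5, 8)
```

Each (token, aligned-content) pair can then be compared independently and the results aggregated, with no sequential dependency between positions.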
Syntax Helps ELMo Understand Semantics: Is Syntax Still Relevant in a Deep Neural Architecture for SRL?
Though ELMo outperforms typical word embeddings, beginning to close the F1 gap between LISA with predicted and gold syntactic parses, syntactically informed models still outperform syntax-free models when both use ELMo, especially on out-of-domain data.
Can Syntax Help? Improving an LSTM-based Sentence Compression Model for New Domains
This paper proposes two major changes to the LSTM neural network model for sentence compression: using explicit syntactic features and introducing syntactic constraints through Integer Linear Programming (ILP).