Probing Natural Language Inference Models through Semantic Fragments

@article{Richardson2020ProbingNL,
  title={Probing Natural Language Inference Models through Semantic Fragments},
  author={Kyle Richardson and Hai Hu and Lawrence S. Moss and Ashish Sabharwal},
  journal={ArXiv},
  year={2020},
  volume={abs/1909.07521}
}
Do state-of-the-art models for language understanding already have, or can they easily learn, abilities such as boolean coordination, quantification, conditionals, comparatives, and monotonicity reasoning (i.e., reasoning about word substitutions in sentential contexts)? [...] Key Result: Our experiments, using a library of 8 such semantic fragments, reveal two remarkable findings: (a) state-of-the-art models, including BERT, that are pre-trained on existing NLI benchmark datasets perform poorly on these new fragments [...]
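As a rough illustration of what a semantic fragment looks like in practice, the sketch below generates premise/hypothesis pairs for a toy monotonicity fragment. This is a minimal sketch under assumed word lists and labeling rules, not the paper's actual fragment library.

```python
# Minimal sketch of generating NLI pairs from a monotonicity fragment.
# Illustrative only: the quantifiers, nouns, and labeling rule below are
# assumptions, not the fragment library from the paper.

import random

QUANTIFIERS = {
    # quantifier -> is its first argument (the noun) downward monotone?
    "all": True,    # "all dogs bark" entails "all small dogs bark"
    "some": False,  # "some dogs bark" does NOT entail "some small dogs bark"
}
NOUNS = ["dogs", "cats", "birds"]
MODIFIERS = ["small", "old", "black"]
VERB = "bark"

def make_pair(rng: random.Random):
    """Build one (premise, hypothesis, label) triple. The hypothesis
    narrows the noun with a modifier; whether entailment is preserved
    depends on the quantifier's monotonicity."""
    quant = rng.choice(list(QUANTIFIERS))
    noun = rng.choice(NOUNS)
    mod = rng.choice(MODIFIERS)
    premise = f"{quant} {noun} {VERB}"
    hypothesis = f"{quant} {mod} {noun} {VERB}"
    label = "entailment" if QUANTIFIERS[quant] else "neutral"
    return premise, hypothesis, label

rng = random.Random(0)
for _ in range(3):
    print(make_pair(rng))
```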
Logical Inferences with Comparatives and Generalized Quantifiers
TLDR
This paper presents a compositional semantics that maps various comparative constructions in English to semantic representations via Combinatory Categorial Grammar parsers and combines it with an inference system based on automated theorem proving that outperforms previous logic-based systems as well as recent deep learning-based models.
SyGNS: A Systematic Generalization Testbed Based on Natural Language Semantics
TLDR
This work proposes a Systematic Generalization testbed based on Natural language Semantics (SyGNS), whose challenge is to map natural language sentences to multiple forms of scoped meaning representations, designed to account for various semantic phenomena.
Towards Coinductive Models for Natural Language Understanding. Bringing together Deep Learning and Deep Semantics
TLDR
It is argued that the known individual limitations of induction and coinduction can be overcome in empirical settings by a combination of the two methods.
Exploring Transitivity in Neural NLI Models through Veridicality
TLDR
It is found that current NLI models do not perform consistently well on transitivity inference tasks, suggesting that they lack the generalization capacity for drawing composite inferences from provided training examples.
A Multilingual Benchmark for Probing Negation-Awareness with Minimal Pairs
Negation is one of the most fundamental concepts in human cognition and language, and several natural language inference (NLI) probes have been designed to investigate pretrained language models’ [...]
Supporting Context Monotonicity Abstractions in Neural NLI Models
TLDR
This work reframes the problem of context monotonicity classification to make it compatible with transformer-based pre-trained NLI models, adds this task to the training pipeline, and introduces a sound and complete simplified monotonicity logic formalism that describes the treatment of contexts as abstract units.
Compositional and Lexical Semantics in RoBERTa, BERT and DistilBERT: A Case Study on CoQA
TLDR
This paper studies the types of linguistic phenomena accounted for by language models in the context of a Conversational Question Answering (CoQA) task and shows differences in ability to represent compositional and lexical information between RoBERTa, BERT and DistilBERT.
How Context Affects Language Models' Factual Predictions
TLDR
This paper reports that augmenting pre-trained language models with retrieved relevant context dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.
Transformers as Soft Reasoners over Language
TLDR
This work trains transformers to reason (or emulate reasoning) over natural language sentences using synthetically generated data, thus bypassing a formal representation and suggesting a new role for transformers, namely as limited "soft theorem provers" operating over explicit theories in language.
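As a hedged illustration of the kind of synthetic instance this line of work trains on, the snippet below builds one facts-plus-rules example with a True/False question. The input formatting and the tiny theory are assumptions, not the paper's exact data format.

```python
# Sketch of a "soft reasoning" instance: facts and rules stated in plain
# English plus a question to answer True/False. The field names and the
# tiny theory are illustrative assumptions, not the paper's dataset.

facts = ["Erin is kind.", "Erin is big."]
rules = ["If someone is kind and big then they are nice."]
question = "Erin is nice."  # derivable by applying the rule to the facts

example = {
    "context": " ".join(facts + rules),
    "question": question,
    "label": True,  # the statement follows from the theory
}
print(example["context"])
print(example["question"], "->", example["label"])
```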
A Logic-Based Framework for Natural Language Inference in Dutch
We present a framework for deriving inference relations between Dutch sentence pairs. The proposed framework relies on logic-based reasoning to produce inspectable proofs leading up to inference [...]

References

Showing 1–10 of 48 references
Using syntactical and logical forms to evaluate textual inference competence
TLDR
This work evaluates two kinds of neural models that implicitly exploit language structure: recurrent models and the Transformer network BERT. It shows that although BERT clearly generalizes better over most logical forms, there is room for improvement when dealing with counting operators.
A logical-based corpus for cross-lingual evaluation
TLDR
This work evaluates two kinds of deep learning models that implicitly exploit language structure: recurrent models and the Transformer network BERT. It shows that although BERT clearly generalizes better over most logical forms, there is room for improvement when dealing with counting operators.
Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs
TLDR
It is concluded that a variety of methods is necessary to reveal all relevant aspects of a model’s grammatical knowledge in a given domain.
A large annotated corpus for learning natural language inference
TLDR
The Stanford Natural Language Inference corpus is introduced, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning, which allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
Enhanced LSTM for Natural Language Inference
TLDR
A new state-of-the-art result is presented, achieving an accuracy of 88.6% on the Stanford Natural Language Inference dataset, and it is demonstrated that carefully designing sequential inference models based on chain LSTMs can outperform all previous models.
Annotation Artifacts in Natural Language Inference Data
TLDR
It is shown that a simple text categorization model can correctly predict the label from the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.
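The hypothesis-only finding is easy to reproduce in miniature. Below is a minimal sketch, assuming a toy dataset and scikit-learn, of a classifier that never sees the premise; the cue words and examples are invented for illustration, not drawn from SNLI.

```python
# Hypothesis-only baseline sketch: predict the NLI label from the
# hypothesis alone, ignoring the premise entirely. The tiny inline
# dataset is illustrative; the paper's result uses SNLI/MultiNLI.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy hypotheses with artifact-like cues: negation words tend to mark
# contradiction, vague additions tend to mark neutral.
hypotheses = [
    "A man is not sleeping.", "Nobody is outside.",      # contradiction
    "A man is outside having fun.", "People are rich.",  # neutral
    "A man is sleeping.", "People are outdoors.",        # entailment
]
labels = ["contradiction", "contradiction",
          "neutral", "neutral",
          "entailment", "entailment"]

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(hypotheses, labels)  # note: the premise is never seen

# The negation cue should push this toward "contradiction".
print(clf.predict(["A woman is not eating."]))
```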
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
TLDR
There is substantial room for improvement in NLI systems, and the HANS dataset, which contains many examples where these heuristics fail, can motivate and measure progress in this area.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
TLDR
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models' understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models; the benchmark favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge transfer across tasks.
MonaLog: a Lightweight System for Natural Language Inference Based on Monotonicity
TLDR
It is shown that MonaLog is capable of generating large amounts of high-quality training data for BERT, improving its accuracy on SICK, and that it can be used in combination with the current state-of-the-art model BERT in a variety of settings, including compositional data augmentation.
Stress Test Evaluation for Natural Language Inference
TLDR
This work proposes an evaluation methodology consisting of automatically constructed “stress tests” that allow us to examine whether systems have the ability to make real inferential decisions, and reveals strengths and weaknesses of these models with respect to challenging linguistic phenomena.
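One simple stress-test construction of this kind appends a tautology to the hypothesis, which should leave the gold label unchanged; label flips then expose models relying on shallow cues. The sketch below is illustrative, with an invented example pair rather than items from the actual stress-test sets.

```python
# Sketch of an automatically constructed stress test: append a tautology
# to the hypothesis of an existing NLI pair. The gold label should not
# change, so a prediction flip signals reliance on shallow cues.
# The example pair is invented for illustration.

def word_overlap_stress(premise: str, hypothesis: str, label: str):
    """Return a perturbed example whose gold label is unchanged."""
    stressed = hypothesis.rstrip(".") + " and true is true."
    return premise, stressed, label

example = ("A dog is running in the park.",
           "An animal is running.",
           "entailment")
print(word_overlap_stress(*example))
```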