Breakpoint Transformers for Modeling and Tracking Intermediate Beliefs

Kyle Richardson, Ronen Tamari, Oren Sultan, Reut Tsarfaty, Dafna Shahaf, Ashish Sabharwal

Can we teach natural language understanding models to track their beliefs through intermediate points in text? We propose a representation learning framework called breakpoint modeling that allows for learning of this type. Given any text encoder and data marked with intermediate states (breakpoints) along with corresponding textual queries viewed as true/false propositions (i.e., the candidate beliefs of a model, consisting of information changing through time), our approach trains models in…



Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

A Logic-Driven Framework for Consistency of Neural Models

This paper proposes a learning framework that constrains models with logic rules to regularize them away from inconsistency, and instantiates it on natural language inference, where experiments show that enforcing invariants stated in logic can help make the predictions of neural models both accurate and consistent.

Dyna-bAbI: unlocking bAbI’s potential with dynamic synthetic benchmarking

This work develops Dyna-bAbI, a dynamic framework providing fine-grained control over task generation in bAbI, and underscores the importance of highly controllable task generators for creating robust NLU systems through a virtuous cycle of model and data development.

Tiered Reasoning for Intuitive Physics: Toward Verifiable Commonsense Language Understanding

This paper introduces Tiered Reasoning for Intuitive Physics (TRIP), a novel commonsense reasoning dataset with dense annotations that enable multi-tiered evaluation of machines’ reasoning process. The empirical results show that while large LMs can achieve high end performance, they struggle to support their predictions with valid supporting evidence.

Implicit Representations of Meaning in Neural Language Models

The results indicate that prediction in pretrained neural language models is supported, at least in part, by dynamic representations of meaning and implicit simulation of entity state, and that this behavior can be learned with only text as training data.

Measuring Systematic Generalization in Neural Proof Generation with Transformers

It is observed that models that are not trained to generate proofs are better at generalizing to problems based on longer proofs, which suggests that Transformers have efficient internal reasoning strategies that are harder to interpret.

CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text

A diagnostic benchmark suite, named CLUTRR, is introduced to clarify some key issues related to the robustness and systematicity of NLU systems, and it highlights a substantial performance gap between state-of-the-art NLU models.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT, a new language representation model, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.

Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks

This work argues for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering, and classifies these tasks into skill sets so that researchers can identify (and then rectify) the failings of their systems.