Corpus ID: 59523704

Learning and Evaluating General Linguistic Intelligence

@article{Yogatama2019LearningAE,
  title={Learning and Evaluating General Linguistic Intelligence},
  author={Dani Yogatama and Cyprien de Masson d'Autume and Jerome T. Connor and Tom{\'a}s Kocisk{\'y} and Mike Chrzanowski and Lingpeng Kong and Angeliki Lazaridou and Wang Ling and Lei Yu and Chris Dyer and Phil Blunsom},
  journal={ArXiv},
  year={2019},
  volume={abs/1901.11373}
}
We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly. Using this definition, we analyze state-of-the-art natural language understanding models and conduct an extensive empirical investigation to evaluate them against these criteria through a series of experiments that assess the task-independence of the knowledge being acquired by the learning process… 

Citations

How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
TLDR
This position paper describes and critiques the Pretraining-Agnostic Identically Distributed (PAID) evaluation paradigm, and advocates for supplementing or replacing PAID with paradigms that reward architectures that generalize as quickly and robustly as humans.
Language Models are Unsupervised Multitask Learners
TLDR
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Syntactic Data Augmentation Increases Robustness to Inference Heuristics
TLDR
The best-performing augmentation method, subject/object inversion, improved BERT’s accuracy on controlled examples that diagnose sensitivity to word order from 0.28 to 0.73, suggesting that augmentation causes BERT to recruit abstract syntactic representations.
Do Language Models Learn Commonsense Knowledge?
Language models (LMs) trained on large amounts of data (e.g., Brown et al., 2020; Patwary et al., 2021) have shown impressive performance on many NLP tasks under the zero-shot and few-shot setup.
Continual Learning for Natural Language Generation in Task-oriented Dialog Systems
TLDR
This work proposes ARPER (Adaptively Regularized Prioritized Exemplar Replay), which replays prioritized historical exemplars together with an adaptive regularization technique based on Elastic Weight Consolidation.
Efficient Meta Lifelong-Learning with Limited Memory
TLDR
This paper identifies three common principles of lifelong learning methods and proposes an efficient meta-lifelong framework that combines them in a synergistic fashion and alleviates both catastrophic forgetting and negative transfer at the same time.
Syntactic Structure Distillation Pretraining for Bidirectional Encoders
TLDR
A knowledge distillation strategy for injecting syntactic biases into BERT pretraining, by distilling the syntactically informative predictions of a hierarchical—albeit harder to scale—syntactic language model.
What Happens To BERT Embeddings During Fine-tuning?
TLDR
It is found that fine-tuning is a conservative process that primarily affects the top layers of BERT, albeit with noteworthy variation across tasks: dependency parsing reconfigures most of the model, whereas SQuAD and MNLI involve much shallower processing.
Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning
TLDR
It is found that catastrophic forgetting affects generalization ability to a lesser degree than performance on seen tasks, and that continual learning algorithms can still bring considerable benefit to generalization ability.
Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills
TLDR
This work proposes to leverage semi-structured tables to automatically generate question-paragraph pairs at scale, where answering the question requires reasoning over multiple facts in the paragraph, and adds a pre-training step over this synthetic data, which includes examples requiring 16 different reasoning skills.

References

Showing 1-10 of 33 references
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
TLDR
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
Improving Language Understanding by Generative Pre-Training
TLDR
The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.
A large annotated corpus for learning natural language inference
TLDR
The Stanford Natural Language Inference corpus is introduced, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning, which allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
TLDR
The Multi-Genre Natural Language Inference corpus is introduced, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding and shows that it represents a substantially more difficult task than does the Stanford NLI corpus.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Zero-Shot Relation Extraction via Reading Comprehension
TLDR
It is shown that relation extraction can be reduced to answering simple reading comprehension questions, by associating one or more natural-language questions with each relation slot, and that zero-shot generalization to unseen relation types is possible, at lower accuracy levels.
Overcoming catastrophic forgetting in neural networks
TLDR
It is shown that it is possible to overcome the limitation of connectionist models and train networks that can maintain expertise on tasks they have not experienced for a long time, by selectively slowing down learning on the weights important for previous tasks.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
TLDR
A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
Attention is All you Need
TLDR
A new, simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
TLDR
It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions, exhibits considerable syntactic and lexical variability between questions and their corresponding answer-evidence sentences, and requires more cross-sentence reasoning to find answers.