Corpus ID: 220646430

Drinking from a Firehose: Continual Learning with Web-scale Natural Language

@article{Hu2020DrinkingFA,
  title={Drinking from a Firehose: Continual Learning with Web-scale Natural Language},
  author={Hexiang Hu and Ozan Sener and Fei Sha and Vladlen Koltun},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.09335}
}
Continual learning systems will interact with humans, with each other, and with the physical world through time -- and continue to learn and adapt as they do. Such systems have typically been evaluated in artificial settings: for example, classifying randomly permuted images. A key limitation of these settings is the unnatural construct of discrete, sharply demarcated tasks that are solved in sequence. In this paper, we study a natural setting for continual learning on a massive scale. We…
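
As a rough illustration of the contrast the abstract draws, a task-free setting reduces training to a single pass over a time-ordered stream, with no task identities or boundaries. The following is a minimal sketch under that reading; the function names, batching, and evaluation hook are assumptions, not the paper's protocol.

```python
import torch
import torch.nn.functional as F

def train_on_stream(model, optimizer, stream, evaluate=None):
    """Online continual learning: a single pass over a time-ordered stream, no task boundaries.

    `stream` yields (x, y) mini-batches in arrival order; the model is updated once per
    batch and can be probed periodically to measure retention and online accuracy.
    """
    for step, (x, y) in enumerate(stream):
        optimizer.zero_grad()
        F.cross_entropy(model(x), y).backward()
        optimizer.step()

        if evaluate is not None and step % 1000 == 0:
            evaluate(model, step)  # e.g. accuracy on held-out past data vs. incoming data
```
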
Citations

Continual Learning in Task-Oriented Dialogue Systems
TLDR: This paper proposes a continual learning benchmark for task-oriented dialogue systems with 37 domains to be learned continuously in four settings (intent recognition, state tracking, natural language generation, and end-to-end), and implements and compares multiple existing continual learning baselines.
Dynamically Addressing Unseen Rumor via Continual Learning
TLDR: This work proposes an alternative solution that continuously updates the model in accordance with the dynamics of rumor domain creation, adopting continual learning strategies that control new learning to avoid catastrophic forgetting.
Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data
TLDR: It is argued that “online” continual learning, where data is a single continuous stream without task boundaries, enables evaluating both information retention and online learning efficacy; a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts is introduced.
Reducing Representation Drift in Online Continual Learning
TLDR: This work hypothesizes and empirically confirms that the selection of negatives used in the triplet loss plays a major role in the representation change, or drift, of previously observed data, and that this drift can be greatly reduced by appropriate negative selection.

References

Showing 1-10 of 78 references
Continual Learning with Tiny Episodic Memories
TLDR: It is observed that a very simple baseline, which jointly trains on examples from the current task as well as examples stored in the memory, outperforms state-of-the-art CL approaches with and without episodic memory.
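
As a rough sketch of that replay baseline (a simplified illustration, not the paper's exact recipe; the reservoir buffer, batch sizes, and loss are assumptions):

```python
import random
import torch
import torch.nn.functional as F

def er_step(model, optimizer, batch_x, batch_y, memory, seen, mem_batch=10, max_mem=1000):
    """One experience-replay update: train jointly on the incoming batch and a draw from memory."""
    x, y = batch_x, batch_y
    if memory:
        mem_x, mem_y = zip(*random.sample(memory, min(mem_batch, len(memory))))
        x = torch.cat([x, torch.stack(mem_x)])
        y = torch.cat([y, torch.stack(mem_y)])

    optimizer.zero_grad()
    F.cross_entropy(model(x), y).backward()
    optimizer.step()

    # Reservoir sampling keeps the buffer small and roughly uniform over the whole stream.
    for xi, yi in zip(batch_x, batch_y):
        seen += 1
        if len(memory) < max_mem:
            memory.append((xi, yi))
        else:
            j = random.randrange(seen)
            if j < max_mem:
                memory[j] = (xi, yi)
    return seen
```

Reservoir sampling is one common way to keep such a memory uniform over the stream without storing task identities.
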
Continual Lifelong Learning with Neural Networks: A Review
TLDR: This review critically summarizes the main challenges linked to lifelong learning for artificial learning systems and compares existing neural network approaches that alleviate, to different extents, catastrophic forgetting.
Language Models are Unsupervised Multitask Learners
TLDR: It is demonstrated that language models begin to learn a range of language tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Efficient Lifelong Learning with A-GEM
TLDR: An improved version of GEM is proposed, dubbed Averaged GEM (A-GEM), which matches or exceeds the performance of GEM while being almost as computationally and memory efficient as EWC and other regularization-based methods.
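
The heart of A-GEM is a single gradient projection against an average reference gradient computed on a sample from episodic memory. A minimal sketch of that projection follows; the flattening of all parameter gradients into one vector is assumed to happen outside this helper.

```python
import torch

def a_gem_project(grad, ref_grad):
    """A-GEM projection: correct the current gradient if it conflicts with the memory gradient.

    `grad` is the flattened gradient on the current batch, `ref_grad` the flattened gradient
    on a batch drawn from episodic memory (both 1-D tensors over all model parameters).
    """
    dot = torch.dot(grad, ref_grad)
    if dot < 0:  # the proposed update would, to first order, increase loss on past data
        grad = grad - (dot / torch.dot(ref_grad, ref_grad)) * ref_grad
    return grad
```

The corrected vector is then copied back into the parameter gradients before the optimizer step, so that, to first order, the update does not increase the average loss on the sampled memory.
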
Gradient based sample selection for online continual learning
TLDR: This work formulates sample selection as a constraint reduction problem, based on the constrained optimization view of continual learning, and shows that it is equivalent to maximizing the diversity of samples in the replay buffer with the parameter gradient as the feature.
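
A greedy reading of that idea scores each candidate by how similar its gradient is to the gradients already stored and keeps the most dissimilar ones. The sketch below is a simplified stand-in for the paper's procedure; the scoring and eviction rule are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def maybe_add_to_buffer(candidate_grad, buffer_grads, max_size=100):
    """Greedy diversity-based selection: keep samples whose gradients point in different directions."""
    if len(buffer_grads) < max_size:
        buffer_grads.append(candidate_grad)
        return True
    stacked = torch.stack(buffer_grads)
    # Similarity of the candidate to the buffer, and of each stored gradient to the rest.
    cand_sim = F.cosine_similarity(stacked, candidate_grad.unsqueeze(0)).max().item()
    stored_sims = (F.cosine_similarity(stacked.unsqueeze(1), stacked.unsqueeze(0), dim=-1)
                   - torch.eye(len(buffer_grads))).max(dim=1).values
    worst = int(stored_sims.argmax())
    # Replace the most redundant stored gradient if the candidate is more diverse than it.
    if cand_sim < stored_sims[worst].item():
        buffer_grads[worst] = candidate_grad
        return True
    return False
```
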
Progress & Compress: A scalable framework for continual learning
TLDR: The progress & compress approach is demonstrated on sequential classification of handwritten alphabets as well as two reinforcement learning domains: Atari games and 3D maze navigation.
Gradient Episodic Memory for Continual Learning
TLDR: A model for continual learning, called Gradient Episodic Memory (GEM), is proposed that alleviates forgetting while allowing beneficial transfer of knowledge to previous tasks.
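
GEM's update can be summarized as projecting the proposed gradient g so that it does not increase the loss on the episodic memory of any earlier task; with g_k the gradient on the memory of task k < t, the projection solves:

```latex
\min_{\tilde{g}} \ \tfrac{1}{2}\,\lVert g - \tilde{g} \rVert_2^2
\quad \text{subject to} \quad \langle \tilde{g},\, g_k \rangle \ge 0 \ \ \forall k < t
```

A-GEM (above) collapses the per-task constraints into a single constraint against the average memory gradient, which removes the need to solve this quadratic program at every step.
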
Multi-Task Deep Neural Networks for Natural Language Understanding
TLDR: A Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks is proposed, which allows domain adaptation with substantially fewer in-domain labels than the pre-trained BERT representations.
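
The basic shape of such a model is a shared text encoder with one lightweight output head per NLU task; a minimal sketch (module names and the encoder interface are illustrative assumptions, not the MT-DNN code):

```python
import torch
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    """A shared encoder with one classification head per NLU task."""

    def __init__(self, encoder, hidden_dim, task_num_labels):
        super().__init__()
        self.encoder = encoder  # e.g. a pretrained BERT-style module returning (batch, hidden_dim)
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden_dim, n_labels)
            for task, n_labels in task_num_labels.items()
        })

    def forward(self, inputs, task):
        # All tasks share the encoder; only the output head is task-specific.
        features = self.encoder(inputs)
        return self.heads[task](features)
```

Training typically cycles mini-batches across tasks, so each backward pass updates the shared encoder plus only the head for the task that produced the batch.
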
Continual Learning with Deep Generative Replay
TLDR: Deep Generative Replay is proposed, a novel framework with a cooperative dual-model architecture consisting of a deep generative model ("generator") and a task-solving model ("solver"); with only these two models, training data for previous tasks can easily be sampled and interleaved with data for a new task.
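
One replay step in that setup samples pseudo-inputs from the previous generator, labels them with the previous solver, and mixes them with real data for the new task. A rough sketch follows; the `sample` method and the replay ratio are assumptions.

```python
import torch
import torch.nn.functional as F

def generative_replay_step(solver, optimizer, new_x, new_y,
                           old_generator, old_solver, replay_ratio=1.0):
    """Train the solver on new-task data mixed with replayed pseudo-data from previous tasks."""
    n_replay = int(len(new_x) * replay_ratio)
    with torch.no_grad():
        # Draw synthetic inputs from the frozen previous generator
        # and label them with the frozen previous solver.
        replay_x = old_generator.sample(n_replay)
        replay_y = old_solver(replay_x).argmax(dim=1)

    x = torch.cat([new_x, replay_x])
    y = torch.cat([new_y, replay_y])

    optimizer.zero_grad()
    F.cross_entropy(solver(x), y).backward()
    optimizer.step()
```
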
Multi-Task Learning for Sequence Tagging: An Empirical Study
TLDR: It is shown that in about 50% of the cases, jointly learning all 11 tasks improves upon either independent or pairwise learning of the tasks, and that pairwise MTL can indicate which tasks can benefit others or which tasks benefit from being learned jointly.