Corpus ID: 220646430

Drinking from a Firehose: Continual Learning with Web-scale Natural Language

Authors: Hexiang Hu, Ozan Sener, Fei Sha, Vladlen Koltun
Continual learning systems will interact with humans, with each other, and with the physical world through time -- and continue to learn and adapt as they do. Such systems have typically been evaluated in artificial settings: for example, classifying randomly permuted images. A key limitation of these settings is the unnatural construct of discrete, sharply demarcated tasks that are solved in sequence. In this paper, we study a natural setting for continual learning on a massive scale. We… 
The CLEAR Benchmark: Continual LEArning on Real-World Imagery
This paper introduces CLEAR, the first continual image classification benchmark dataset with a natural temporal evolution of visual concepts in the real world spanning a decade (2004-2014), and proposes a novel "streaming" protocol for CL that always tests on the (near) future.
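A minimal sketch of such a streaming protocol (illustrative Python; buckets, train_fn, and eval_fn are placeholder names, not from the CLEAR paper):

    def streaming_evaluation(buckets, train_fn, eval_fn):
        # Streaming protocol sketch: after training on the data for time period t,
        # the model is always evaluated on the next (near-future) period t + 1.
        scores = []
        for t in range(len(buckets) - 1):
            train_fn(buckets[t])
            scores.append(eval_fn(buckets[t + 1]))
        return scores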
Continual Learning in Task-Oriented Dialogue Systems
This paper proposes a continual learning benchmark for task-oriented dialogue systems with 37 domains to be learned continuously in four settings (intent recognition, state tracking, natural language generation, and end-to-end), and implements and compares multiple existing continual learning baselines.
Dynamically Addressing Unseen Rumor via Continual Learning
This work proposes an alternative solution that continuously updates the model in accordance with the dynamics of rumor domain creation, and adopts continual learning strategies that control new learning to avoid catastrophic forgetting.
Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data
This work argues that "online" continual learning, where data arrives as a single continuous stream without task boundaries, enables evaluating both information retention and online learning efficacy, and introduces a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts.
Reducing Representation Drift in Online Continual Learning
This work hypothesizes and empirically confirms that the selection of negatives used in the triplet loss plays a major role in the representation change, or drift, of previously observed data, and that this drift can be greatly reduced by appropriate negative selection.
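For reference, a standard triplet loss in PyTorch; which examples are allowed to act as the negative (e.g. buffered old-class samples versus other incoming samples) is the selection choice the summary above refers to. This is an illustrative sketch, not the paper's exact loss:

    import torch.nn.functional as F

    def triplet_loss(anchor, positive, negative, margin=0.2):
        # Pulls the anchor toward the positive and pushes it away from the negative.
        d_pos = F.pairwise_distance(anchor, positive)
        d_neg = F.pairwise_distance(anchor, negative)
        return F.relu(d_pos - d_neg + margin).mean()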


Continual Learning with Tiny Episodic Memories
It is observed that a very simple baseline, which jointly trains on examples from the current task and examples stored in the memory, outperforms state-of-the-art CL approaches with and without episodic memory.
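A minimal sketch of that replay baseline (PyTorch; model, optimizer, loss_fn, and the list-based memory are assumed to exist and are illustrative names):

    import random
    import torch

    def replay_step(model, optimizer, loss_fn, batch, memory, mem_batch=10):
        # Jointly train on the current batch plus a few examples drawn from memory.
        x, y = batch
        if memory:
            mx, my = zip(*random.sample(memory, min(mem_batch, len(memory))))
            x = torch.cat([x, torch.stack(mx)])
            y = torch.cat([y, torch.stack(my)])
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        # Store the current examples for future replay; a real implementation
        # would bound the memory size (e.g. with reservoir sampling).
        memory.extend(zip(*batch))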
Continual Lifelong Learning with Neural Networks: A Review
Language Models are Unsupervised Multitask Learners
It is demonstrated that language models begin to learn a range of language tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Efficient Lifelong Learning with A-GEM
An improved version of GEM is proposed, dubbed Averaged GEM (A-GEM), which matches or exceeds the performance of GEM while being almost as computationally and memory efficient as EWC and other regularization-based methods.
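The core of A-GEM is a single projection step; a sketch in PyTorch, assuming the flattened gradient on the current batch (grad) and on a batch sampled from episodic memory (grad_ref) have already been computed:

    import torch

    def agem_project(grad, grad_ref):
        # If the current gradient conflicts with the memory gradient
        # (negative dot product), remove the conflicting component.
        dot = torch.dot(grad, grad_ref)
        if dot < 0:
            grad = grad - (dot / torch.dot(grad_ref, grad_ref)) * grad_ref
        return grad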
Gradient based sample selection for online continual learning
This work formulates sample selection as a constraint reduction problem based on the constrained optimization view of continual learning, and shows that it is equivalent to maximizing the diversity of samples in the replay buffer, using the parameter gradient as the feature.
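One way to read that diversity criterion, sketched in PyTorch: a candidate sample is scored by how similar its gradient is to the gradients of samples already in the buffer (function and variable names are illustrative, not from the paper):

    import torch
    import torch.nn.functional as F

    def diversity_score(candidate_grad, buffer_grads):
        # Lower maximal cosine similarity to buffered gradients means the
        # candidate is more diverse, hence more valuable to keep in the buffer.
        if not buffer_grads:
            return -1.0
        sims = torch.stack([F.cosine_similarity(candidate_grad, g, dim=0)
                            for g in buffer_grads])
        return sims.max().item()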
Progress & Compress: A scalable framework for continual learning
The progress & compress approach is demonstrated on sequential classification of handwritten alphabets as well as two reinforcement learning domains: Atari games and 3D maze navigation.
Gradient Episodic Memory for Continual Learning
A model for continual learning, called Gradient Episodic Memory (GEM), is proposed that alleviates forgetting while allowing beneficial transfer of knowledge to previous tasks.
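The projection GEM performs can be written as a small quadratic program over the proposed update (notation assumed from the GEM paper: g is the current gradient, g_k the gradient on the episodic memory of an earlier task k):

    \min_{\tilde g}\ \tfrac{1}{2}\,\lVert \tilde g - g \rVert_2^2
    \quad \text{s.t.}\quad \langle \tilde g,\, g_k \rangle \ge 0 \ \text{ for all } k < t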
Multi-Task Deep Neural Networks for Natural Language Understanding
A Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks that allows domain adaptation with substantially fewer in-domain labels than the pre-trained BERT representations.
Continual Learning with Deep Generative Replay
Deep Generative Replay is proposed: a novel framework with a cooperative dual-model architecture consisting of a deep generative model ("generator") and a task-solving model ("solver"). With only these two models, training data for previous tasks can easily be sampled and interleaved with that for a new task.
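A minimal sketch of one training step under that scheme (PyTorch; sample_old draws pseudo-inputs from a frozen copy of the previous generator and label_old is a frozen copy of the previous solver -- both names are illustrative):

    import torch

    def generative_replay_step(solver, optimizer, loss_fn, new_batch,
                               sample_old, label_old, n_replay=32):
        # Mix real data for the new task with generated pseudo-data for old tasks,
        # labeled by the previous solver, and train the current solver on the union.
        x_new, y_new = new_batch
        x_old = sample_old(n_replay)
        with torch.no_grad():
            y_old = label_old(x_old).argmax(dim=1)
        x = torch.cat([x_new, x_old])
        y = torch.cat([y_new, y_old])
        optimizer.zero_grad()
        loss = loss_fn(solver(x), y)
        loss.backward()
        optimizer.step()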
Multi-Task Learning for Sequence Tagging: An Empirical Study
It is shown that in about 50% of the cases, jointly learning all 11 tasks improves upon either independent or pairwise learning of the tasks, and that pairwise MTL can inform us which tasks can benefit others, or which tasks can be benefited, when they are learned jointly.