Corpus ID: 235253907

Weighted Training for Cross-Task Learning

@article{Chen2021WeightedTF,
  title={Weighted Training for Cross-Task Learning},
  author={Shuxiao Chen and Koby Crammer and Han He and Dan Roth and Weijie J. Su},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.14095}
}
In this paper, we introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning based on minimizing a representation-based task distance between the source and target tasks. We show that TAWT is easy to implement, is computationally efficient, requires little hyperparameter tuning, and enjoys non-asymptotic learning-theoretic guarantees. The effectiveness of TAWT is corroborated through extensive experiments with BERT on four sequence tagging tasks in…
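As a rough illustration of the idea (not the authors' implementation), the following PyTorch sketch trains a shared encoder on a convex combination of source-task losses and periodically re-weights the source tasks by a crude representation-based distance to the target. The distance function, the exponentiated-gradient-style weight update, and all names here are stand-ins for TAWT's actual choices.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d_in, d_rep, n_src = 20, 8, 3

encoder = nn.Linear(d_in, d_rep)                                    # shared representation
heads = nn.ModuleList([nn.Linear(d_rep, 2) for _ in range(n_src)])  # one head per source task
opt = torch.optim.SGD(list(encoder.parameters()) + list(heads.parameters()), lr=0.1)

# Synthetic stand-ins for the source and target datasets.
src_data = [(torch.randn(64, d_in), torch.randint(0, 2, (64,))) for _ in range(n_src)]
tgt_x = torch.randn(64, d_in)

def task_distance(xs, xt):
    # Crude placeholder for a representation-based task distance:
    # the gap between mean embeddings of a source batch and a target batch.
    with torch.no_grad():
        return (encoder(xs).mean(0) - encoder(xt).mean(0)).norm()

weights = torch.full((n_src,), 1.0 / n_src)  # simplex weights over source tasks
eta = 1.0                                    # weight-update step size (assumed)

for step in range(100):
    # Weighted training: minimize the weighted sum of source-task losses.
    loss = sum(w * F.cross_entropy(head(encoder(x)), y)
               for w, head, (x, y) in zip(weights, heads, src_data))
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Target-aware re-weighting: source tasks closer to the target in representation
    # space get larger weights; a multiplicative update keeps the weights on the simplex.
    dist = torch.stack([task_distance(x, tgt_x) for x, _ in src_data])
    weights = weights * torch.exp(-eta * dist)
    weights = weights / weights.sum()

print("learned source-task weights:", [round(w.item(), 3) for w in weights])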

Citations

Provable and Efficient Continual Representation Learning
TLDR
This work establishes theoretical guarantees for continual representation learning (CRL) by providing sample complexity and generalization error bounds for new tasks, formalizing the statistical benefits of previously learned representations, and proposes an inference-efficient variant of PackNet, Efficient Sparse PackNet (ESPN), which employs joint channel and weight pruning.
Active Multi-Task Representation Learning
TLDR
This paper gives the first formal study of resource task sampling by leveraging techniques from active learning, and proposes an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
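For intuition only, here is a toy Python loop with the same shape as that summary: estimate a relevance score per source task, then allocate the next round's sampling budget proportionally. The relevance update below is a placeholder, not the paper's estimator.

import numpy as np

rng = np.random.default_rng(0)
n_sources, budget, n_rounds = 4, 1000, 5

relevance = np.full(n_sources, 1.0 / n_sources)      # start from uniform relevance
for r in range(n_rounds):
    # Allocate this round's labelling budget in proportion to estimated relevance.
    per_task = np.floor(budget * relevance).astype(int)
    # ... sample per_task[i] examples from source task i and update the shared
    # representation on them (omitted) ...
    # Placeholder relevance update; the paper instead estimates how much each
    # source task contributes to the target task.
    scores = rng.random(n_sources) + 1e-6
    relevance = scores / scores.sum()
    print(f"round {r}: samples per source task = {per_task.tolist()}")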

References

SHOWING 1-10 OF 38 REFERENCES
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
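For concreteness, a minimal fine-tuning sketch using the Hugging Face transformers library (not code from either paper): BERT plus a single token-classification output layer, as one would use for the sequence-tagging tasks mentioned in the abstract above. The checkpoint name and label count are illustrative.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-cased", num_labels=9)

# One toy pre-tokenized sentence; a real setup would batch a labelled tagging corpus.
words = ["TAWT", "is", "evaluated", "on", "sequence", "tagging", "tasks", "."]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
labels = torch.zeros_like(enc["input_ids"])  # dummy tag ids; real data supplies these

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
out = model(**enc, labels=labels)            # loss comes from the added output layer
out.loss.backward()
optimizer.step()
print("token-classification loss:", out.loss.item())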
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.
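A compact, self-contained sketch of the MAML pattern on a toy regression problem (illustrative only, not the authors' code): take one inner gradient step per task from a shared initialization, then update that initialization through the post-adaptation losses.

import torch

torch.manual_seed(0)
dim, inner_lr = 10, 0.1
w = torch.zeros(dim, requires_grad=True)    # shared initialization (a linear model)
meta_opt = torch.optim.SGD([w], lr=1e-2)

def sample_task():
    # A toy task: noiseless linear regression with a task-specific parameter vector.
    w_true = torch.randn(dim)
    def batch(n=16):
        x = torch.randn(n, dim)
        return x, x @ w_true
    return batch

for meta_step in range(200):
    meta_opt.zero_grad()
    for _ in range(4):                      # a few tasks per meta-batch
        batch = sample_task()
        x_tr, y_tr = batch()
        x_val, y_val = batch()
        # Inner loop: one gradient step away from the shared initialization.
        inner_loss = ((x_tr @ w - y_tr) ** 2).mean()
        (g,) = torch.autograd.grad(inner_loss, w, create_graph=True)
        w_adapted = w - inner_lr * g
        # Outer loop: evaluate the adapted parameters and backpropagate
        # (through the inner step) to the shared initialization.
        outer_loss = ((x_val @ w_adapted - y_val) ** 2).mean()
        outer_loss.backward()
    meta_opt.step()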
Concentration Inequalities - A Nonasymptotic Theory of Independence
TLDR
Deep connections with isoperimetric problems are revealed whilst special attention is paid to applications to the supremum of empirical processes.
On the Theory of Transfer Learning: The Importance of Task Diversity
TLDR
The results depend upon a new general notion of task diversity--applicable to models with general tasks, features, and losses--as well as a novel chain rule for Gaussian complexities.
Few-Shot Learning via Learning the Representation, Provably
TLDR
The results demonstrate that representation learning can fully utilize all $n_1 T$ samples from the source tasks, and show the advantage of representation learning in both high-dimensional linear regression and neural network learning.
Mirror descent and nonlinear projected subgradient methods for convex optimization
OntoNotes: The 90% Solution
TLDR
The OntoNotes methodology and its result are described: a large multilingual, richly annotated corpus constructed at 90% interannotator agreement, to be made available to the community during 2007.
Introduction to the CoNLL-2000 Shared Task: Chunking
We describe the CoNLL-2000 shared task: dividing text into syntactically related non-overlapping groups of words, so-called text chunking. We give background information on the data sets, present a general overview of the systems that have taken part in the task and discuss their performance.
How Fine-Tuning Allows for Effective Meta-Learning
TLDR
This work presents a theoretical framework for analyzing a MAML-like algorithm, assuming all available tasks require approximately the same representation, and provides risk bounds on predictors found by finetuning via gradient descent, demonstrating that the method provably leverages the shared structure.
Meta-learning Transferable Representations with a Single Target Domain
TLDR
Meta Representation Learning (MeRLin) is proposed to learn transferable features; it empirically outperforms previous state-of-the-art transfer learning algorithms on various real-world vision and NLP transfer learning benchmarks.