# Phase Transitions in Transfer Learning for High-Dimensional Perceptrons

```bibtex
@article{Dhifallah2021PhaseTI,
  title   = {Phase Transitions in Transfer Learning for High-Dimensional Perceptrons},
  author  = {Oussama Dhifallah and Yue M. Lu},
  journal = {Entropy},
  year    = {2021},
  volume  = {23}
}
```

Transfer learning seeks to improve the generalization performance of a target task by exploiting the knowledge learned from a related source task. Central questions include deciding what information one should transfer and when transfer can be beneficial. The latter question is related to the so-called negative transfer phenomenon, where the transferred source information actually reduces the generalization performance of the target task. This happens when the two tasks are sufficiently…
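The trade-off described above can be illustrated with a minimal numerical sketch. The snippet below is not the paper's perceptron model; it is a simplified linear-regression analogue in which "transfer" means biasing a ridge estimator toward the source weights (`w_source`, `target_teacher`, and `fit` are all hypothetical names for this illustration). When the source and target teachers are well aligned, the source-biased fit beats plain ridge; when they are orthogonal, the bias hurts — a toy version of negative transfer.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 200          # input dimension, target-task sample size
sigma = 0.3             # label-noise level

# Source-task weights (assumed already learned, normalized to unit length).
w_source = rng.standard_normal(d)
w_source /= np.linalg.norm(w_source)

def target_teacher(overlap):
    """Target-task teacher with a prescribed cosine similarity to the source."""
    v = rng.standard_normal(d)
    v -= (v @ w_source) * w_source      # project out the source direction
    v /= np.linalg.norm(v)
    return overlap * w_source + np.sqrt(1.0 - overlap**2) * v

def fit(X, y, w_anchor, lam=5.0):
    """Ridge regression biased toward w_anchor:
       argmin_w ||y - X w||^2 + lam * ||w - w_anchor||^2."""
    A = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y + lam * w_anchor)

X = rng.standard_normal((n, d)) / np.sqrt(d)

def errors(overlap):
    """Squared estimation error with and without transfer from the source."""
    w_star = target_teacher(overlap)
    y = X @ w_star + sigma * rng.standard_normal(n)
    err_transfer = np.sum((fit(X, y, w_source) - w_star) ** 2)
    err_plain = np.sum((fit(X, y, np.zeros(d)) - w_star) ** 2)
    return err_transfer, err_plain

err_t_close, err_p_close = errors(0.95)   # closely related tasks
err_t_far, err_p_far = errors(0.0)        # unrelated (orthogonal) tasks
print(f"related:   transfer={err_t_close:.3f}  plain={err_p_close:.3f}")
print(f"unrelated: transfer={err_t_far:.3f}  plain={err_p_far:.3f}")
```

The anchored-ridge penalty is only one way to inject source information; the point of the sketch is that the sign of its effect flips with task similarity, which is the qualitative phenomenon the paper characterizes sharply.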

## 7 Citations

On the Inherent Regularization Effects of Noise Injection During Training

- Computer Science, ICML

- 2021

This paper provides a precise asymptotic characterization of the training and generalization errors of such randomly perturbed learning problems on a random feature model, showing that Gaussian noise injection during training is equivalent to introducing a weighted ridge regularization when the number of noise injections tends to infinity.
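The noise-injection/ridge equivalence can be checked numerically in a stripped-down setting. The sketch below uses plain linear regression rather than the paper's random feature model: averaging the normal equations of many input-noise-perturbed least-squares problems converges to a ridge problem with penalty `lam = n * sigma**2` (the expected value of the injected noise's Gram matrix). All names here are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 100, 5, 0.5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Average the normal equations of K perturbed problems:
#   minimize (1/K) * sum_k ||y - (X + E_k) w||^2,  E_k entries ~ N(0, sigma^2).
K = 5000
A = np.zeros((d, d))
b = np.zeros(d)
for _ in range(K):
    Xk = X + sigma * rng.standard_normal((n, d))
    A += Xk.T @ Xk / K
    b += Xk.T @ y / K
w_noise = np.linalg.solve(A, b)

# Equivalent ridge problem: E[E^T E] = n * sigma^2 * I.
lam = n * sigma**2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

gap = np.linalg.norm(w_noise - w_ridge) / np.linalg.norm(w_ridge)
print(f"relative gap between noise-injected and ridge solutions: {gap:.4f}")
```

As `K` grows the cross terms average out and the relative gap shrinks toward zero, which is the finite-sample face of the equivalence the paper proves; the "weighted" ridge of the paper arises because the random feature map makes the effective noise covariance non-isotropic.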

Gaussian Universality of Linear Classifiers with Random Labels in High-Dimension

- Computer Science, ArXiv
- 2022

A rigorous proof that data coming from a range of high-dimensional generative models have the same minimum training loss as Gaussian data with the corresponding covariance, together with evidence that this universality property is observed in practice with real datasets and random labels.

Maslow's Hammer for Catastrophic Forgetting: Node Re-Use vs Node Activation

- Computer Science, ICML
- 2022

This paper theoretically analyses both a synthetic teacher-student framework and a real-data setup to explain a trade-off between node activation and node re-use that results in the worst forgetting in the intermediate regime.

Probing transfer learning with a model of synthetic correlated datasets

- Computer Science, Machine Learning: Science and Technology
- 2022

Focusing on the problem of training two-layer networks in a binary classification setting, this work rethinks a solvable model of synthetic data as a framework for modeling correlation between datasets and shows that the model captures a range of salient features of transfer learning with real data.

A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning

- Computer Science, ArXiv
- 2021

This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective and emphasizes the unique aspects that define the TOPML research area as a subfield of modern ML theory.

Continual Learning in the Teacher-Student Setup: Impact of Task Similarity

- Computer Science, ICML
- 2021

This work extends previous analytical work on two-layer networks in the teacher-student setup to multiple teachers and finds a complex interplay between both types of similarity, initial transfer/forgetting rates, maximum transfer/forgetting, and long-term transfer/forgetting.

The Common Intuition to Transfer Learning Can Win or Lose: Case Studies for Linear Regression

- Computer Science
- 2021

It is demonstrated that transfer learning can beat the minimum mean square error (MMSE) solution of the independent target task and thereby yield an improved MMSE solution.

## References

Showing 1–10 of 39 references.

Double Double Descent: On Generalization Errors in Transfer Learning between Linear Regression Tasks

- Computer Science, ArXiv
- 2020

The non-asymptotic analysis shows that the generalization error of the target task follows a two-dimensional double descent trend (with respect to the number of free parameters in each of the tasks) that is controlled by the transfer learning factors.

An analytic theory of generalization dynamics and transfer learning in deep linear networks

- Computer Science, ICLR
- 2019

An analytic theory of the nonlinear dynamics of generalization in deep linear networks, both within and across tasks, is developed; it reveals that knowledge transfer depends sensitively, but computably, on the SNRs and input feature alignments of pairs of tasks.

Direct Transfer of Learned Information Among Neural Networks

- Computer Science, AAAI
- 1991

By transferring weights from smaller networks trained on subtasks, this paper achieved speedups of up to an order of magnitude compared with training starting with random weights, even taking into account the time to train the smaller networks.

Solvable Model for Inheriting the Regularization through Knowledge Distillation

- Computer Science, MSML
- 2021

A statistical physics framework is introduced that allows an analytic characterization of the properties of knowledge distillation (KD) in shallow neural networks and it is shown that, through KD, the regularization properties of the larger teacher model can be inherited by the smaller student.

A Survey on Transfer Learning

- Computer Science, IEEE Transactions on Knowledge and Data Engineering
- 2010

The relationship between transfer learning and other related machine learning techniques, such as domain adaptation, multitask learning, sample selection bias, and covariate shift, is discussed.

Task Clustering and Gating for Bayesian Multitask Learning

- Computer Science, J. Mach. Learn. Res.
- 2003

A Bayesian approach is adopted in which some model parameters are shared and others are more loosely connected through a joint prior distribution that can be learned from the data, combining the best parts of the statistical multilevel approach and the neural network machinery.

Transfer of Learning

- Psychology
- 1992

Findings from various sources suggest that transfer happens by way of two rather different mechanisms, and that conventional educational practices often fail to establish the conditions for either reflexive or mindful transfer.

To transfer or not to transfer

- Computer Science, Psychology, NIPS
- 2005

One challenge for transfer learning research is to develop approaches that detect and avoid negative transfer using very little data from the target task.

Discriminability-Based Transfer between Neural Networks

- Computer Science, NIPS
- 1992

A new algorithm, called Discriminability-Based Transfer (DBT), is presented, which uses an information measure to estimate the utility of hyperplanes defined by source weights in the target network, and rescales transferred weight magnitudes accordingly.

Exploiting Task Relatedness for Multiple Task Learning

- Computer Science, Psychology, COLT
- 2003

This work offers an alternative approach to multiple task learning, defining relatedness of tasks on the basis of similarity between the example-generating distributions that underlie these tasks.