Reply to Loog et al.: Looking beyond the peaking phenomenon

@article{Belkin2020ReplyTL,
  title={Reply to Loog et al.: Looking beyond the peaking phenomenon},
  author={Mikhail Belkin and Daniel J. Hsu and Siyuan Ma and Soumik Mandal},
  journal={Proceedings of the National Academy of Sciences},
  year={2020},
  volume={117},
  pages={10627}
}
The letter “A brief prehistory of double descent” (1), written in response to our article “Reconciling modern machine-learning practice and the classical bias–variance trade-off” (2), brings up a number of interesting points and important references. We agree that the …
To whom correspondence may be addressed. Email: mbelkin{at}cse.ohio-state.edu.
What causes the test error? Going beyond bias-variance via ANOVA
TLDR
The analysis of variance (ANOVA) is used to decompose the variance in the test error in a symmetric way, and advanced deterministic equivalent techniques for Haar random matrices are developed, in order to study the generalization performance of certain two-layer linear and non-linear networks.
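The symmetric decomposition can be made concrete with a small Monte Carlo experiment. The sketch below is a toy construction of my own, not the cited paper's setting: it evaluates the test error of a random-features model on a grid of (training sample, random first layer) pairs and splits the variance of that error into a data-sampling term, an initialization term, and an interaction term, in the two-way ANOVA style the summary refers to.

import numpy as np

rng = np.random.default_rng(0)
d, p, n_train, n_test, noise = 20, 30, 50, 1000, 0.3
w_true = rng.normal(size=d) / np.sqrt(d)
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ w_true

def test_error(train_seed, init_seed):
    # Test MSE of a toy two-layer model: random fixed first layer ("initialization"),
    # least-squares second layer fitted on a freshly sampled training set.
    tr, init = np.random.default_rng(train_seed), np.random.default_rng(init_seed)
    W1 = init.normal(size=(d, p)) / np.sqrt(d)
    X = tr.normal(size=(n_train, d))
    y = X @ w_true + noise * tr.normal(size=n_train)
    a, *_ = np.linalg.lstsq(np.tanh(X @ W1), y, rcond=None)
    return np.mean((np.tanh(X_test @ W1) @ a - y_test) ** 2)

n_data, n_init = 30, 30
err = np.array([[test_error(1000 + i, 2000 + j) for j in range(n_init)] for i in range(n_data)])

total = err.var()
v_data = err.mean(axis=1).var()    # main effect of the training sample
v_init = err.mean(axis=0).var()    # main effect of the random first layer
v_inter = total - v_data - v_init  # interaction; plug-in estimate, noisy for small grids
print(f"total={total:.4f}  data={v_data:.4f}  init={v_init:.4f}  interaction={v_inter:.4f}")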
When and how epochwise double descent happens
TLDR
This work develops an analytically tractable model of epochwise double descent that makes it possible to characterise theoretically when this effect is likely to occur, and shows experimentally that deep neural networks behave similarly to the theoretical model.
Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension
TLDR
This manuscript develops a quantitative and rigorous theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features in high dimensions, and provides a complete description of the asymptotic joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
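As a rough numerical companion to the summary above (my own construction, not the paper's exact high-dimensional setting), the sketch below trains an ensemble of ridge regressors on different random subsets of the same features, so the learners are distinct but correlated, and reports the spread of their test errors as an empirical stand-in for the fluctuations the paper characterises exactly.

import numpy as np

rng = np.random.default_rng(1)
d, n_train, n_test, k_feats, lam, K = 60, 80, 2000, 40, 0.1, 200
w_true = rng.normal(size=d) / np.sqrt(d)
X_tr = rng.normal(size=(n_train, d))
y_tr = X_tr @ w_true + 0.3 * rng.normal(size=n_train)
X_te = rng.normal(size=(n_test, d))
y_te = X_te @ w_true + 0.3 * rng.normal(size=n_test)

def ridge_fit(X, y, lam):
    # Closed-form ridge estimator (squared loss with L2 regularisation).
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

member_errs, preds = [], []
for _ in range(K):
    S = rng.choice(d, size=k_feats, replace=False)   # this learner's feature subset
    w = ridge_fit(X_tr[:, S], y_tr, lam)
    p_te = X_te[:, S] @ w
    preds.append(p_te)
    member_errs.append(np.mean((p_te - y_te) ** 2))

member_errs = np.array(member_errs)
ens_err = np.mean((np.mean(preds, axis=0) - y_te) ** 2)  # error of the averaged predictor
print(f"single learner: {member_errs.mean():.3f} +/- {member_errs.std():.3f} (fluctuation across learners)")
print(f"ensemble average: {ens_err:.3f}")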

References

A brief prehistory of double descent
TLDR
Risk curves for modern high-complexity learners can display what they call double descent: the risk initially decreases, attains a minimum, and then increases until $N$ equals $n$, at which point the training data are fitted perfectly (a minimal numerical sketch of this curve follows the reference list).
Reconciling modern machine-learning practice and the classical bias–variance trade-off
TLDR
This work shows how classical theory and modern practice can be reconciled within a single unified performance curve, proposes a mechanism underlying its emergence, and provides evidence for the existence and ubiquity of double descent across a wide spectrum of models and datasets.
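The risk-curve shape described in both references, test error falling, peaking near the interpolation threshold, and falling again as model capacity grows past it, can be reproduced in a few lines. The sketch below is a generic illustration with minimum-norm least squares on a toy Gaussian problem, not the experimental setup of either paper.

import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d_max, noise = 40, 2000, 120, 0.5

def risk_for_width(p, n_reps=50):
    # Average test MSE of the minimum-norm least-squares fit that uses only the
    # first p of the d_max available features.
    errs = []
    for _ in range(n_reps):
        w_true = rng.normal(size=d_max) / np.sqrt(d_max)
        X_tr, X_te = rng.normal(size=(n_train, d_max)), rng.normal(size=(n_test, d_max))
        y_tr = X_tr @ w_true + noise * rng.normal(size=n_train)
        y_te = X_te @ w_true + noise * rng.normal(size=n_test)
        # lstsq returns the minimum-norm solution when p > n_train (underdetermined).
        w_hat, *_ = np.linalg.lstsq(X_tr[:, :p], y_tr, rcond=None)
        errs.append(np.mean((X_te[:, :p] @ w_hat - y_te) ** 2))
    return float(np.mean(errs))

for p in (5, 20, 35, 40, 45, 60, 120):
    # Expect the error to peak near p == n_train (the interpolation threshold).
    print(f"p={p:3d}  test MSE ~ {risk_for_width(p):.3f}")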