Publications
Reconciling modern machine-learning practice and the classical bias–variance trade-off
TLDR
This work shows how classical theory and modern practice can be reconciled within a single unified performance curve, proposes a mechanism underlying its emergence, and provides evidence for the existence and ubiquity of double descent across a wide spectrum of models and datasets.
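A minimal sketch of the kind of experiment behind such a curve: minimum-norm least-squares fits on random ReLU features of increasing width, with test error tracked as the width crosses the interpolation threshold. The data, widths, feature map, and seed are illustrative assumptions, not the paper's setup, and are meant to show the shape of the experiment rather than reproduce its results.
```python
# Illustrative double-descent sketch: minimum-norm least squares on random
# ReLU features of increasing width. Not the paper's experimental setup.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 40, 500, 5

def target(X):
    return np.sin(X.sum(axis=1))            # arbitrary smooth target function

X_tr = rng.normal(size=(n_train, d))
y_tr = target(X_tr) + 0.1 * rng.normal(size=n_train)
X_te = rng.normal(size=(n_test, d))
y_te = target(X_te)

def relu_features(X, W):
    return np.maximum(X @ W, 0.0)            # random ReLU feature map

for width in [5, 10, 20, 40, 80, 160, 640]:  # interpolation threshold ~ n_train
    W = rng.normal(size=(d, width)) / np.sqrt(d)
    Phi_tr, Phi_te = relu_features(X_tr, W), relu_features(X_te, W)
    coef = np.linalg.pinv(Phi_tr) @ y_tr     # minimum-norm least-squares fit
    test_mse = np.mean((Phi_te @ coef - y_te) ** 2)
    print(f"width={width:4d}  test MSE={test_mse:.3f}")
```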
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
TLDR
The key observation is that most modern learning architectures are over-parametrized and are trained to interpolate the data by driving the empirical loss close to zero, yet it remains unclear why these interpolating solutions perform well on test data.
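As a toy illustration of training to interpolation, the sketch below runs plain SGD on an over-parameterized linear least-squares problem (more parameters than samples) and drives the empirical loss to essentially zero; the problem sizes, step size, and iteration count are illustrative assumptions.
```python
# Minimal sketch: SGD on an over-parameterized linear least-squares problem
# (p >> n) drives the training loss to ~0, i.e. the model interpolates the
# data. Sizes and step size are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 500                              # p >> n: over-parameterized
X = rng.normal(size=(n, p)) / np.sqrt(p)
y = rng.normal(size=n)

w = np.zeros(p)
lr = 0.5
for step in range(20000):
    i = rng.integers(n)                     # sample one training point
    grad = (X[i] @ w - y[i]) * X[i]         # gradient of 0.5*(x_i.w - y_i)^2
    w -= lr * grad

train_loss = 0.5 * np.mean((X @ w - y) ** 2)
print(f"final training loss: {train_loss:.2e}")   # effectively zero
```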
To understand deep learning we need to understand kernel learning
TLDR
It is argued that progress on understanding deep learning will be difficult until the more tractable "shallow" kernel methods are better understood, and that new theoretical ideas are needed to explain the properties of classical kernel methods.
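A minimal sketch of the kind of "shallow" kernel method in question: Gaussian-kernel ridge regression, which (near-)interpolates the training data as the ridge parameter goes to zero. The bandwidth, ridge value, and data below are illustrative assumptions.
```python
# Minimal Gaussian-kernel ridge regression; with ridge -> 0 it (nearly)
# interpolates the training data. Bandwidth, ridge, and data are illustrative.
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

rng = np.random.default_rng(2)
X_tr = rng.uniform(-3, 3, size=(30, 1))
y_tr = np.sin(X_tr[:, 0]) + 0.1 * rng.normal(size=30)
X_te = np.linspace(-3, 3, 200)[:, None]

ridge = 1e-8                                   # ~0: near-interpolating solution
K = gaussian_kernel(X_tr, X_tr)
alpha = np.linalg.solve(K + ridge * np.eye(len(X_tr)), y_tr)
y_hat = gaussian_kernel(X_te, X_tr) @ alpha

print("train residual norm:", np.linalg.norm(K @ alpha - y_tr))
print("test MSE vs sin(x): ", np.mean((y_hat - np.sin(X_te[:, 0])) ** 2))
```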
Reconciling modern machine learning and the bias-variance trade-off
TLDR
A new "double descent" risk curve is exhibited that extends the traditional U-shaped bias-variance curve beyond the point of interpolation and shows that the risk of suitably chosen interpolating predictors from these models can, in fact, be decreasing as the model complexity increases, often below the risk achieved using non-interpolating models.
On exponential convergence of SGD in non-convex over-parametrized learning
TLDR
It is argued that the Polyak-Łojasiewicz (PL) condition provides a relevant and attractive setting for many machine learning problems, particularly in the over-parametrized regime.
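For reference, the standard PL inequality and the linear ("exponential") convergence it yields for gradient descent with step size 1/β; the constants below are the textbook ones, not the paper's SGD-specific bounds.
```latex
% PL condition for a \beta-smooth loss f with minimum value f^*:
\[
  \tfrac{1}{2}\,\|\nabla f(w)\|^2 \;\ge\; \mu \bigl(f(w) - f^{*}\bigr)
  \qquad \text{for all } w,
\]
% under which gradient descent with step size 1/\beta converges linearly:
\[
  f(w_t) - f^{*} \;\le\; \Bigl(1 - \tfrac{\mu}{\beta}\Bigr)^{t}
  \bigl(f(w_0) - f^{*}\bigr).
\]
```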
Reconciling modern machine learning practice and the bias-variance trade-off
TLDR
This paper reconciles classical understanding and modern practice within a unified performance curve that subsumes the textbook U-shaped bias-variance trade-off curve, showing how increasing model capacity beyond the interpolation point results in improved performance.
S-CAVE: Effective SSD caching to improve virtual machine storage performance
TLDR
This paper presents the design and implementation of S-CAVE, a hypervisor-based SSD caching facility that effectively manages a storage cache in a multi-VM environment by collecting and exploiting runtime information from both VMs and storage devices.
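As a generic illustration of runtime-informed cache partitioning, not S-CAVE's actual policy, the sketch below splits an SSD cache budget among VMs in proportion to each VM's recently observed hit rate; the function name and statistics format are hypothetical.
```python
# Generic illustration (NOT S-CAVE's policy): divide an SSD cache budget
# among VMs in proportion to each VM's observed cache-hit benefit,
# periodically recomputed from runtime counters.
def allocate_cache(total_blocks, vm_stats):
    """vm_stats: {vm_id: {'hits': int, 'misses': int}} collected at runtime."""
    benefit = {vm: s['hits'] / max(1, s['hits'] + s['misses'])
               for vm, s in vm_stats.items()}
    total = sum(benefit.values()) or 1.0
    return {vm: int(total_blocks * b / total) for vm, b in benefit.items()}

# Example: VM 'a' hits far more often, so it receives the larger share.
print(allocate_cache(10_000, {'a': {'hits': 900, 'misses': 100},
                              'b': {'hits': 100, 'misses': 900}}))
```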
Concurrent Analytical Query Processing with GPUs
TLDR
Concurrent query execution is proposed as an effective way to share GPUs among concurrent queries for high throughput, relying on GPU query scheduling and device memory swapping policies to address this challenge.
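A hedged sketch of the general idea, not the paper's scheduler: admit concurrent queries while their estimated device-memory footprints fit on the GPU and defer the rest; query names, footprints, and the capacity figure are made up for illustration.
```python
# Toy admission scheduler (not the paper's algorithm): group queries into
# batches that run concurrently as long as their estimated device-memory
# footprints fit within GPU memory; the rest wait for the next batch.
from collections import deque

def schedule(queries, gpu_mem_bytes):
    """queries: list of (name, est_mem_bytes); returns batches run concurrently."""
    pending, batches = deque(queries), []
    while pending:
        batch, used = [], 0
        while pending and used + pending[0][1] <= gpu_mem_bytes:
            name, mem = pending.popleft()
            batch.append(name)
            used += mem
        if not batch:                       # oversized query: run it by itself
            batch.append(pending.popleft()[0])
        batches.append(batch)
    return batches

print(schedule([("q1", 6e9), ("q2", 3e9), ("q3", 5e9)], gpu_mem_bytes=8e9))
```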
Diving into the shallows: a computational perspective on large-scale shallow learning
TLDR
EigenPro iteration is introduced, based on a preconditioning scheme that uses a small number of approximately computed eigenvectors; injecting this small (computationally inexpensive and SGD-compatible) amount of approximate second-order information leads to major improvements in convergence.
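A sketch of the preconditioning idea on a linear least-squares problem, assuming a preconditioner of the general form P = I − Σ_{i≤k} (1 − λ_{k+1}/λ_i) v_i v_iᵀ that damps the top-k eigendirections of the curvature; it is shown with full-batch gradient steps for clarity, whereas the paper applies the recipe to SGD in kernel space, and all sizes are illustrative.
```python
# Sketch of EigenPro-style preconditioning on linear least squares: damp the
# top-k eigendirections of the curvature so the step size is limited by the
# (k+1)-th eigenvalue rather than the largest one. Full-batch steps shown for
# clarity; the paper's construction lives in kernel space with SGD.
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 400, 50, 5
scales = np.concatenate([np.full(5, 10.0), np.full(45, 0.5)])
X = rng.normal(size=(n, d)) * scales           # a few dominant directions
y = X @ rng.normal(size=d)

H = X.T @ X / n                                # curvature of the quadratic loss
lam, V = np.linalg.eigh(H)
lam, V = lam[::-1], V[:, ::-1]                 # sort eigenpairs descending
# P = I - sum_{i<k} (1 - lam_k / lam_i) v_i v_i^T
P = np.eye(d)
for i in range(k):
    P -= (1.0 - lam[k] / lam[i]) * np.outer(V[:, i], V[:, i])

def run(precondition, lr, steps=500):
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w -= lr * (P @ grad if precondition else grad)
    return 0.5 * np.mean((X @ w - y) ** 2)

print("plain GD       loss:", run(False, lr=1.0 / lam[0]))
print("preconditioned loss:", run(True,  lr=1.0 / lam[k]))
```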
Kernel Machines That Adapt To Gpus For Effective Large Batch Training
TLDR
For a class of classical kernel machines, this paper develops the first analytical framework that extends linear scaling to match the parallel computing capacity of a resource.
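For context, the generic form of the linear scaling rule that this line of work analyzes, stated without the paper's constants or its derived critical batch size:
```latex
% Generic linear scaling rule (illustrative; not the paper's exact bound):
% the step size grows linearly with mini-batch size m up to a critical
% batch size m^*, beyond which larger batches give diminishing returns
% per gradient computation.
\[
  \eta(m) \;\approx\; m \cdot \eta(1) \qquad \text{for } m \le m^{*}.
\]
```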