Petuum: A New Platform for Distributed Machine Learning on Big Data
This work proposes a general-purpose framework, Petuum, that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions.
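Schematically, the iterative-convergent view can be written as a fixed-point update (notation ours, not the paper's exact formulation):

```latex
% Schematic iterative-convergent update (notation ours): P workers compute
% updates \Delta on data shards D_p; because the algorithm is error-tolerant,
% slightly stale parameter reads still converge to the fixed point \theta^*.
\[
  \theta^{(t)} = F\Big(\theta^{(t-1)},\ \sum_{p=1}^{P} \Delta\big(\tilde{\theta}^{(t-1)},\, D_p\big)\Big),
  \qquad \tilde{\theta} \approx \theta \ \text{(bounded staleness)} .
\]
```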
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
This work proposes a simple but effective method, DeeBERT, to accelerate BERT inference by allowing samples to exit earlier without passing through the entire model, providing new ideas for efficiently applying deep transformer-based models to downstream tasks.
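A minimal sketch of the entropy-based early-exit idea; the `layers`, `classifiers`, and threshold below are illustrative stand-ins, not DeeBERT's actual API:

```python
import torch.nn.functional as F

def early_exit_forward(layers, classifiers, hidden, entropy_threshold=0.3):
    """Run encoder layers one by one and exit as soon as an intermediate
    classifier ("off-ramp") is confident enough, i.e. has low prediction
    entropy. `layers` and `classifiers` are illustrative stand-ins for
    BERT's encoder layers and per-layer exit heads."""
    logits = None
    for i, (layer, clf) in enumerate(zip(layers, classifiers)):
        hidden = layer(hidden)
        logits = clf(hidden[:, 0])             # classify from the [CLS] position
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        if entropy.max() < entropy_threshold:  # confident on the whole batch
            return logits, i + 1               # number of layers actually used
    return logits, len(layers)                 # fell through: full model
```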
Additive Approximations in High Dimensional Nonparametric Regression via the SALSA
This work proposes SALSA, which bridges the gap between additive models and fully nonparametric regression by allowing interactions between variables while controlling model capacity through a limit on the order of interactions; the method is shown to be competitive against other alternatives.
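In symbols, the additive approximation with interactions up to order d looks like (notation ours):

```latex
% Additive approximation with interactions up to order d: each component
% f_S depends only on the coordinate subset x_S, so model capacity grows
% with the interaction order d rather than with the ambient dimension D.
\[
  f(x) \;=\; \sum_{\substack{S \subseteq \{1,\dots,D\} \\ |S| \le d}} f_S(x_S),
  \qquad d \ll D .
\]
```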
Sum-of-Squares Polynomial Flow
This work proposes a general framework for high-dimensional density estimation by specifying one-dimensional transformations (equivalently, conditional densities) and appropriate conditioner networks, and motivates a new Sum-of-Squares (SOS) flow that is interpretable, universal, and easy to train.
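The one-dimensional SOS transformation takes, schematically, the following form (constants and indexing ours):

```latex
% A sum-of-squares map: the derivative of T is a sum of squared
% polynomials, hence nonnegative, so T is monotone without constraining
% the coefficients a_{kl} -- which is what makes the flow easy to train.
\[
  T(z) \;=\; c \;+\; \int_{0}^{z} \sum_{k=1}^{K} \Big( \sum_{l=0}^{L} a_{kl}\, u^{l} \Big)^{2} du .
\]
```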
Accelerated Training for Matrix-norm Regularization: A Boosting Approach
A boosting method for regularized learning is proposed that guarantees ε accuracy within O(1/ε) iterations, and an application to latent multiview learning is demonstrated for which it provides the first efficient weak oracle.
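A schematic statement of the guarantee (notation ours):

```latex
% Schematic form of the guarantee: for a matrix-norm regularized objective,
% the boosting procedure adds one cheap "atom" (e.g. a rank-one matrix) per
% iteration via a weak oracle, and reaches epsilon accuracy in O(1/epsilon)
% iterations.
\[
  f(W) = \ell(W) + \lambda \|W\|, \qquad
  f(W_t) - \min_{W} f(W) \le \varepsilon
  \quad \text{after } t = O(1/\varepsilon) \text{ iterations.}
\]
```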
On Decomposing the Proximal Map
- Yaoliang Yu
- Mathematics, Computer Science · NIPS
- 5 December 2013
This paper initiates a systematic investigation of when the proximal map of a sum of functions decomposes into the composition of the proximal maps of the individual summands; the resulting theory unifies a number of known results scattered in the literature and yields several new decompositions almost effortlessly.
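A classic instance of such a decomposition (example ours, not taken from the paper): the elastic-net penalty's prox splits into the composition of its summands' proxes, with the quadratic applied last:

```latex
% prox of the elastic-net penalty = quadratic prox composed with the
% soft-thresholding (l1) prox; a short calculation verifies equality.
\[
  \operatorname{prox}_{\lambda\|\cdot\|_1 + \frac{\mu}{2}\|\cdot\|_2^2}(y)
  \;=\;
  \operatorname{prox}_{\frac{\mu}{2}\|\cdot\|_2^2}\!\big(\operatorname{prox}_{\lambda\|\cdot\|_1}(y)\big)
  \;=\;
  \frac{\operatorname{sign}(y)\,\max(|y| - \lambda,\, 0)}{1+\mu}.
\]
```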
Convex Multi-view Subspace Learning
This paper develops an efficient algorithm that recovers an optimal data reconstruction by exploiting an implicit convex regularizer, then recovers the corresponding latent representation and reconstruction model, jointly and optimally.
Better Approximation and Faster Algorithm Using the Proximal Average
- Yaoliang Yu
- Computer Science · NIPS
- 5 December 2013
A nonsmooth approximation is identified that simply pretends the proximal map is linear, yielding a novel proximal gradient algorithm that is strictly better than the one based on smoothing, without incurring any extra overhead.
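In symbols, the "pretend linearity" approximation replaces the intractable prox of a sum by the weighted sum of the individual proxes, which is exactly the prox of the proximal average (notation ours):

```latex
% The prox of the weighted sum is intractable in general, but the weighted
% sum of proxes is cheap -- and it equals the prox of the proximal average
% A of the f_i, a genuine function, which is what enables the analysis.
\[
  \operatorname{prox}_{\sum_i \alpha_i f_i}
  \;\approx\;
  \sum_i \alpha_i \operatorname{prox}_{f_i}
  \;=\;
  \operatorname{prox}_{\mathcal{A}(f_1,\dots,f_n;\,\alpha)} .
\]
```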
Analysis of Kernel Mean Matching under Covariate Shift
By comparing KMM with the natural plug-in estimator, the superiority of the former is established, providing concrete evidence for, and understanding of, the effectiveness of KMM under covariate shift.
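KMM itself solves the standard mean-matching problem in the RKHS with feature map φ (standard formulation, restated here):

```latex
% Choose nonnegative weights beta on the n source samples so their weighted
% kernel mean matches the empirical kernel mean of the m target samples;
% beta then serves as importance weights correcting the covariate shift.
\[
  \min_{\beta \ge 0}\;
  \Big\| \frac{1}{n}\sum_{i=1}^{n} \beta_i\, \phi(x_i)
       - \frac{1}{m}\sum_{j=1}^{m} \phi(x'_j) \Big\|_{\mathcal{H}}^{2}
  \quad \text{s.t.} \quad
  \Big| \frac{1}{n}\sum_{i=1}^{n} \beta_i - 1 \Big| \le \epsilon .
\]
```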
Problems and Opportunities in Training Deep Learning Software Systems: An Analysis of Variance
- H. Pham, Shangshu Qian, Nachiappan Nagappan
- Computer Science · 35th IEEE/ACM International Conference on…
- 1 September 2020
This work is the first to study the variance of DL systems and the awareness of this variance among researchers and practitioners; it directs SE researchers toward challenging tasks such as creating deterministic DL implementations to facilitate debugging and to improve the reproducibility of DL software and results.
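A minimal sketch of how such run-to-run variance can be measured; `train_once` and `evaluate_on_test_set` are hypothetical placeholders, not the paper's tooling:

```python
import random
import numpy as np
import torch

def train_once(seed=None):
    """One full training run; returns the final test metric. With seed=None,
    the usual nondeterminism (init, data shuffling, cuDNN) is left in place.
    Body elided; `evaluate_on_test_set` is a hypothetical placeholder."""
    if seed is not None:
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
    ...  # build model, train, evaluate
    return evaluate_on_test_set()

# The variance the paper studies: repeat identical runs, inspect the spread
# of the final metric across runs.
accuracies = [train_once() for _ in range(16)]
print(f"mean={np.mean(accuracies):.4f}  std={np.std(accuracies):.4f}")
```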