• Corpus ID: 216245794

A System for Massively Parallel Hyperparameter Tuning

@article{Li2020ASF,
  title={A System for Massively Parallel Hyperparameter Tuning},
  author={Liam Li and Kevin G. Jamieson and Afshin Rostamizadeh and Ekaterina Gonina and Jonathan Ben-tzur and Moritz Hardt and Benjamin Recht and Ameet S. Talwalkar},
  journal={arXiv: Learning},
  year={2020}
}
Modern learning models are characterized by large hyperparameter spaces and long training times. These properties, coupled with the rise of parallel computing and the growing demand to productionize machine learning workloads, motivate the need to develop mature hyperparameter optimization functionality in distributed computing settings. We address this challenge by first introducing a simple and robust hyperparameter optimization algorithm called ASHA, which exploits parallelism and aggressive… 
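The abstract is cut off before it describes the algorithm itself, but the core idea, asynchronous successive halving, is simple enough to sketch. The minimal Python sketch below is illustrative only (class and method names are ours, not the paper's implementation): whenever a worker is free, it promotes a top-1/eta configuration from the highest rung that has one available, and otherwise grows the bottom rung with a fresh random configuration, so no worker ever waits for a rung to fill.

```python
import math
from collections import defaultdict

class ASHA:
    """Minimal sketch of asynchronous successive halving.

    Rung k trains a configuration for min_resource * eta**k resource units.
    A free worker either promotes a top-1/eta configuration from some rung
    to the next one, or starts a fresh random configuration at rung 0.
    """

    def __init__(self, sample_config, min_resource=1, max_resource=81, eta=3):
        self.sample_config = sample_config            # callable returning a random config
        self.min_resource = min_resource
        self.eta = eta
        self.max_rung = int(math.log(max_resource / min_resource, eta) + 1e-9)
        self.results = defaultdict(list)              # rung -> list of (loss, config)
        self.promoted = defaultdict(set)              # rung -> ids of promoted configs

    def get_job(self):
        """Return (config, rung, resource) for a free worker without blocking."""
        for rung in reversed(range(self.max_rung)):
            records = sorted(self.results[rung], key=lambda r: r[0])
            top_k = len(records) // self.eta          # size of the top 1/eta fraction
            for loss, config in records[:top_k]:
                if id(config) not in self.promoted[rung]:
                    self.promoted[rung].add(id(config))
                    return config, rung + 1, self.min_resource * self.eta ** (rung + 1)
        # nothing to promote: grow the bottom rung with a new random configuration
        return self.sample_config(), 0, self.min_resource

    def report(self, config, rung, loss):
        """Record the result of a finished job; workers call this asynchronously."""
        self.results[rung].append((loss, config))
```

A driver loop would call get_job() whenever a worker frees up and report() when a job finishes; because get_job() never waits for a rung to fill, workers stay busy even when far more configurations than workers are in flight.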

Figures and Tables from this paper

Citations

Massively Parallel Hyperparameter Tuning
TLDR
This work introduces the large-scale regime for parallel hyperparameter tuning, where one needs to evaluate orders of magnitude more configurations than available parallel workers in a small multiple of the wall-clock time needed to train a single model.
Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning
TLDR
An extensive empirical evaluation on large ML benchmark datasets shows that Kingpin matches, or is 4x to 14x faster than, state-of-the-art ML systems, including Ray's native execution and PyTorch DDP.
AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning
TLDR
This work proposes AUTOMATA, a gradient-based subset selection framework for hyper-parameter tuning that achieves significantly faster turnaround times, with speedups of 3×-30×, while achieving comparable performance to the hyper-parameters found using the entire dataset.
Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale
TLDR
Inspired by experience deploying hyper-parameter tuning in a real-world production application and by the limitations of existing systems, Hyper-Tune is proposed, an efficient and robust distributed hyper-parameter tuning framework that outperforms competitive hyper-parameter tuning systems on a wide range of scenarios.
Optimizing Large-Scale Machine Learning over Groups
TLDR
This work puts forth a novel hybrid approach to grouped learning that avoids redundancy in communication and I/O using a form of parallel gradient descent the authors call Gradient Accumulation Parallelism (GAP), and prototypes the ideas in a system built on top of existing ML tools and the flexible massively parallel runtime Ray.
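GAP itself is not spelled out in this summary; as background, plain gradient accumulation, the building block the name points to, looks roughly like the PyTorch sketch below (model, loader, and optimizer are hypothetical placeholders, and GAP's distributed communication aspects are not shown).

```python
import torch
import torch.nn.functional as F

def train_with_accumulation(model, loader, optimizer, accum_steps=4):
    """Plain gradient accumulation: average gradients over several
    micro-batches before applying a single optimizer step, so the
    effective batch size grows without extra memory per micro-batch."""
    model.train()
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        loss = F.cross_entropy(model(inputs), targets)
        (loss / accum_steps).backward()        # gradients accumulate in .grad buffers
        if (step + 1) % accum_steps == 0:
            optimizer.step()                   # one update per accum_steps micro-batches
            optimizer.zero_grad()
```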
Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation
TLDR
This work extends existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts.
Understanding and optimizing packed neural network training for hyper-parameter tuning
TLDR
This paper proposes a primitive for jointly training multiple neural network models on a single GPU, called pack, and presents a comprehensive empirical study of pack and end-to-end experiments that suggest significant improvements for hyperparameter tuning.
Hippo: Sharing Computations in Hyper-Parameter Optimization
Hyper-parameter optimization is crucial for pushing the accuracy of a deep learning model to its limits. However, a hyper-parameter optimization job, referred to as a study, involves numerous trials…
Model-Parallel Task Parallelism for Efficient Multi-Large-Model Deep Learning
TLDR
Hydra decouples the scalability of model parameters from the parallelism of execution, enabling DL users to train even a 6-billion-parameter model on a single commodity GPU, and fully exploits the higher speedup potential offered by task parallelism in a multi-GPU setup, yielding near-linear strong scaling and, in turn, making rigorous model selection perhaps more practical for such models.
...

References

SHOWING 1-10 OF 48 REFERENCES
Tune: A Research Platform for Distributed Model Selection and Training
TLDR
Tune is proposed, a unified framework for model selection and training that provides a narrow-waist interface between training scripts and search algorithms; this interface meets the requirements of a broad range of hyperparameter search algorithms, allows straightforward scaling of search to large clusters, and simplifies algorithm implementation.
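As a rough illustration of that narrow-waist interface, here is how a training function might plug into Tune with an ASHA-style scheduler. This uses Ray Tune's classic function API (tune.run, tune.report); exact names and signatures differ across Ray versions, and the training loop is a stand-in.

```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler

def train_fn(config):
    # Stand-in for a real training loop; only the reporting hook matters to Tune.
    acc = 0.0
    for epoch in range(10):
        acc += config["lr"] * 0.1              # pretend training improves accuracy
        tune.report(mean_accuracy=acc)         # hands intermediate results to the scheduler

analysis = tune.run(
    train_fn,
    config={"lr": tune.loguniform(1e-4, 1e-1)},
    num_samples=20,
    scheduler=ASHAScheduler(metric="mean_accuracy", mode="max"),
)
print(analysis.get_best_config(metric="mean_accuracy", mode="max"))
```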
Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves
TLDR
This paper mimics human experts' early termination of bad runs using a probabilistic model that extrapolates performance from the first part of a learning curve, enabling state-of-the-art hyperparameter optimization methods for DNNs to find settings that yield better performance than those chosen by human experts.
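The paper fits an ensemble of parametric curves probabilistically; the deterministic stand-in below only conveys the mechanics (the curve family, function names, and threshold rule are our simplifications, not the paper's model): fit a saturating curve to the partial learning curve, extrapolate to the final epoch, and stop if the prediction falls short of the best run seen so far.

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating_curve(t, a, b, c):
    # y(t) = a - b * t**(-c): one simple saturating family; the paper
    # combines several parametric families with a probabilistic fit.
    return a - b * np.power(t, -c)

def should_terminate(partial_accuracies, total_epochs, best_final_so_far):
    """Stop a run early if its extrapolated final accuracy looks worse
    than the best completed run observed so far."""
    t = np.arange(1, len(partial_accuracies) + 1, dtype=float)
    try:
        params, _ = curve_fit(saturating_curve, t, partial_accuracies,
                              p0=[max(partial_accuracies), 1.0, 0.5], maxfev=5000)
    except RuntimeError:
        return False                            # fit failed: keep training
    predicted_final = saturating_curve(float(total_epochs), *params)
    return predicted_final < best_final_so_far
```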
Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets
TLDR
A generative model for the validation error as a function of training set size is proposed, which learns during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset.
Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
TLDR
A novel algorithm is introduced, Hyperband, for hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations.
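The resource-allocation schedule behind Hyperband follows directly from the maximum resource R and the halving rate eta; the sketch below enumerates the brackets as in the published schedule (the function name and defaults are illustrative).

```python
import math

def hyperband_brackets(max_resource=81, eta=3):
    """Enumerate Hyperband's brackets. Bracket s starts n configurations at
    resource r and repeatedly keeps the best 1/eta of them with eta times the
    resource, so successive brackets trade exploration for exploitation."""
    s_max = int(math.floor(math.log(max_resource, eta) + 1e-9))
    budget = (s_max + 1) * max_resource          # total resource spent per bracket
    schedule = []
    for s in range(s_max, -1, -1):
        n = int(math.ceil((budget / max_resource) * (eta ** s) / (s + 1)))
        r = max_resource / (eta ** s)
        rounds = [(int(n // (eta ** i)), r * (eta ** i)) for i in range(s + 1)]
        schedule.append(rounds)
    return schedule

# hyperband_brackets(81, 3)[0] is the most exploratory bracket:
# [(81, 1.0), (27, 3.0), (9, 9.0), (3, 27.0), (1, 81.0)]
```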
Practical Bayesian Optimization of Machine Learning Algorithms
TLDR
This work describes new algorithms that take into account the variable cost of learning-algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation, and shows that the proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms.
Multi-Task Bayesian Optimization
TLDR
This paper proposes an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting and demonstrates the utility of this new acquisition function by leveraging a small dataset to explore hyper-parameter settings for a large dataset.
BOHB: Robust and Efficient Hyperparameter Optimization at Scale
TLDR
This work proposes a new practical state-of-the-art hyperparameter optimization method, which consistently outperforms both Bayesian optimization and Hyperband on a wide range of problem types, including high-dimensional toy functions, support vector machines, feed-forward neural networks, Bayesian neural networks, deep reinforcement learning, and convolutional neural networks.
Gandiva: Introspective Cluster Scheduling for Deep Learning
TLDR
Gandiva is introduced, a new cluster scheduling framework that utilizes domain-specific knowledge to improve the latency and efficiency of training deep learning models in a GPU cluster, and improves utilization by transparently migrating and time-slicing jobs to achieve a better job-to-resource fit.
Algorithms for Hyper-Parameter Optimization
TLDR
This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements.
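One of the contributed techniques, the Tree-structured Parzen Estimator (TPE), is easy to sketch in one dimension: split past trials into good and bad sets at a quantile of the loss, fit a density to each, and propose the candidate with the highest density ratio. The sketch below uses SciPy kernel density estimates as a stand-in for the paper's adaptive Parzen estimators and assumes both sets already hold a few points.

```python
import numpy as np
from scipy.stats import gaussian_kde

def tpe_suggest(observed_x, observed_y, gamma=0.25, n_candidates=64):
    """One TPE-style proposal in 1-D: model p(x | good) and p(x | bad) with
    KDEs and return the candidate maximizing their ratio l(x) / g(x)."""
    x = np.asarray(observed_x, dtype=float)
    y = np.asarray(observed_y, dtype=float)
    cutoff = np.quantile(y, gamma)                  # gamma-quantile splits good from bad
    good, bad = x[y <= cutoff], x[y > cutoff]
    l, g = gaussian_kde(good), gaussian_kde(bad)
    candidates = l.resample(n_candidates).ravel()   # sample where good trials live
    scores = l(candidates) / np.maximum(g(candidates), 1e-12)
    return candidates[np.argmax(scores)]
```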
...