AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning

@article{Killamsetty2022AUTOMATAGB,
  title={AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning},
  author={Krishnateja Killamsetty and Guttu Sai Abhishek and Aakriti and Alexandre V. Evfimievski and Lucian Popa and Ganesh Ramakrishnan and Rishabh K. Iyer},
  journal={ArXiv},
  year={2022},
  volume={abs/2203.08212}
}
Deep neural networks have seen great success in recent years; however, training a deep model is often challenging as its performance heavily depends on the hyper-parameters used. In addition, finding the optimal hyper-parameter configuration, even with state-of-the-art (SOTA) hyper-parameter optimization (HPO) algorithms, can be time-consuming, requiring multiple training runs over the entire dataset for different possible sets of hyper-parameters. Our central insight is that using an… 
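A minimal sketch of this idea under stated assumptions: each candidate hyper-parameter configuration is scored on a small training subset instead of the full dataset, and only the chosen configuration is retrained on all the data. The uniform-random subset, synthetic data, and logistic-regression trainer below are placeholders, not the paper's gradient-based selector or experimental setup.

```python
# Illustrative only: hyper-parameter search where each configuration is
# trained on a small data subset instead of the full dataset.
# The subset rule here (uniform random) is a stand-in for a gradient-based
# selector; data, model, and sizes are assumptions, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification problem (placeholder data).
X = rng.normal(size=(5000, 20))
w_true = rng.normal(size=20)
y = (X @ w_true + 0.5 * rng.normal(size=5000) > 0).astype(float)
X_train, y_train = X[:4000], y[:4000]
X_val, y_val = X[4000:], y[4000:]

def train_logreg(X, y, lr, epochs=20):
    """Plain SGD on the logistic loss; returns learned weights."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))
            w -= lr * (p - y[i]) * X[i]
    return w

def val_accuracy(w):
    return np.mean((X_val @ w > 0) == y_val)

# Placeholder subset: 10% of the training data chosen uniformly at random.
# A gradient-based selector would choose (and weight) these points instead.
subset = rng.choice(len(X_train), size=400, replace=False)

# Cheap search: score every candidate learning rate on the subset only.
candidate_lrs = [1e-3, 1e-2, 1e-1, 1.0]
scores = {lr: val_accuracy(train_logreg(X_train[subset], y_train[subset], lr))
          for lr in candidate_lrs}
best_lr = max(scores, key=scores.get)

# Final run: train once on the full data with the configuration found cheaply.
print(scores, best_lr, val_accuracy(train_logreg(X_train, y_train, best_lr)))
```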
1 Citation


ORIENT: Submodular Mutual Information Measures for Data Subset Selection under Distribution Shift

TLDR
This work proposes ORIENT, a subset selection framework that uses submodular mutual information (SMI) functions to select a source data subset similar to the target data for faster training.
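As a rough illustration, the sketch below greedily selects source points that cover a target set under a facility-location-style objective; the cosine-similarity kernel and this particular objective are stand-in assumptions, not necessarily the SMI instantiation ORIENT uses.

```python
# Illustrative greedy selection of a source subset that "covers" a target set,
# using a facility-location-style objective f(A) = sum_j max_{i in A} sim(i, j).
# This is one simple submodular stand-in; ORIENT's actual SMI functions and
# similarity kernels may differ.
import numpy as np

def greedy_target_cover(src_feats, tgt_feats, budget):
    # Cosine similarity between every source candidate and every target point.
    s = src_feats / np.linalg.norm(src_feats, axis=1, keepdims=True)
    t = tgt_feats / np.linalg.norm(tgt_feats, axis=1, keepdims=True)
    sim = s @ t.T                                  # (n_src, n_tgt)

    selected, best_cover = [], np.zeros(sim.shape[1])
    for _ in range(budget):
        # Marginal gain of adding each candidate given current coverage.
        gains = np.maximum(sim, best_cover).sum(axis=1) - best_cover.sum()
        gains[selected] = -np.inf                  # never pick a point twice
        i = int(np.argmax(gains))
        selected.append(i)
        best_cover = np.maximum(best_cover, sim[i])
    return selected

rng = np.random.default_rng(0)
src = rng.normal(size=(500, 32))
tgt = rng.normal(loc=0.5, size=(50, 32))
print(greedy_target_cover(src, tgt, budget=20))
```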

References

SHOWING 1-10 OF 54 REFERENCES

Reading Digits in Natural Images with Unsupervised Feature Learning

TLDR
A new benchmark dataset for research use is introduced, containing over 600,000 labeled digits cropped from Street View images; variants of two recently proposed unsupervised feature learning methods are employed and found to be convincingly superior on this benchmark.

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

TLDR
A Sentiment Treebank is introduced that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality, which the proposed Recursive Neural Tensor Network is designed to address.
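For reference, the tensor-based composition applied at each tree node can be sketched as follows; the dimensionality and random initialization are arbitrary choices for illustration, not the paper's trained parameters.

```python
# Minimal sketch of the Recursive Neural Tensor Network composition step:
# two child vectors are combined through a bilinear tensor term plus the
# standard affine term of a recursive net. Dimensions are arbitrary here.
import numpy as np

d = 8
rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(d, 2 * d, 2 * d))   # tensor slices V^[1..d]
W = rng.normal(scale=0.1, size=(d, 2 * d))          # standard composition matrix

def compose(a, b):
    ab = np.concatenate([a, b])                      # stacked child vectors
    bilinear = np.einsum('i,kij,j->k', ab, V, ab)    # [a;b]^T V^[k] [a;b] per slice
    return np.tanh(bilinear + W @ ab)                # parent node vector

left, right = rng.normal(size=d), rng.normal(size=d)
print(compose(left, right))
```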

Coresets for Data-efficient Training of Machine Learning Models

TLDR
CRAIG is developed, a method to select a weighted subset of training data that closely estimates the full gradient by maximizing a submodular function, and it is proved that applying incremental gradient (IG) methods to this subset is guaranteed to converge to the (near-)optimal solution at the same convergence rate as IG on the full data for convex optimization.
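A toy sketch of this selection scheme, assuming per-example gradients are available as vectors: greedily pick representative "medoids" in gradient space and weight each by the number of points it stands in for. The random gradients and distance-based similarity are illustrative assumptions, not the authors' implementation (which, for deep models, typically approximates gradients by their cheap last-layer components).

```python
# Illustrative CRAIG-style selection: greedily pick "medoid" examples in
# gradient space via a facility-location objective, then weight each chosen
# example by the size of the cluster it represents, so the weighted subset
# gradient approximates the full gradient. Gradients here are random placeholders.
import numpy as np

def craig_like_subset(grads, budget):
    # Similarity = (max distance) - distance, so similarities are non-negative.
    dist = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=-1)
    sim = dist.max() - dist

    selected, cover = [], np.zeros(len(grads))
    for _ in range(budget):
        gains = np.maximum(sim, cover).sum(axis=1) - cover.sum()
        gains[selected] = -np.inf                 # never pick a point twice
        i = int(np.argmax(gains))
        selected.append(i)
        cover = np.maximum(cover, sim[i])

    # Assign every training point to its most similar selected medoid; the
    # medoid's weight is the number of points assigned to it.
    assign = np.argmax(sim[selected], axis=0)
    weights = np.bincount(assign, minlength=len(selected))
    return selected, weights

rng = np.random.default_rng(0)
per_example_grads = rng.normal(size=(200, 16))
subset, weights = craig_like_subset(per_example_grads, budget=10)
print(subset, weights)
```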

A High-Throughput Screening Approach to Discovering Good Forms of Biologically Inspired Visual Representation

TLDR
This approach can yield significant, reproducible gains in performance across an array of basic object recognition tasks, consistently outperforming a variety of state-of-the-art purpose-built vision systems from the literature.

Deep Residual Learning for Image Recognition

TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
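A minimal residual block in the spirit of this description, written with PyTorch as an assumed framework; the channel counts and layer choices are illustrative rather than the paper's exact architecture.

```python
# Sketch of a residual block: the convolutional layers learn a residual F(x)
# that is added back to the input x through an identity shortcut.
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(x + residual)   # identity shortcut + learned residual

block = BasicResidualBlock(channels=16)
print(block(torch.randn(1, 16, 32, 32)).shape)
```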

Toward Semantics-Based Answer Pinpointing

We describe the treatment of questions (Question-Answer Typology, question parsing, and results) in the Webclopedia question answering system.

Learning Question Classifiers

TLDR
A hierarchical classifier is learned that is guided by a layered semantic hierarchy of answer types, and eventually classifies questions into fine-grained classes.
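A toy coarse-to-fine sketch of such a hierarchy-guided classifier, assuming a two-level label taxonomy; the tiny question set, bag-of-words features, and scikit-learn models are illustrative assumptions, not the paper's feature set or learner.

```python
# Two-stage question classification: predict a coarse answer type first,
# then restrict the fine-grained prediction to that coarse class's children.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny toy taxonomy: coarse answer types and their fine-grained children.
children = {"NUMERIC": ["NUM:count", "NUM:date"],
            "HUMAN": ["HUM:individual"],
            "LOCATION": ["LOC:city"]}

questions = ["How many states are in the USA ?",
             "When was the telephone invented ?",
             "Who wrote Hamlet ?",
             "What city hosted the 2000 Olympics ?"]
coarse_labels = ["NUMERIC", "NUMERIC", "HUMAN", "LOCATION"]
fine_labels = ["NUM:count", "NUM:date", "HUM:individual", "LOC:city"]

vec = CountVectorizer().fit(questions)
X = vec.transform(questions)

# Stage 1: predict the coarse answer type.
coarse_clf = LogisticRegression(max_iter=1000).fit(X, coarse_labels)
# Stage 2: fine-grained classifier whose prediction is masked, at test time,
# to the children of the predicted coarse class.
fine_clf = LogisticRegression(max_iter=1000).fit(X, fine_labels)

q = vec.transform(["How many planets orbit the sun ?"])
coarse_pred = coarse_clf.predict(q)[0]
fine_probs = dict(zip(fine_clf.classes_, fine_clf.predict_proba(q)[0]))
fine_pred = max(children[coarse_pred], key=fine_probs.get)
print(coarse_pred, fine_pred)
```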

GRAD-MATCH: Gradient Matching based Data Subset Selection for Efficient Deep Model Training

TLDR
A general framework, GRAD-MATCH, is proposed that finds subsets closely matching the gradient of the training or validation set using an orthogonal matching pursuit algorithm, and it achieves the best accuracy-efficiency trade-off among the compared methods.
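A compact orthogonal-matching-pursuit-style sketch of gradient matching, assuming per-example gradient vectors are available: repeatedly add the example most aligned with the current residual and re-fit the subset weights. The random gradients and the unconstrained least-squares step are simplifications, not the GRAD-MATCH implementation.

```python
# OMP-style gradient matching: greedily add the example whose gradient best
# correlates with the residual between the full (mean) gradient and the current
# weighted subset gradient, then re-fit the weights. Gradients are placeholders.
import numpy as np

def omp_gradient_match(grads, budget):
    target = grads.mean(axis=0)             # full-data (mean) gradient
    selected, residual = [], target.copy()
    for _ in range(budget):
        scores = grads @ residual
        scores[selected] = -np.inf           # each example picked at most once
        selected.append(int(np.argmax(scores)))

        # Re-fit weights of the chosen gradients to the target by least squares
        # (GRAD-MATCH additionally constrains these weights to be non-negative).
        A = grads[selected].T                # (dim, |subset|)
        weights, *_ = np.linalg.lstsq(A, target, rcond=None)
        residual = target - A @ weights
    return selected, np.clip(weights, 0, None)

rng = np.random.default_rng(0)
per_example_grads = rng.normal(size=(300, 24))
subset, weights = omp_gradient_match(per_example_grads, budget=12)
print(subset, np.round(weights, 3))
```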

A System for Massively Parallel Hyperparameter Tuning

TLDR
This work introduces a simple and robust hyperparameter optimization algorithm called ASHA, which exploits parallelism and aggressive early stopping to tackle large-scale hyperparameter optimization problems, and shows that ASHA outperforms existing state-of-the-art hyperparameter optimization methods.
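A simplified, sequential simulation of the ASHA promotion rule under assumed settings (eta = 3, simulated scores): a configuration moves up a rung, receiving more resource, once it ranks in the top 1/eta of that rung. Real ASHA performs these promotions asynchronously across many workers.

```python
# Toy sequential simulation of ASHA's rung-based promotion rule.
import numpy as np

rng = np.random.default_rng(0)
eta, max_rung = 3, 3
rungs = [[] for _ in range(max_rung + 1)]        # (config_id, score) per rung

def evaluate(config, rung):
    # Placeholder objective: more resource (higher rung) means less noise.
    return config - rng.normal(scale=1.0 / (rung + 1))

def try_promote(rung):
    """Return a config for rung+1 if one sits in the top 1/eta of this rung."""
    entries = sorted(rungs[rung], key=lambda e: e[1], reverse=True)
    k = len(entries) // eta
    promoted = {cid for r in range(rung + 1, max_rung + 1) for cid, _ in rungs[r]}
    for cid, _ in entries[:k]:
        if cid not in promoted:
            return cid
    return None

configs = {i: rng.uniform(0, 1) for i in range(27)}   # "hyper-parameters"
for cid, cfg in configs.items():
    rungs[0].append((cid, evaluate(cfg, 0)))          # every config starts at rung 0
    for r in range(max_rung):
        cid_up = try_promote(r)
        if cid_up is None:
            break
        rungs[r + 1].append((cid_up, evaluate(configs[cid_up], r + 1)))

print([len(r) for r in rungs])   # roughly geometric decay across rungs
```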

Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

TLDR
A novel algorithm, Hyperband, is introduced that formulates hyperparameter optimization as a pure-exploration, non-stochastic, infinite-armed bandit problem in which a predefined resource such as iterations, data samples, or features is allocated to randomly sampled configurations.
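The resource schedule behind Hyperband's brackets can be reproduced in a few lines; R and eta below are example values, and the actual training and evaluation of configurations is omitted.

```python
# Hyperband bracket schedule: each bracket s starts n configurations at a small
# resource r, then repeatedly keeps the top 1/eta and multiplies the resource
# by eta until the survivors reach the maximum resource R.
import math

R, eta = 81, 3                        # max resource per config, halving rate
s_max = round(math.log(R, eta))       # floor(log_eta(R)); exact here since R = eta**4

for s in reversed(range(s_max + 1)):
    n = math.ceil((s_max + 1) / (s + 1) * eta ** s)   # configs in this bracket
    r = R * eta ** (-s)                               # initial resource each
    print(f"bracket s={s}:")
    for i in range(s + 1):
        n_i = math.floor(n * eta ** (-i))   # configs still alive in round i
        r_i = r * eta ** i                  # resource given to each of them
        print(f"  round {i}: {n_i} configs x {r_i:g} resource each")
```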
...