Learn2Hop: Learned Optimization on Rough Landscapes
@article{Merchant2021Learn2HopLO,
  title   = {Learn2Hop: Learned Optimization on Rough Landscapes},
  author  = {Amil Merchant and Luke Metz and Samuel S. Schoenholz and Ekin Dogus Cubuk},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2107.09661}
}
Optimization of non-convex loss surfaces containing many local minima remains a critical problem in a variety of domains, including operations research, informatics, and material design. Yet, current techniques either require extremely high iteration counts or a large number of random restarts for good performance. In this work, we propose adapting recent developments in meta-learning to these many-minima problems by learning the optimization algorithm for various loss landscapes. We focus on…
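As a rough illustration of the learned-optimizer idea described in the abstract, the JAX sketch below parameterizes a per-parameter update rule with a small MLP whose weights would be meta-learned. The two-layer MLP, the (gradient, momentum) feature set, the output scale, and the toy rugged loss are all illustrative assumptions, not the architecture or objectives used in the paper.

```python
import jax
import jax.numpy as jnp

def init_meta_params(key, hidden=32):
    # Meta-learned weights of the update rule (illustrative sizes).
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (2, hidden)) * 0.1,
        "b1": jnp.zeros(hidden),
        "w2": jax.random.normal(k2, (hidden, 1)) * 0.1,
        "b2": jnp.zeros(1),
    }

def learned_update(meta_params, grad, momentum):
    # Per-parameter features: the current gradient and an exponential moving average.
    feats = jnp.stack([grad, momentum], axis=-1)                       # (..., 2)
    h = jnp.tanh(feats @ meta_params["w1"] + meta_params["b1"])
    return 1e-2 * (h @ meta_params["w2"] + meta_params["b2"])[..., 0]  # small output scale

def optimizer_step(meta_params, x, momentum, loss_fn, decay=0.9):
    grad = jax.grad(loss_fn)(x)
    momentum = decay * momentum + (1.0 - decay) * grad
    return x - learned_update(meta_params, grad, momentum), momentum

# Toy usage on a rugged loss with many local minima; this merely stands in
# for the kind of many-minima landscape the paper targets.
loss_fn = lambda x: jnp.sum(x ** 2 + 0.5 * jnp.sin(8.0 * x))
meta_params = init_meta_params(jax.random.PRNGKey(0))
x, m = jnp.ones(4), jnp.zeros(4)
for _ in range(20):
    x, m = optimizer_step(meta_params, x, m, loss_fn)
```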
9 Citations
StriderNET: A Graph Reinforcement Learning Approach to Optimize Atomic Structures on Rough Energy Landscapes
- Computer Science, ArXiv
- 2023
StriderNET presents a promising framework that enables the optimization of atomic structures on rough energy landscapes and outperforms classical optimization algorithms such as gradient descent, FIRE, and Adam.
Discovering Evolution Strategies via Meta-Black-Box Optimization
- Computer Science, ArXiv
- 2022
This work proposes to discover effective update rules for evolution strategies via meta-learning, and employs a search strategy parametrized by a self-attention-based architecture, which guarantees the update rule is invariant to the ordering of the candidate solutions.
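A minimal sketch of the permutation-invariance property mentioned in this summary: recombination weights for the evolution-strategy mean update are produced by attention over the candidate population, so reordering candidates cannot change the result. The single learned query, the two-feature candidate encoding, and all sizes below are illustrative assumptions, not the paper's actual self-attention architecture.

```python
import jax
import jax.numpy as jnp

def attention_weights(params, fitness):
    # Per-candidate features travel with the candidate (z-scored fitness and rank),
    # so permuting the population permutes the weights identically.
    z = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    rank = jnp.argsort(jnp.argsort(fitness)) / (fitness.shape[0] - 1)
    keys = jnp.stack([z, rank], axis=-1) @ params["w_k"]   # (pop, d)
    scores = keys @ params["q"]                            # (pop,)
    return jax.nn.softmax(scores)

def es_mean_update(params, mean, candidates, fitness, lr=0.5):
    w = attention_weights(params, fitness)                 # (pop,)
    return mean + lr * (w @ (candidates - mean))           # weighted recombination

k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)
params = {"w_k": 0.3 * jax.random.normal(k1, (2, 8)),
          "q": 0.3 * jax.random.normal(k2, (8,))}

mean = jnp.zeros(3)
candidates = jax.random.normal(k3, (16, 3))
fitness = jnp.sum(candidates ** 2, axis=-1)

perm = jax.random.permutation(k4, 16)
same = jnp.allclose(es_mean_update(params, mean, candidates, fitness),
                    es_mean_update(params, mean, candidates[perm], fitness[perm]))
print(same)  # True: the update does not depend on candidate ordering
```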
Transformer-Based Learned Optimization
- Computer Science, ArXiv
- 2022
The main innovation is a new neural network architecture for the learned optimizer, inspired by the classic BFGS algorithm, that allows conditioning across different dimensions of the target problem's parameter space while remaining applicable to optimization tasks of variable dimensionality without re-training.
VeLO: Training Versatile Learned Optimizers by Scaling Up
- Computer Science, ArXiv
- 2022
This work trains an optimizer for deep learning that is itself a small neural network: it ingests gradients and outputs parameter updates, and it requires no hyperparameter tuning, instead adapting automatically to the specifics of the problem being optimized.
Tutorial on amortized optimization for learning to optimize over continuous domains
- Computer Science, ArXiv
- 2022
This tutorial discusses the key design choices behind amortized optimization, roughly categorizing models into fully-amortized and semi-amortized approaches, and learning methods into regression-based and objective-based approaches.
Learning to Generalize Provably in Learning to Optimize
- Computer Science, ArXiv
- 2023
This paper theoretically establishes an implicit connection between local entropy and the Hessian, unifying their roles in the handcrafted design of generalizable optimizers as equivalent metrics of loss-landscape flatness, and proposes to incorporate both metrics as flatness-aware regularizers into the L2O framework in order to meta-train optimizers that learn to generalize.
Learning to Optimize: A Primer and A Benchmark
- Computer Science, ArXiv
- 2021
This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization; it sets up taxonomies, categorizes existing works and research directions, presents insights, and identifies open challenges.
evosax: JAX-based Evolution Strategies
- Computer Science, ArXiv
- 2022
The deep learning revolution has been greatly accelerated by the ’hardware lottery’: recent advances in modern hardware accelerators and compilers paved the way for large-scale batch gradient…
Distributional Reinforcement Learning for Scheduling of Chemical Production Processes
- Business, ArXiv
- 2022
Reinforcement Learning (RL) has recently received significant attention from the process systems engineering and control communities. Recent works have investigated the application of RL to identify…
References
Showing 1-10 of 62 references
Reverse engineering learned optimizers reveals known and novel mechanisms
- Computer Science, NeurIPS
- 2021
This work studies learned optimizers trained from scratch on three disparate tasks and discovers that they have learned interpretable mechanisms, including momentum, gradient clipping, learning rate schedules, and a new form of learning rate adaptation.
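For reference, the mechanisms listed above correspond to the following hand-designed update, written here purely for comparison; the specific constants (clip threshold, momentum decay, cosine schedule) are illustrative choices, not values recovered from any learned optimizer.

```python
import jax
import jax.numpy as jnp

def clipped_momentum_step(x, velocity, grad, step, total_steps,
                          base_lr=0.1, beta=0.9, clip=1.0):
    grad = jnp.clip(grad, -clip, clip)                                  # gradient clipping
    velocity = beta * velocity + (1.0 - beta) * grad                    # momentum
    lr = base_lr * 0.5 * (1.0 + jnp.cos(jnp.pi * step / total_steps))   # cosine learning-rate schedule
    return x - lr * velocity, velocity

# Toy usage on a simple quadratic loss.
loss_fn = lambda x: jnp.sum((x - 2.0) ** 2)
x, v = jnp.zeros(3), jnp.zeros(3)
for t in range(100):
    x, v = clipped_momentum_step(x, v, jax.grad(loss_fn)(x), t, 100)
```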
Learned Optimizers that Scale and Generalize
- Computer Science, ICML
- 2017
This work introduces a learned gradient descent optimizer that generalizes well to new tasks and has significantly reduced memory and computation overhead, achieved through a novel hierarchical RNN architecture with minimal per-parameter overhead.
Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves
- Computer Science, ArXiv
- 2020
This work introduces a new, neural-network-parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization, and shows evidence that it is useful for out-of-distribution tasks such as training the optimizer itself from scratch.
Evolution of the Potential Energy Surface with Size for Lennard-Jones Clusters
- Physics
- 1999
Disconnectivity graphs are used to characterize the potential energy surfaces of Lennard-Jones clusters containing 13, 19, 31, 38, 55, and 75 atoms. This set includes members which exhibit either one…
Learning Gradient Descent: Better Generalization and Longer Horizons
- Computer Science, ICML
- 2017
This paper proposes a new learning-to-learn model together with several useful and practical tricks, and demonstrates the effectiveness of the resulting algorithms on a number of tasks, including deep MLPs, CNNs, and simple LSTMs.
Adam: A Method for Stochastic Optimization
- Computer Science, ICLR
- 2015
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
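For completeness, a minimal sketch of the Adam update summarized above: exponential moving averages of the gradient (first moment) and its square (second moment), with bias correction. The hyperparameters below are the commonly used defaults.

```python
import jax
import jax.numpy as jnp

def adam_step(x, m, v, grad, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1.0 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1.0 - b2) * grad ** 2       # second-moment (uncentered variance) estimate
    m_hat = m / (1.0 - b1 ** t)               # bias correction, t counted from 1
    v_hat = v / (1.0 - b2 ** t)
    return x - lr * m_hat / (jnp.sqrt(v_hat) + eps), m, v

# Toy usage on a simple quadratic loss.
loss_fn = lambda x: jnp.sum(x ** 2)
x = jnp.ones(4)
m = v = jnp.zeros(4)
for t in range(1, 101):
    x, m, v = adam_step(x, m, v, jax.grad(loss_fn)(x), t)
```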
Understanding and correcting pathologies in the training of learned optimizers
- Computer Science, ICML
- 2019
This work proposes a training scheme that overcomes two key difficulties in training learned optimizers by dynamically weighting two unbiased gradient estimators for a variational loss on optimizer performance, allowing neural networks to be trained to perform optimization of a specific task faster than tuned first-order methods.
Neural Message Passing for Quantum Chemistry
- Computer Science, ICML
- 2017
Using MPNNs, this work demonstrates state-of-the-art results on an important molecular property prediction benchmark and argues that future work should focus on datasets with larger molecules or more accurate ground-truth labels.
Learned optimizers that outperform SGD on wall-clock and test loss
- Computer Science, ArXiv
- 2018
This work proposes a training scheme that overcomes two key difficulties in training learned optimizers by dynamically weighting two unbiased gradient estimators for a variational loss on optimizer performance, and it is able to learn optimizers that train networks to better generalization than first-order methods.
The Loss Surfaces of Multilayer Networks
- Computer Science, AISTATS
- 2015
It is proved that recovering the global minimum becomes harder as the network size increases, and that this is in practice irrelevant, as the global minimum often leads to overfitting.