Corpus ID: 236170980

Accelerating Quadratic Optimization with Reinforcement Learning

@inproceedings{Ichnowski2021AcceleratingQO,
  title={Accelerating Quadratic Optimization with Reinforcement Learning},
  author={Jeffrey Ichnowski and Paras Jain and Bartolomeo Stellato and Goran Banjac and Michael Luo and Francesco Borrelli and Joseph E. Gonzalez and Ion Stoica and Ken Goldberg},
  booktitle={NeurIPS},
  year={2021}
}
First-order methods for quadratic optimization such as OSQP are widely used for large-scale machine learning and embedded optimal control, where many related problems must be rapidly solved. These methods face two persistent challenges: manual hyperparameter tuning and convergence time to high-accuracy solutions. To address these, we explore how Reinforcement Learning (RL) can learn a policy to tune parameters to accelerate convergence. In experiments with well-known QP benchmarks we find that… 
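
The mechanism described in the abstract, a policy that retunes the solver's internal penalty parameter from observed residuals, can be illustrated with a minimal sketch. The code below solves a box-constrained QP with ADMM (the algorithm family underlying OSQP) and calls a placeholder rho_policy between iterations; the policy, the parameter names, and the retuning schedule are illustrative assumptions, not the paper's implementation.

import numpy as np

def rho_policy(r_prim, r_dual, rho):
    # Hypothetical stand-in for the learned RL policy: maps the current
    # residuals to a new penalty parameter. This simple rule just nudges rho
    # toward balancing the two residuals; it is NOT the paper's policy.
    return float(rho * np.clip(np.sqrt((r_prim + 1e-12) / (r_dual + 1e-12)), 0.5, 2.0))

def admm_box_qp(P, q, l, u, rho=1.0, max_iter=500, eps=1e-6):
    # Solve: minimize 0.5 x'Px + q'x  subject to  l <= x <= u
    # with ADMM, letting the policy retune rho every 10 iterations.
    n = q.size
    x = np.zeros(n)
    z = np.zeros(n)
    y = np.zeros(n)                  # scaled dual variable
    for k in range(max_iter):
        # x-update: solve (P + rho I) x = -q + rho (z - y)
        x = np.linalg.solve(P + rho * np.eye(n), -q + rho * (z - y))
        z_old = z
        z = np.clip(x + y, l, u)     # z-update: projection onto the box [l, u]
        y = y + x - z                # dual ascent step (scaled form)
        r_prim = np.linalg.norm(x - z)
        r_dual = rho * np.linalg.norm(z - z_old)
        if r_prim < eps and r_dual < eps:
            break
        if k % 10 == 0:
            new_rho = rho_policy(r_prim, r_dual, rho)
            y *= rho / new_rho       # rescale the scaled dual when rho changes
            rho = new_rho
    return x, k

# Usage on a small random strictly convex QP:
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
P = M @ M.T + np.eye(5)
q = rng.standard_normal(5)
x_opt, iters = admm_box_qp(P, q, l=-np.ones(5), u=np.ones(5))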

Bridging the gap between QP-based and MPC-based RL

This paper approximates the policy and value functions with an optimization problem in the form of a Quadratic Program (QP), and proposes simple tools to promote structure in the QP, pushing it to resemble a linear MPC scheme.

A reinforcement learning approach to parameter selection for distributed optimal power flow

A Reinforcement Learning Approach to Parameter Selection for Distributed Optimization in Power Systems

This work uses reinforcement learning (RL) to develop an adaptive penalty-parameter selection policy for the AC optimal power flow problem solved via ADMM, with the goal of minimizing the number of iterations to convergence, and shows that the learned policy can significantly accelerate convergence.
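
For context, the fixed heuristic that such a learned penalty policy typically replaces is the residual-balancing rule of Boyd et al. for ADMM; a minimal sketch follows (the constants mu and tau are conventional defaults, not values from this work).

def residual_balancing_rho(rho, r_prim, r_dual, mu=10.0, tau=2.0):
    # Classic hand-tuned ADMM penalty update: increase rho when the primal
    # residual dominates, decrease it when the dual residual dominates.
    if r_prim > mu * r_dual:
        return rho * tau
    if r_dual > mu * r_prim:
        return rho / tau
    return rho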

NICE: Robust Scheduling through Reinforcement Learning-Guided Integer Programming

This paper presents NICE (Neural network IP Coefficient Extraction), a novel technique that combines reinforcement learning and integer programming to tackle robust scheduling, producing schedules with 33% to 48% fewer disruptions than the baseline formulation.

Neural Fixed-Point Acceleration for Convex Optimization

This work brings neural acceleration to any optimization problem expressible with CVXPY, applies the framework to SCS, the state-of-the-art solver for convex cone programming, and designs models and loss functions to overcome the challenges of learning over unrolled optimization and acceleration instabilities.

Automated Dynamic Algorithm Configuration

This work gives the first comprehensive account of the new field of automated dynamic algorithm configuration (DAC), presents a series of recent advances, and provides a solid foundation for future research in the field.

Tutorial on amortized optimization for learning to optimize over continuous domains

This tutorial discusses the key design choices behind amortized optimization, roughly categorizing models into fully-amortized and semi-amortized approaches, and learning methods into regression-based and objective-based approaches.

A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases

This paper uses tools from dynamical systems to investigate the inductive bias and stability properties of optimization algorithms, and applies the resulting insights to designing inductive biases for black-box optimizers.

References

SHOWING 1-10 OF 50 REFERENCES

Continuous control with deep reinforcement learning

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
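
As a reminder of the mechanism, the deterministic policy gradient this algorithm ascends has the standard form (textbook notation, not quoted from this page):

\nabla_\theta J(\mu_\theta) = \mathbb{E}_{s \sim \rho^{\mu}}\big[ \nabla_\theta \mu_\theta(s)\, \nabla_a Q^{\mu}(s, a)\,\big|_{a = \mu_\theta(s)} \big]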

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, which achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
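
The maximum entropy objective referred to here augments expected return with a policy-entropy bonus; in standard notation (with temperature alpha weighting the entropy term):

J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\big[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \big]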

Learning to Perform Local Rewriting for Combinatorial Optimization

This paper proposes NeuRewriter, a policy to pick heuristics and rewrite the local components of the current solution to iteratively improve it until convergence, which captures the general structure of combinatorial problems and shows strong performance in three versatile tasks.

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

A new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
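
The clipped "surrogate" objective in question, in the standard notation of the PPO paper (r_t is the probability ratio, \hat{A}_t the advantage estimate, \epsilon the clipping radius):

r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}, \qquad L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\big[ \min\big( r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\,\hat{A}_t \big) \big]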

Policy Gradient Methods for Reinforcement Learning with Function Approximation

This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
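
The central result referenced here is the policy gradient theorem; in standard notation (d^\pi is the discounted state distribution under \pi):

\nabla_\theta J(\theta) \propto \sum_s d^{\pi}(s) \sum_a \nabla_\theta \pi(a \mid s; \theta)\, Q^{\pi}(s, a)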

Recent Advances in Hierarchical Reinforcement Learning

This work reviews several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed and discusses extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability.

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.
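
The meta-objective behind this algorithm, for a single inner gradient step of size \alpha over tasks \mathcal{T}_i drawn from p(\mathcal{T}), can be written as (standard form, not quoted from this page):

\min_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}\big( \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(\theta) \big)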

Addressing Function Approximation Error in Actor-Critic Methods

This paper builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.
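
The clipped double-Q target described here takes the minimum over two target critics, with target-policy smoothing noise \epsilon (standard form):

y = r + \gamma \min_{i=1,2} Q_{\theta_i'}\big( s', \pi_{\phi'}(s') + \epsilon \big), \qquad \epsilon \sim \mathrm{clip}\big( \mathcal{N}(0, \sigma), -c, c \big)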