Enforcing Policy Feasibility Constraints through Differentiable Projection for Energy Optimization

  @article{...,
    title={Enforcing Policy Feasibility Constraints through Differentiable Projection for Energy Optimization},
    author={Bingqing Chen and Priya L. Donti and Kyri Baker and J. Zico Kolter and Mario Berg{\'e}s},
    journal={Proceedings of the Twelfth ACM International Conference on Future Energy Systems},
  }
While reinforcement learning (RL) is gaining popularity in energy systems control, its real-world applications are limited because the actions from learned policies may not satisfy functional requirements or be feasible for the underlying physical system. In this work, we propose PROjected Feasibility (PROF), a method to enforce convex operational constraints within neural policies. Specifically, we incorporate a differentiable projection layer within a neural network-based policy… 
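As an illustration of what such a differentiable projection layer computes, here is a minimal sketch (not the paper's implementation) of the closed-form Euclidean projection onto a single halfspace {x : c·x ≤ b}; PROF handles general convex constraint sets and embeds the operation inside the policy network's computation graph:

```python
import numpy as np

def project_halfspace(a, c, b):
    """Euclidean projection of action a onto {x : c @ x <= b}.

    Closed form: if the constraint is violated, move a back along c
    by the violation amount scaled by ||c||^2; otherwise a is already
    feasible and passes through unchanged.
    """
    violation = c @ a - b
    if violation <= 0.0:
        return a
    return a - (violation / (c @ c)) * c

# A raw policy action violating x0 + x1 <= 1 ...
c, b = np.array([1.0, 1.0]), 1.0
proj = project_halfspace(np.array([2.0, 2.0]), c, b)
# ... is mapped to the nearest feasible point [0.5, 0.5]
```

Because the projection is a piecewise-smooth function of the input action, gradients can flow through it during policy training, which is the property the paper exploits.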


Near-optimal Deep Reinforcement Learning Policies from Data for Zone Temperature Control
Physically Consistent Neural Networks (PCNNs) are used as simulation environments for which optimal control inputs are easy to compute; the results hint that DRL agents not only clearly outperform conventional rule-based controllers but also attain near-optimal performance.
Safe Pontryagin Differentiable Programming
This work establishes three fundamental results for the proposed Safe PDP: both the solution and its gradient in the backward pass can be approximated by solving more efficient unconstrained counterparts, and all intermediate results throughout the approximation and optimization strictly respect the constraints, guaranteeing safety throughout the entire learning and control process.
Diversity for transfer in learning-based control of buildings
The application of reinforcement learning to the optimal control of building systems has gained traction in recent years, as it can cut building energy consumption and improve human comfort.…
Emissions-aware electricity network expansion planning via implicit differentiation
This work solves a variant of the classical problem of designing or expanding an electricity network by using gradient descent with implicit differentiation to minimize some mixture of cost and greenhouse gas emissions, even if the underlying dispatch model does not tax emissions.
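The implicit-differentiation idea can be sketched on a toy root-finding problem (an illustrative example, not the network-expansion model): for a solution x* of g(x, θ) = 0, the implicit function theorem gives dx*/dθ = −(∂g/∂x)⁻¹ ∂g/∂θ, so the gradient is available without differentiating through the solver's iterations:

```python
def solve(theta, iters=50):
    """Newton's method for g(x, theta) = x**2 - theta = 0 (x > 0)."""
    x = 1.0
    for _ in range(iters):
        x -= (x * x - theta) / (2.0 * x)
    return x

def implicit_grad(x, theta):
    """dx*/dtheta = -(dg/dx)^-1 * dg/dtheta, evaluated at the solution."""
    dg_dx = 2.0 * x      # d(x^2 - theta)/dx
    dg_dtheta = -1.0     # d(x^2 - theta)/dtheta
    return -dg_dtheta / dg_dx

theta = 4.0
x_star = solve(theta)                 # x* = sqrt(theta) = 2.0
grad = implicit_grad(x_star, theta)   # 1 / (2 * sqrt(theta)) = 0.25
```

The same pattern scales to vector-valued optimality conditions, where the scalar division becomes a linear solve against the Jacobian.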
Capturing Electricity Market Dynamics in the Optimal Trading of Strategic Agents using Neural Network Constrained Optimization
In competitive electricity markets the optimal trading problem of an electricity market agent is commonly formulated as a bi-level program and solved as a mathematical program with equilibrium…
Likelihood Contribution based Multi-scale Architecture for Generative Flows
A novel multi-scale architecture is proposed that performs data-dependent factorization to decide which dimensions should pass through more flow layers, together with a heuristic based on each dimension's contribution to the total log-likelihood that encodes its importance.
Conditional Synthetic Data Generation for Personal Thermal Comfort Models
It is proposed to implement a state-of-the-art conditional synthetic data generator to generate synthetic data corresponding to the low-frequency classes, and it is shown that the synthetic data generated has a distribution that mimics the real data distribution.
CDCGen: Cross-Domain Conditional Generation via Normalizing Flows and Adversarial Training
A transfer-learning-based framework utilizing normalizing flows, coupled with both maximum-likelihood and adversarial training, that generates synthetic samples in the target domain conditioned on given attributes, producing non-trivial augmentations via attribute and component transformations.
Conditional Synthetic Data Generation for Robust Machine Learning Applications with Limited Pandemic Data
This work presents a hybrid model consisting of a conditional generative flow and a classifier for conditional synthetic data generation, and shows that the method significantly outperforms existing models both on qualitative and quantitative performance.
Adversarial poisoning attacks on reinforcement learning-driven energy pricing
Complex controls are increasingly common in power systems. Reinforcement learning (RL) has emerged as a strong candidate for implementing various controllers. One common use of RL in this context is…


Constrained Policy Optimization
Constrained Policy Optimization (CPO) is proposed, the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees for near-constraint satisfaction at each iteration, and allows us to train neural network policies for high-dimensional control while making guarantees about policy behavior all throughout training.
Projection-Based Constrained Policy Optimization
This paper proposes a new algorithm, Projection-Based Constrained Policy Optimization (PCPO): an iterative method for optimizing policies in a two-step process, where the first step performs an unconstrained update and the second step reconciles any constraint violation by projecting the policy back onto the constraint set.
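A toy numerical sketch of PCPO's two-step pattern (a hypothetical simplification: the real algorithm projects in policy-parameter space under KL or L2 metrics, whereas here the constraint set is a scalar box, so the projection reduces to clipping):

```python
import numpy as np

def pcpo_style_step(theta, grad, lr, lo, hi):
    """Step 1: unconstrained gradient update on the objective.
    Step 2: project back onto the constraint set (a box, so a clip)."""
    theta = theta - lr * grad          # reward-improvement step
    return np.clip(theta, lo, hi)      # reconcile constraint violation

# Minimize (theta - 3)^2 subject to theta <= 1
theta = 0.0
for _ in range(100):
    theta = pcpo_style_step(theta, 2.0 * (theta - 3.0), 0.1, -np.inf, 1.0)
# theta settles at the constrained optimum, 1.0
```

Even though each unconstrained step overshoots the constraint, the projection pulls the iterate back, and the procedure converges to the boundary optimum.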
Towards Off-policy Evaluation as a Prerequisite for Real-world Reinforcement Learning in Building Control
An initial study of off-policy evaluation (OPE), a problem prerequisite to real-world reinforcement learning (RL), in the context of building control, adopts the approximate model (AM) method and uses bootstrapping to quantify uncertainty and correct for bias.
Policy Optimization for H2 Linear Control with H∞ Robustness Guarantee: Implicit Regularization and Global Convergence
This work offers the first results on the implicit regularization property and global convergence of PO methods for robust/risk-sensitive control for H2 linear control with H∞-norm robustness guarantee, for both discrete and continuous-time settings.
Gnu-RL: A Precocial Reinforcement Learning Solution for Building HVAC Control Using a Differentiable MPC Policy
Gnu-RL is proposed: a novel approach that enables practical deployment of RL for HVAC control and requires no prior information other than historical data from existing HVAC controllers. It adopts a recently developed Differentiable Model Predictive Control (MPC) policy, which encodes domain knowledge on planning and system dynamics.
Safe Model-based Reinforcement Learning with Stability Guarantees
This paper presents a learning algorithm that explicitly considers safety, defined in terms of stability guarantees, and extends control-theoretic results on Lyapunov stability verification and shows how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates.
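The Lyapunov-style certificate underlying such stability guarantees can be sketched for a known discrete-time linear system x_{k+1} = A x_k (a simplified setting; the paper works with learned statistical models of the dynamics): V(x) = xᵀPx certifies stability if P is positive definite and V strictly decreases along trajectories, i.e. AᵀPA − P is negative definite:

```python
import numpy as np

def is_lyapunov_certificate(A, P, tol=1e-9):
    """Check that V(x) = x' P x certifies stability of x_next = A x:
    P must be positive definite, and the one-step decrease condition
    A' P A - P must be negative definite."""
    pos_def = np.all(np.linalg.eigvalsh(P) > tol)
    decrease = np.all(np.linalg.eigvalsh(A.T @ P @ A - P) < -tol)
    return bool(pos_def and decrease)

A = np.array([[0.5, 0.1],
              [0.0, 0.8]])   # eigenvalues inside the unit circle
ok = is_lyapunov_certificate(A, np.eye(2))  # True for this stable A
```

Safe model-based RL methods of this kind verify such decrease conditions over the region where the learned dynamics model is trusted.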
Deep Learning for Reactive Power Control of Smart Inverters under Communication Constraints
  • Sarthak Gupta, V. Kekatos, Ming Jin
  • 2020 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), 2020
This work advocates deciding inverter injection setpoints using deep neural networks (DNNs) and addresses the small-big data conundrum: utilities collect detailed smart meter readings, yet only on an hourly basis, while in real time inverters should be driven by local inputs and minimal utility coordination to save on communication.
Enforcing robust control guarantees within neural network policies
This work shows that by integrating custom convex-optimization-based projection layers into a nonlinear policy, it can construct a provably robust neural network policy class that outperforms robust control methods in the average (non-adversarial) setting.
Predicting AC Optimal Power Flows: Combining Deep Learning and Lagrangian Dual Methods
A deep learning approach to the Optimal Power Flow problem that exploits the information available in prior states of the system, combined with a Lagrangian dual method to satisfy the physical and engineering constraints present in the OPF.
High-Fidelity Machine Learning Approximations of Large-Scale Optimal Power Flow
This paper proposes an integration of deep neural networks and Lagrangian duality to capture the physical and operational constraints of the AC Optimal Power Flow and produces highly accurate approximations whose costs are within 0.01% of optimality.
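The Lagrangian-dual pattern used in these two works can be sketched on a toy constrained problem (an illustrative example, not the OPF formulation): constraint violations enter the training loss weighted by multipliers, which are themselves updated by dual ascent:

```python
def lagrangian_dual_descent(steps=2000, lr_x=0.05, lr_lam=0.05):
    """Solve min (x-3)^2  s.t.  x <= 1 via primal descent / dual ascent
    on the Lagrangian L(x, lam) = (x-3)^2 + lam * (x - 1)."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        x -= lr_x * (2.0 * (x - 3.0) + lam)        # primal descent step
        lam = max(0.0, lam + lr_lam * (x - 1.0))   # dual ascent, lam >= 0
    return x, lam

x, lam = lagrangian_dual_descent()
# converges near the constrained optimum x = 1, with multiplier lam = 4
```

In the deep learning OPF setting, the primal step becomes a gradient update on the network weights, and each physical or engineering constraint carries its own multiplier updated the same way.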