• Corpus ID: 230435959

Reinforcement Learning for Flexibility Design Problems

Yehua Dennis Wei, Lei Zhang, Ruiyi Zhang, Shijing Si, Hao Zhang, Lawrence Carin
Flexibility design problems are a class of problems that appear in strategic decision-making across industries, where the objective is to design a (e.g., manufacturing) network that affords flexibility and adaptivity. The underlying combinatorial nature and stochastic objectives make flexibility design problems challenging for standard optimization methods. In this paper, we develop a reinforcement learning (RL) framework for flexibility design problems. Specifically, we carefully design… 

MB-CIM: A Multi-round Budgeted Competitive Influence Maximization

A tree-approximate game-theoretical framework is proposed, a new measurement is introduced as a dynamic node weight, and simulation demonstrates that this approach works well in a multi-round, learning-based CIM problem.

A Budged Framework to Model a Multi-round Competitive Influence Maximization Problem

  • Nadia Niknami, Jie Wu
  • Computer Science
    ICC 2022 - IEEE International Conference on Communications
  • 2022
This paper proposes a tree-approximate game-theoretical framework and introduces a new measurement, a dynamic weight, that allows the most influential member to be isolated with a high degree of accuracy. It also demonstrates the effectiveness of the approach in solving a multi-round, learning-based CIM problem.

Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks

This work designs a novel tractable scheme to control dynamical processes on temporal graphs and successfully applies its approach to two popular problems that fall into this framework: prioritizing which nodes should be tested in order to curb the spread of an epidemic, and influence maximization on a graph.



Reinforcement Learning for Solving the Vehicle Routing Problem

This work presents an end-to-end framework for solving the Vehicle Routing Problem (VRP) using reinforcement learning, and demonstrates how this approach can handle problems with split delivery and explore the effect of such deliveries on the solution quality.

Process flexibility design in heterogeneous and unbalanced networks: A stochastic programming approach

Most studies of process flexibility design have focused on homogeneous networks, whereas production systems in practice usually differ in many aspects, such as plant efficiency and product…

Neural Combinatorial Optimization with Reinforcement Learning

A framework to tackle combinatorial optimization problems using neural networks and reinforcement learning; Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes.

Trust Region Policy Optimization

A method for optimizing control policies, with guaranteed monotonic improvement, by making several approximations to the theoretically-justified scheme, called Trust Region Policy Optimization (TRPO).

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective

Machine Learning for Combinatorial Optimization: a Methodological Tour d'Horizon

Parameter Space Noise for Exploration

This work demonstrates, through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks, that RL with parameter noise learns more efficiently than either traditional RL with action space noise or evolutionary strategies alone.

Policy Gradient Methods for Reinforcement Learning with Function Approximation

This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation converges to a locally optimal policy.

Attention, Learn to Solve Routing Problems!

A model based on attention layers with benefits over the Pointer Network is proposed and it is shown how to train this model using REINFORCE with a simple baseline based on a deterministic greedy rollout, which is more efficient than using a value function.