• Corpus ID: 239998492

Learning Collaborative Policies to Solve NP-hard Routing Problems

Minsu Kim, Jinkyoo Park, Joungho Kim
Recently, deep reinforcement learning (DRL) frameworks have shown potential for solving NP-hard routing problems such as the traveling salesman problem (TSP) without problem-specific expert knowledge. Although DRL can be used to solve complex problems, DRL frameworks still struggle to compete with state-of-the-art heuristics, leaving a substantial performance gap. This paper proposes a novel hierarchical problem-solving strategy, termed learning collaborative policies (LCP), which can…
Deep Policy Dynamic Programming for Vehicle Routing Problems
DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network that is trained to predict edges from example solutions. This improves the performance of (restricted) DP algorithms, making them competitive with strong alternatives such as LKH, while also outperforming other ‘neural approaches’ for solving TSPs and VRPs with 100 nodes.
A Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drone
This work proposes an attention encoder-LSTM decoder hybrid model, in which the decoder’s hidden state can represent the sequence of actions made, and empirically demonstrates that such a hybrid model improves upon a purely attention-based model for both solution quality and computational efficiency.


Learning Improvement Heuristics for Solving Routing Problems
This article proposes a deep reinforcement learning framework to learn the improvement heuristics for routing problems, and designs a self-attention-based deep architecture as the policy network to guide the selection of the next solution.
Reinforcement Learning for Solving the Vehicle Routing Problem
This work presents an end-to-end framework for solving the Vehicle Routing Problem (VRP) using reinforcement learning, demonstrates how this approach can handle problems with split delivery, and explores the effect of such deliveries on the solution quality.
Attention, Learn to Solve Routing Problems!
A model based on attention layers with benefits over the Pointer Network is proposed and it is shown how to train this model using REINFORCE with a simple baseline based on a deterministic greedy rollout, which is more efficient than using a value function.
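As a rough illustration of the greedy rollout baseline mentioned in this summary, the following minimal sketch computes one REINFORCE surrogate term for a TSP instance. It is not the paper's attention model: the policy here is a hypothetical fixed softmax over negative distances with an assumed `temperature` parameter, and the names `rollout`, `tour_length`, and `reinforce_surrogate` are made up for this example. Only the idea of subtracting the cost of a deterministic greedy rollout from the cost of a sampled rollout comes from the summary.

```python
import math
import random


def tour_length(points, tour):
    """Length of the closed tour that visits `points` in the order given by `tour`."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def rollout(points, temperature, greedy, rng):
    """Build a tour node by node with a toy softmax policy (an assumption, not a trained model).

    Returns the tour and the log-probability of the chosen actions.
    """
    tour, log_prob = [0], 0.0
    unvisited = list(range(1, len(points)))
    while unvisited:
        current = points[tour[-1]]
        logits = [-math.dist(current, points[j]) / temperature for j in unvisited]
        top = max(logits)
        weights = [math.exp(v - top) for v in logits]  # numerically stable softmax
        total = sum(weights)
        probs = [w / total for w in weights]
        if greedy:
            # Deterministic rollout: always take the most probable next node.
            k = max(range(len(probs)), key=probs.__getitem__)
        else:
            # Stochastic rollout: sample the next node from the policy.
            k = rng.choices(range(len(probs)), weights=probs)[0]
        log_prob += math.log(probs[k])
        tour.append(unvisited.pop(k))
    return tour, log_prob


def reinforce_surrogate(points, temperature, rng):
    """One-sample REINFORCE surrogate: (cost(sampled) - cost(greedy)) * log p(sampled)."""
    sampled, log_prob = rollout(points, temperature, greedy=False, rng=rng)
    baseline, _ = rollout(points, temperature, greedy=True, rng=rng)
    advantage = tour_length(points, sampled) - tour_length(points, baseline)
    return advantage * log_prob, advantage
```

In a real training loop, the gradient of this surrogate with respect to the policy parameters would be taken by an automatic-differentiation library. The greedy rollout replaces a learned value function as the baseline, so no separate critic network is needed.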
Learning Heuristics for the TSP by Policy Gradient
The neural combinatorial optimization framework is extended to solve the traveling salesman problem (TSP), and the performance of the proposed framework alone is generally as good as that of high-performance heuristics (OR-Tools).
A Learning-based Iterative Method for Solving Vehicle Routing Problems
This paper presents the first learning-based approach for CVRP that is efficient in solving speed and at the same time outperforms OR methods, achieving new state-of-the-art results on CVRP.
Learning Combinatorial Optimization Algorithms over Graphs
This paper proposes a unique combination of reinforcement learning and graph embedding that behaves like a meta-algorithm that incrementally constructs a solution, where each action is determined by the output of a graph embedding network capturing the current state of the solution.
Learning to Perform Local Rewriting for Combinatorial Optimization
This paper proposes NeuRewriter, a policy to pick heuristics and rewrite the local components of the current solution to iteratively improve it until convergence, which captures the general structure of combinatorial problems and shows strong performance in three versatile tasks.
Neural Combinatorial Optimization with Reinforcement Learning
This work presents a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning; Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes.
Multi-Decoder Attention Model with Embedding Glimpse for Solving Vehicle Routing Problems
This work presents a novel deep reinforcement learning method to learn construction heuristics for vehicle routing problems and proposes an Embedding Glimpse layer in MDAM, based on the recursive nature of construction, which can improve the quality of each policy by providing more informative embeddings.
Chip Placement with Deep Reinforcement Learning
This work presents a learning-based approach to chip placement, and shows that, in under 6 hours, this method can generate placements that are superhuman or comparable on modern accelerator netlists, whereas existing baselines require human experts in the loop and take several weeks.