Reinforcement Learning in the Wild: Scalable RL Dispatching Algorithm Deployed in Ridehailing Marketplace

  • Soheil Sadeghi Eshkevari, Xiaocheng Tang, Zhiwei Qin, Jinhan Mei, Cheng Zhang, Qianying Meng, Jia Xu
  • Published 10 February 2022
  • Computer Science
  • Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
In this study, a scalable, real-time dispatching algorithm based on reinforcement learning is proposed and, for the first time, deployed at large scale. Current dispatching methods on ridehailing platforms are predominantly based on myopic or rule-based non-myopic approaches. Reinforcement learning enables dispatching policies that are informed by historical data and can use the learned information to optimize the returns of expected future trajectories. Previous studies in this field…


Reinforcement learning for ridesharing: An extended survey



A Deep Value-network Based Approach for Multi-Driver Order Dispatching

This work proposes a deep reinforcement learning based solution for order dispatching and conducts large scale online A/B tests on DiDi's ride-dispatching platform to show that the proposed method achieves significant improvement on both total driver income and user experience related metrics.

Real-world Ride-hailing Vehicle Repositioning using Deep Reinforcement Learning

Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms

This paper proposes a unified value-based dynamic learning framework (V1D3) for tackling the tasks of order dispatching and vehicle repositioning, and proposes a novel periodic ensemble method combining the fast online learning with a large-scale offline training scheme that leverages the abundant historical driver trajectory data.

Variance aware reward smoothing for deep reinforcement learning

Gradient temporal-difference learning algorithms

We present a new family of gradient temporal-difference (TD) learning methods with function approximation whose complexity, both in terms of memory and per-time-step computation, scales linearly with

Dynamic pricing and matching in ride‐hailing platforms

This work provides a review of matching and dynamic pricing (DP) techniques in ride‐hailing, shows that they are critical for providing an experience with low waiting time for both riders and drivers, and links the two levers by studying a pool‐matching mechanism that varies rider waiting and walking before dispatch.

Maximum Weight Online Matching with Deadlines

This work provides a randomized 0.25-competitive algorithm for matching agents who arrive at a marketplace over time and leave after d time periods and shows that a batching algorithm, which computes a maximum-weighted matching every (d+1) periods, is 0.279-competitive.

Addressing the minimum fleet problem in on-demand urban mobility

This work presents an optimal, computationally efficient solution to the problem of finding the minimum taxi fleet size using a vehicle-sharing network, along with a nearly optimal solution amenable to real-time implementation.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
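
The Adam summary above refers to adaptive estimates of lower-order moments. As a minimal sketch of what that means, the following applies the standard Adam update (exponential moving averages of the gradient and its square, with bias correction) to a one-dimensional toy problem. The function name `adam_minimize`, the quadratic objective, and the step count are illustrative assumptions and are not taken from any of the papers listed here; the learning rate is chosen for the toy problem only.

```python
import math


def adam_minimize(grad, x0, steps=2000, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function given its gradient, using the Adam update.

    Hypothetical helper for illustration; all arguments are assumptions.
    """
    x = x0
    m = 0.0  # moving average of the gradient (first moment estimate)
    v = 0.0  # moving average of the squared gradient (second moment estimate)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction compensates for m and v being initialized at zero.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        # Step size is scaled per parameter by the second-moment estimate.
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

Because the step is divided by the square root of the second-moment estimate, early updates have a magnitude close to `lr` regardless of the gradient's scale, which is the adaptive behaviour the summary describes.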