# A Unified Algorithm for Stochastic Path Problems

@article{Dann2022AUA,
title={A Unified Algorithm for Stochastic Path Problems},
author={Christoph Dann and Chen-Yu Wei and Julian Zimmert},
journal={ArXiv},
year={2022},
volume={abs/2210.09255}
}
• Published 17 October 2022
• Computer Science
• ArXiv
We study reinforcement learning in stochastic path (SP) problems. The goal in these problems is to maximize the expected sum of rewards until the agent reaches a terminal state. We provide the first regret guarantees in this general problem by analyzing a simple optimistic algorithm. Our regret bound matches the best known results for the well-studied special case of stochastic shortest path (SSP) with all non-positive rewards. For SSP, we present an adaptation procedure for the case when the…
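To make the objective concrete, the following is an illustrative sketch, not the paper's optimistic algorithm: plain value iteration on a hypothetical toy SP instance with known transitions and rewards, computing the optimal expected reward-to-go until the terminal state is reached. All states, actions, and numbers below are invented for illustration.

```python
# Illustrative sketch only (not the paper's algorithm): value iteration on a
# toy stochastic path (SP) instance. States 0 and 1 are non-terminal; state 2
# is terminal with value 0. The objective is to maximize the expected sum of
# rewards accumulated until reaching the terminal state.

# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
P = {
    0: {"go": [(1, 0.9), (0, 0.1)], "stay": [(0, 1.0)]},
    1: {"go": [(2, 0.8), (1, 0.2)], "stay": [(1, 1.0)]},
}
R = {
    0: {"go": -1.0, "stay": -2.0},
    1: {"go": -1.0, "stay": -2.0},
}
TERMINAL = 2

def value_iteration(P, R, tol=1e-10, max_iter=10_000):
    """Compute V(s) = max over policies of E[sum of rewards until termination]."""
    V = {s: 0.0 for s in P}
    V[TERMINAL] = 0.0  # no further reward once the terminal state is reached
    for _ in range(max_iter):
        delta = 0.0
        for s in P:
            # Bellman backup: best action by immediate reward + expected value.
            best = max(
                R[s][a] + sum(p * V[ns] for ns, p in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    return V

V = value_iteration(P, R)
# Fixed points: V(1) = -1 + 0.2*V(1) = -1.25, V(0) = (-1 + 0.9*V(1)) / 0.9
```

An optimistic learner in the paper's setting would not know `P` and `R`, and would instead run backups like these on confidence-widened estimates; the sketch only shows the planning step those methods build on.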

## References

Showing 1–10 of 28 references.

• Computer Science
ICML
• 2020
This work gives an algorithm that guarantees a regret bound of $\widetilde{O}(B_\star |S| \sqrt{|A| K})$ and shows that any learning algorithm must have at least $\Omega$ regret in the worst case.
• Computer Science
COLT
• 2022
This work begins the study of policy optimization for the stochastic shortest path (SSP) problem, a goal-oriented reinforcement learning model that strictly generalizes the finite-horizon model and better captures many applications.
• Computer Science
NeurIPS
• 2021
An algorithm is provided for the finite-horizon setting whose leading regret term depends polynomially on the expected cost of the optimal policy and only logarithmically on the horizon; the algorithm is based on a novel reduction from SSP to finite-horizon MDPs.
• Computer Science
ArXiv
• 2021
This work proposes PSRL-SSP, a simple posterior sampling-based reinforcement learning algorithm for the SSP problem, which is the first such posterior sampling algorithm and numerically outperforms previously proposed optimism-based algorithms.
• Computer Science
NeurIPS
• 2021
It is proved that EB-SSP achieves the minimax regret rate, the first horizon-free regret bound beyond the finite-horizon MDP setting, by closing the gap with the lower bound.
• Computer Science
ICML
• 2022
We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem with a linear MDP that significantly improve over the only existing results of (Vial et al., 2021). Our first…
• Computer Science
ICML
• 2022
A novel algorithm with Hoeffding-type confidence sets for learning the linear mixture SSP, which provably achieves a near-optimal regret guarantee and proves a lower bound of $\Omega(dB_\star\sqrt{K})$.
• Computer Science
ArXiv
• 2022
A lower bound is established for dynamic regret minimization in goal-oriented reinforcement learning modeled by a non-stationary stochastic shortest path problem with changing cost and transition functions, and algorithms are developed that estimate costs and transitions separately.
• Computer Science
ICML
• 2020
UC-SSP is introduced, the first no-regret algorithm in this setting, and a regret bound scaling as $\widetilde{\mathcal{O}}(DS\sqrt{ADK})$ is proved for any unknown SSP with $S$ states, $A$ actions, positive costs, and SSP-diameter $D$, defined as the smallest expected hitting time from any starting state to the goal.
• Computer Science
AAAI
• 2021
This work extends reinforcement learning algorithms to this setting, based on least-squares estimation of the unknown reward, for both the known and unknown transition model cases, and studies the performance of these algorithms by analyzing their regret.