A Study of First-Passage Time Minimization via Q-Learning in Heated Gridworlds

@article{Larchenko2021ASO,
  title={A Study of First-Passage Time Minimization via Q-Learning in Heated Gridworlds},
  author={Maria A. Larchenko and Pavel Osinenko and Grigory Yaremenko and Vladimir V. Palyulin},
  journal={IEEE Access},
  year={2021},
  volume={9},
  pages={159349--159363}
}
Optimization of first-passage times is required in applications ranging from nanobot navigation to market trading. In such settings, one often encounters unevenly distributed noise levels across the environment. We extensively study how a learning agent fares in 1- and 2-dimensional heated gridworlds with an uneven temperature distribution. The results show certain bias effects in agents trained via simple tabular Q-learning, SARSA, Expected SARSA and Double Q-learning. Namely, the state…

References

Zermelo's problem: Optimal point-to-point navigation in 2D turbulent flows using Reinforcement Learning
TLDR
This work investigates Zermelo's problem using a Reinforcement Learning (RL) approach for a vessel that has a slip velocity of fixed intensity, Vs, but variable direction and that navigates a 2D turbulent sea, and shows how the RL approach is able to take advantage of the flow properties in order to reach the target.
Taming the Noise in Reinforcement Learning via Soft Updates
TLDR
G-learning is proposed, a new off-policy learning algorithm that regularizes the noise in the space of optimal actions by penalizing deterministic policies at the beginning of the learning, which enables naturally incorporating prior distributions over optimal actions when available.
A Q-Learning Approach to Flocking With UAVs in a Stochastic Environment
TLDR
Simulation results demonstrate the feasibility of the proposed learning approach at enabling agents to learn how to flock in a leader-follower topology, while operating in a nonstationary stochastic environment.
Flow Navigation by Smart Microswimmers via Reinforcement Learning
TLDR
The potential of reinforcement learning algorithms to model adaptive behavior in complex flows is illustrated and paves the way towards the engineering of smart microswimmers that solve difficult navigation problems.
Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing
TLDR
This work develops a framework for value-function-based deep reinforcement learning with a combinatorial action space, in which the action selection problem is explicitly formulated as a mixed-integer optimization problem.
Q-learning
TLDR
This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
Double Q-learning
TLDR
An alternative way to approximate the maximum expected value for any set of random variables is introduced and the obtained double estimator method is shown to sometimes underestimate rather than overestimate the maximum expected value.
On-line Q-learning using connectionist systems
TLDR
Simulations show that on-line learning algorithms are less sensitive to the choice of training parameters than backward replay, and that the alternative update rules of MCQ-L and Q(λ) are more robust than standard Q-learning updates.
Learning to soar in turbulent environments
TLDR
This work simulates the atmospheric boundary layer by numerical models of turbulent convective flow and combines them with model-free, experience-based, reinforcement learning algorithms to train the gliders and identifies those sensorimotor cues that permit effective control over soaring in turbulent environments.
Reinforcement learning with artificial microswimmers
TLDR
This work uses real-time control of self-thermophoretic active particles to demonstrate the solution of a simple standard navigation problem under the inevitable influence of Brownian motion at these length scales, and shows that, with external control, collective learning is possible.