Solving Reward-Collecting Problems with UAVs: A Comparison of Online Optimization and Q-Learning

Yixuan Liu, Chrysafis Vogiatzis, Ruriko Yoshida, Erich Morman
Journal of Intelligent & Robotic Systems
Uncrewed autonomous vehicles (UAVs) have made significant contributions to reconnaissance and surveillance missions in past US military campaigns. As the prevalence of UAVs increases, there have also been improvements in counter-UAV technology that make it difficult for UAVs to obtain valuable intelligence within an area of interest. Hence, it has become important that modern UAVs can accomplish their missions while maximizing their chances of survival. In this work, we…


Reinforcement Learning: A Tutorial Survey and Recent Advances
  • A. Gosavi
  • Computer Science
    INFORMS J. Comput.
  • 2009
This overview of reinforcement learning is aimed at uncovering the mathematical roots of this science so that readers gain a clear understanding of the core concepts and are able to use them in their own research.
Online convex optimization and no-regret learning: Algorithms, guarantees and applications
This tutorial paper provides a gentle introduction to online optimization and learning algorithms that are asymptotically optimal in hindsight, i.e., they approach the performance of a virtual algorithm with unlimited computational power and full knowledge of the future, a property known as no-regret.
Deliberation for autonomous robots: A survey
Endhost-based shortest path routing in dynamic networks: An online learning approach
This work gives a simple algorithm based on decoupled probing and routing, whose regret is only constant in time, and extends this solution to support multi-path probing and cooperative learning between multiple sources, where it shows an inversely proportional decay in regret with respect to the probing rate.
Playing Atari with Deep Reinforcement Learning
This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
Approximate dynamic programming: solving the curses of dimensionality
This book provides detailed coverage of modelling decision processes under uncertainty, robustness, designing and estimating value function approximations, choosing effective step-size rules, and convergence issues and is an excellent textbook for advanced undergraduate and beginning graduate students.
Online portfolio selection: A survey
A comprehensive survey and a structural understanding of online portfolio selection techniques published in the literature are provided, and the relationship of these algorithms with capital growth theory is discussed so as to better understand the similarities and differences of their underlying trading ideas.
Energy-Efficient Scheduling for Real-Time Systems Based on Deep Q-Learning Model
An energy-efficient scheduling scheme based on a deep Q-learning model (DQL-EES) is proposed for periodic tasks in real-time systems, and the proposed algorithm is shown to save more energy on average than QL-HDS.
Approximate dynamic programming: solving the curses of dimensionality
This book discusses the challenges of dynamic programming, the three curses of dimensionality, and some experimental comparisons of stepsize formulas that led to the creation of ADP for online applications.