Autonomous task achievement by space robot based on Q-learning with environment recognition
- K. Senda, T. Matsumoto, Y. Okano, S. Mano, S. Fujii
- AIAA Guidance, Navigation, and Control Conference…
A multiresolution state-space discretization method is developed for the episodic unsupervised learning method of Q-Learning. In addition, a genetic algorithm is used periodically during learning to approximate the action-value function. Policy iteration is added as a stopping criterion for the algorithm. For large-scale problems, Q-Learning often suffers from the curse of dimensionality because of the large number of possible state-action pairs. This paper develops a method in which the state space is adaptively discretized by progressively finer grids around the areas of interest within the state or learning space. Policy iteration is added to prevent unnecessary episodes at each level of discretization once learning has converged. The utility of the method is demonstrated by application to a morphing airfoil with two morphing parameters (two state variables). By setting the multiresolution method to define the area of interest by the goal the agent seeks, it is shown that the method can learn a specific goal to within ±0.002 while reducing the total number of episodes needed for convergence by 85% relative to the total number of episodes allotted. It is also shown that the action-value function is approximated well, with 80% agreement between the tabulated and approximated policies, although empirically the approximated policy appears to be superior.
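The core loop described above (tabular Q-learning on a coarse grid, an early stop once the policy has settled, then re-learning on a finer grid centred on the goal) can be sketched in Python. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes a toy stand-in for the two-parameter problem: the unit square as the state space, four actions that step one grid cell up or down in either state variable, a reward of -1 per move and 0 on reaching the goal cell, an 8 x 8 grid at every level, and a zoom that spans the goal cell plus its neighbours. The stopping test is one possible reading of "policy iteration as a stopping criterion": the greedy policy is evaluated by rolling it out from every cell, and learning at a level stops once that evaluation is identical at two consecutive checks. All names (`multiresolution_q_learning`, `learn_level`, `evaluate_policy`, and so on) and all parameter values are illustrative assumptions. The genetic-algorithm approximation of the action-value function is not modelled.

```python
import random
from itertools import product

# Hypothetical toy action set: step one grid cell up or down in either of the two state variables.
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def cell_of(point, lo, width, n):
    """Per-axis index of the n x n grid cell (lower corner lo, cell size width) containing point."""
    return tuple(min(max(int((p - l) / width), 0), n - 1) for p, l in zip(point, lo))


def step(s, a, n):
    """Deterministic grid transition; moves past the edge leave the agent in place."""
    dx, dy = ACTIONS[a]
    return (min(max(s[0] + dx, 0), n - 1), min(max(s[1] + dy, 0), n - 1))


def greedy(q, s):
    """Index of the highest-valued action in cell s (first one wins ties)."""
    return max(range(len(ACTIONS)), key=lambda a: q[s][a])


def evaluate_policy(q, goal_cell, n):
    """Roll out the greedy policy from every cell; None marks a cell that never reaches the goal."""
    steps = {}
    for start in product(range(n), repeat=2):
        s, k = start, 0
        while s != goal_cell and k <= n * n:
            s, k = step(s, greedy(q, s), n), k + 1
        steps[start] = k if s == goal_cell else None
    return steps


def learn_level(goal_cell, n, rng, max_episodes, check_every=50,
                alpha=0.5, gamma=0.95, epsilon=0.2, max_steps=200):
    """Tabular Q-learning on one grid; stop early once the evaluated greedy policy stops changing."""
    q = {s: [0.0] * len(ACTIONS) for s in product(range(n), repeat=2)}
    previous = None
    for episode in range(1, max_episodes + 1):
        s = (rng.randrange(n), rng.randrange(n))
        for _ in range(max_steps):
            if s == goal_cell:
                break
            # Epsilon-greedy action selection.
            a = rng.randrange(len(ACTIONS)) if rng.random() < epsilon else greedy(q, s)
            nxt = step(s, a, n)
            # Reward -1 per move, 0 on reaching the goal, which is treated as terminal.
            target = 0.0 if nxt == goal_cell else -1.0 + gamma * max(q[nxt])
            q[s][a] += alpha * (target - q[s][a])
            s = nxt
        if episode % check_every == 0:
            current = evaluate_policy(q, goal_cell, n)
            if current == previous and None not in current.values():
                return q, episode  # policy has settled: skip the remaining episodes
            previous = current
    return q, max_episodes


def multiresolution_q_learning(goal, levels=4, n=8, episodes_per_level=2000, seed=0):
    """Re-learn on progressively finer grids around the goal; return the final goal cell."""
    rng = random.Random(seed)
    lo, width = (0.0, 0.0), 1.0 / n  # level 0 covers the whole unit square
    used = 0
    for level in range(levels):
        goal_cell = cell_of(goal, lo, width, n)
        _, episodes = learn_level(goal_cell, n, rng, episodes_per_level)
        used += episodes
        if level == levels - 1:
            break
        # Zoom in: the next n x n grid spans the goal cell and its neighbours, kept inside [0, 1].
        span = 3 * width
        lo = tuple(min(max(l + (c - 1) * width, 0.0), 1.0 - span) for l, c in zip(lo, goal_cell))
        width = span / n
    corner = tuple(l + c * width for l, c in zip(lo, goal_cell))
    return corner, width, used


corner, width, used = multiresolution_q_learning(goal=(0.61, 0.37))
print(corner, width, used)
```

Under these assumptions, each level shrinks the cell width by a factor of 3/8, so four levels resolve the goal to a cell of width 1/8 · (3/8)^3 ≈ 0.0066 while every level still learns on only 64 cells. The `used` counter shows how many of the allotted `levels * episodes_per_level` episodes the early-stopping test actually consumed. The numbers are for this toy problem only and are not the airfoil results reported above.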