Viktor Zhumatiy

This paper describes quasi-online reinforcement learning: while a robot explores its environment, a probabilistic model of the environment is built on the fly in the background as new experiences arrive; the policy is trained concurrently on this model using an anytime algorithm. Prioritized sweeping, directed exploration, and transformed reward …
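Prioritized sweeping itself can be sketched briefly. The following is a minimal, hypothetical tabular version (not the paper's implementation, which works with a learned probabilistic model): value backups are ordered by the magnitude of their Bellman error, and each update re-prioritizes the predecessors of the changed state.

```python
import heapq

def prioritized_sweeping(transitions, rewards, n_states, n_actions,
                         gamma=0.95, theta=1e-6, n_updates=10000):
    """Plan on a deterministic tabular model (transitions[(s, a)] = s',
    rewards[(s, a)] = r), updating the state-action pairs with the
    largest Bellman errors first."""
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def backup(s, a):
        s2 = transitions[(s, a)]
        return rewards[(s, a)] + gamma * max(Q[s2])

    # Predecessor table: which (s, a) pairs lead into each state.
    preds = {s: set() for s in range(n_states)}
    for (s, a), s2 in transitions.items():
        preds[s2].add((s, a))

    # Min-heap keyed on the negated error, so the largest error pops first.
    pq = [(-abs(backup(s, a) - Q[s][a]), (s, a)) for (s, a) in transitions]
    heapq.heapify(pq)

    for _ in range(n_updates):
        if not pq:
            break
        neg_err, (s, a) = heapq.heappop(pq)
        if -neg_err < theta:
            break  # all remaining errors are negligible
        Q[s][a] = backup(s, a)
        # Updating Q[s][a] may change max(Q[s]); requeue predecessors of s.
        for ps, pa in preds[s]:
            err = abs(backup(ps, pa) - Q[ps][pa])
            if err > theta:
                heapq.heappush(pq, (-err, (ps, pa)))
    return Q

# Toy model: a three-state chain where action 1 moves right and entering
# the absorbing state 2 (via (1, 1)) pays reward 1.
T = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 2, (2, 0): 2, (2, 1): 2}
R = {sa: (1.0 if sa == (1, 1) else 0.0) for sa in T}
Q = prioritized_sweeping(T, R, n_states=3, n_actions=2)
```

The queue ensures the reward's value propagates backwards along the chain in a handful of backups rather than by uniform sweeps over the whole table.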
IDSIA was founded by the Fondazione Dalle Molle per la Qualità della Vita and is affiliated with both the Università della Svizzera italiana (USI) and the Scuola universitaria professionale della Svizzera italiana (SUPSI). We are given a search problem, or a sequence of search problems, as well as a set of potentially useful search algorithms. We propose …
It is difficult to apply traditional reinforcement learning algorithms to robots, due to large and continuous domains, partial observability, and limited numbers of learning experiences. This paper addresses these problems by combining: 1. reinforcement learning with memory, implemented using an LSTM recurrent neural network whose inputs …
We address the problem of autonomously learning controllers for vision-capable mobile robots. We extend McCallum's (1995) Nearest-Sequence Memory algorithm to allow for general metrics over state-action trajectories. We demonstrate the feasibility of our approach by successfully running our algorithm on a real mobile robot. The algorithm is novel and unique …
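As a rough illustration of what a "general metric over state-action trajectories" could look like (the specifics below are assumptions for illustration, not the paper's definitions): an exponentially discounted sum of per-step distances, comparing two histories from their most recent steps backwards, with nearest stored sequences retrieved under that metric.

```python
def trajectory_distance(traj_a, traj_b, step_dist, decay=0.8):
    """Hypothetical trajectory metric: exponentially discounted sum of
    per-step distances, aligned at the ends of the trajectories so that
    recent experience weighs most."""
    d, w = 0.0, 1.0
    for x, y in zip(reversed(traj_a), reversed(traj_b)):
        d += w * step_dist(x, y)
        w *= decay
    return d

def k_nearest_sequences(query, memory, step_dist, k=2):
    """Return the k stored trajectories closest to the query history."""
    return sorted(memory,
                  key=lambda t: trajectory_distance(query, t, step_dist))[:k]

# Toy usage: trajectories represented as lists of scalar observations.
memory = [[1, 2, 3], [1, 2, 4], [9, 9, 9]]
nearest = k_nearest_sequences([1, 2, 3], memory,
                              step_dist=lambda a, b: abs(a - b))
```

Plugging in a per-step distance over raw sensor readings (e.g. image features) rather than exact symbol matching is what lets a nearest-sequence approach scale beyond discrete observation spaces.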
Learning and planning control is hard. The search space of traditional planners consists of sequences of primitive actions. To exploit reusable subsequences and other algorithmic regularities, however, we should instead search the general space of programs that compute action sequences. Such programs may invoke very fast "thinking actions" consuming only …