Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We derive policy gradient theorems for options and propose a new option-critic architecture capable of learning both the…
Temporal abstraction plays a key role in scaling up reinforcement learning algorithms. While learning and planning with given temporally extended actions has been well studied, the question of how to construct this type of abstraction automatically from data remains open. We propose to use the label propagation algorithm for community detection in order to…
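The abstract mentions label propagation for community detection. As background, the standard asynchronous algorithm can be sketched as follows; this is a generic illustration on an adjacency-list graph, not the paper's particular variant, and the function name and graph are my own choices:

```python
import random
from collections import Counter

def label_propagation(adj, seed=0, max_iter=100):
    """Community detection by label propagation: every node starts with a
    unique label and repeatedly adopts the label held by the majority of
    its neighbours, until no label changes."""
    rng = random.Random(seed)
    labels = {node: node for node in adj}
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)          # asynchronous updates in random order
        changed = False
        for node in nodes:
            if not adj[node]:
                continue
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            # break ties uniformly among the most frequent neighbour labels
            choice = rng.choice([l for l, c in counts.items() if c == best])
            if choice != labels[node]:
                labels[node] = choice
                changed = True
        if not changed:             # converged: every node is happy
            break
    return labels

# toy graph: two triangles joined by a single bridge edge
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
communities = label_propagation(adj)
```

Since ties are broken at random, different seeds can merge or split the two triangles; the paper's application to temporal abstraction would run this on a graph built from the MDP's transition structure.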
Deep learning has become the state-of-the-art tool in many applications, but the evaluation and training of deep models can be time-consuming and computationally expensive. The conditional computation approach has been proposed to tackle this problem (Bengio et al., 2013; Davis & Arel, 2013). It operates by selectively activating only parts of the network at a…
Deep learning has become the state-of-the-art tool in many applications, but the evaluation and training of such models is very time-consuming and expensive. Dropout has been used both to make computations sparse (by not involving all units) and to regularize models. In typical dropout, nodes are dropped uniformly at random. Our goal is to…
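The "typical dropout" that the abstract contrasts against can be sketched in a few lines. This is the standard inverted-dropout formulation in NumPy (survivors are rescaled so the expected activation is unchanged), shown only as background; the function and its signature are illustrative, not the paper's proposed scheme:

```python
import numpy as np

def dropout(x, p_drop=0.5, rng=None, train=True):
    """Inverted dropout: zero each unit independently with probability
    p_drop and rescale survivors by 1/(1 - p_drop), so E[output] = x."""
    if not train or p_drop == 0.0:
        return x                      # dropout is a no-op at test time
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p_drop   # True = unit is kept
    return x * mask / (1.0 - p_drop)
```

Because every unit is kept or dropped with the same probability, the sparsity pattern carries no information about the input; schemes that choose *which* units to drop non-uniformly change exactly this step.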
• From the point of view of absolute optimality, temporal abstractions in reinforcement learning are not necessary.
• We propose bounded rationality as a lens through which we can describe the desiderata for constructing temporal abstractions.
• We formalize the idea that good options are those which result in fast planning (or inference).
the idea of…
We consider the problem of learning and planning in Markov decision processes with temporally extended actions represented in the options framework. We propose to use predictions about the duration of extended actions to represent the state and show that this leads to a compact predictive state representation model independent of the set of primitive…
There is a significant effort towards moving much of the data from the city of Montreal into an Open Data format. In this short paper, we report on a recent initiative to analyze this data using machine learning techniques in the context of a graduate course project. We review the approach, summarize accomplishments, and provide several recommendations for…
We show that when planning with options, the corresponding Bellman operator involves a matrix splitting (Varga 1962). Equivalently, a set of options and a policy over them is shown to specify a matrix preconditioner. A choice of options is therefore a choice of a preconditioned fixed point iteration algorithm.
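The correspondence can be illustrated on plain policy evaluation, which solves (I − γP)v = r. Any splitting I − γP = M − N with invertible M yields the fixed-point iteration v ← M⁻¹(Nv + r); the abstract's point is that a set of options plays the role of the preconditioner M. In the sketch below, a Gauss–Seidel splitting stands in for the splitting an option set would induce; the random MDP and this particular choice of M are illustrative, not the paper's construction:

```python
import numpy as np

# Random 5-state MDP under a fixed policy: solve (I - gamma*P) v = r.
rng = np.random.default_rng(0)
n, gamma = 5, 0.9
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
r = rng.random(n)
A = np.eye(n) - gamma * P
v_star = np.linalg.solve(A, r)         # exact value function

def split_iterate(M, N, r, iters=500):
    """Fixed-point iteration v <- M^{-1}(N v + r) for the splitting A = M - N."""
    v = np.zeros(len(r))
    for _ in range(iters):
        v = np.linalg.solve(M, N @ v + r)
    return v

# Trivial splitting M = I, N = gamma*P: the ordinary Bellman operator.
v_plain = split_iterate(np.eye(n), gamma * P, r)

# Gauss-Seidel splitting: M = lower-triangular part of A (a preconditioner).
M_gs = np.tril(A)
v_gs = split_iterate(M_gs, M_gs - A, r)
```

Both iterations converge to the same fixed point v*; what changes with the splitting (i.e., with the choice of options) is only the rate of convergence.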
We consider the problems of learning and planning in Markov decision processes with temporally extended actions represented in the options framework.
• We propose to use predictions about the duration of extended actions to represent the state.
• We develop a consistent and efficient spectral learning algorithm.