This paper presents the main findings from a collaborative community/university research project in Canada. The goal of the project was to improve access to community health information and, in so doing, enhance our knowledge of the development of community health information resources and community/university collaboration. The project built on a rich ...
Reinforcement learning (RL) is designed to learn optimal control policies from unsupervised interactions with the environment. Many successful RL algorithms have been developed; however, none of them can efficiently tackle problems with high-dimensional state spaces due to the "curse of dimensionality," and so their applicability to real-world scenarios is ...
Approximate value iteration methods for reinforcement learning (RL) generalize experience from limited samples across large state-action spaces. The function approximators used in such methods typically introduce errors in value estimation, which can harm the quality of the learned value functions. We present a new batch-mode, off-policy, approximate value ...
In-class meeting location: BEL 632 PAD Conference Room. Office hours: for any given in-class meeting, 2:00pm-3:30pm the day before the class day, or by appointment. Public management is complex and requires a sophisticated appreciation for the interconnections that sustain it. The jurisdictional, political, economic, and legal contexts shape its contours and ...
Approximate value iteration (AVI) is a widely used technique in reinforcement learning. Most AVI methods do not take full advantage of the sequential relationship between samples within a trajectory in deriving value estimates, due to the challenges in dealing with the inherent bias and variance in the n-step returns. We propose a bounding method which uses ...
Fitted Q-Iteration (FQI) is a popular approximate value iteration (AVI) approach that makes effective use of off-policy data. FQI uses a 1-step return value update, which does not exploit the sequential nature of trajectory data. Complex returns (weighted averages of the n-step returns) use trajectory data more effectively, but have not been used in an AVI ...
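To make the terms in the abstract above concrete, here is a minimal sketch of an n-step return and a complex return computed over a single recorded trajectory. It is illustrative only: the function names, the list-based trajectory layout, and the uniform weights in the usage lines are assumptions, not the weighting scheme or the algorithm from the paper, whose details are not shown in the truncated abstract.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], values: Sequence[float],
                  t: int, n: int, gamma: float) -> float:
    """Discounted sum of up to n rewards from step t, plus a bootstrapped value.

    Assumed layout (hypothetical): rewards[k] is the reward received after
    step k, and values[k] is the current value estimate for the state at
    step k, so len(values) == len(rewards) + 1.
    """
    n = min(n, len(rewards) - t)  # truncate at the end of the trajectory
    discounted = sum(gamma ** i * rewards[t + i] for i in range(n))
    return discounted + gamma ** n * values[t + n]


def complex_return(rewards: Sequence[float], values: Sequence[float],
                   t: int, weights: Sequence[float], gamma: float) -> float:
    """Weighted average of the 1-step .. len(weights)-step returns.

    The weights are normalized here, so any positive weights may be passed.
    """
    total = sum(weights)
    return sum(
        (w / total) * n_step_return(rewards, values, t, n, gamma)
        for n, w in enumerate(weights, start=1)
    )


# Usage: a three-step trajectory with value estimates of zero everywhere.
rewards = [1.0, 1.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0]
one_step = n_step_return(rewards, values, t=0, n=1, gamma=0.5)
averaged = complex_return(rewards, values, t=0, weights=[1, 1, 1], gamma=0.5)
```

With `n=1` the first function reduces to the 1-step return that the abstract attributes to FQI; passing several weights blends in the longer returns, which is what lets the update draw on more of the trajectory.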
Automatic learning of control policies is becoming increasingly important to allow autonomous agents to operate alongside, or in place of, humans in dangerous and fast-paced situations. Reinforcement learning (RL), including genetic policy search algorithms, comprises a promising technology area capable of learning such control policies. Unfortunately, RL ...