In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps. In particular, we show that varying the emphasis of linear TD(λ)'s updates in a particular way causes its expected update to become stable under …
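A minimal on-policy sketch of the emphasis-weighted linear TD(λ) update described above. The step size, discount, trace decay, and constant interest are illustrative assumptions, and the off-policy importance-sampling ratios of the full algorithm are omitted for brevity:

    import numpy as np

    def etd_lambda_step(theta, e, F, x, x_next, reward,
                        alpha=0.01, gamma=0.99, lam=0.9, interest=1.0):
        """One on-policy ETD(lambda) step, after Sutton, Mahmood & White (2015).

        theta : weights of the linear value estimate v(s) = theta @ x
        e     : eligibility trace vector; F : scalar follow-on trace
        """
        F = gamma * F + interest             # follow-on trace accumulates interest
        M = lam * interest + (1 - lam) * F   # per-step emphasis
        e = gamma * lam * e + M * x          # emphasis-weighted eligibility trace
        delta = reward + gamma * theta @ x_next - theta @ x  # TD error
        theta = theta + alpha * delta * e    # emphatic TD update
        return theta, e, F

The only difference from ordinary linear TD(λ) is that each step's contribution to the trace is scaled by the emphasis M rather than weighted uniformly.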
One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms. In many large-scale applications, online computation and function approximation represent key strategies in scaling up reinforcement learning algorithms. In this setting, we have effective and reasonably well …
BACKGROUND Shared decision-making has been advocated; however, there are relatively few studies on physician preferences for, and experiences of, different styles of clinical decision-making, as most research has focused on patient preferences and experiences. The objectives of this study were to determine 1) physician preferences for different styles of …
Automated feature discovery is a fundamental problem in machine learning. Although classical feature discovery methods do not guarantee optimal solutions in general, it has been recently noted that certain subspace learning and sparse coding problems can be solved efficiently, provided the number of features is not restricted a priori. We provide an …
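The abstract is truncated before it names its formulation, but one well-known instance of this phenomenon is trace-norm-regularized subspace learning: once the number of features (equivalently, the rank) is left unrestricted, the objective becomes convex and its global solution is a closed-form singular value thresholding. The sketch below shows that instance under those assumptions, not necessarily the paper's own method:

    import numpy as np

    def svt(X, lam):
        """Singular value thresholding: the proximal operator of the trace norm.

        Solves min_W 0.5 * ||X - W||_F**2 + lam * ||W||_* in closed form,
        which is tractable precisely because no rank limit is imposed.
        """
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s_shrunk = np.maximum(s - lam, 0.0)   # soft-threshold singular values
        return (U * s_shrunk) @ Vt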
Training principles for unsupervised learning are often derived from motivations that appear to be independent of supervised learning. In this paper we present a simple unification of several supervised and unsupervised training principles through the concept of optimal reverse prediction: predict the inputs from the target labels, optimizing both …
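A sketch of the reverse-prediction idea: solve the reverse least-squares problem min_U ||X - Y U||_F**2, predicting inputs from labels. The toy data and the choice of plain least squares are assumptions for illustration, not the paper's full unified framework:

    import numpy as np

    def reverse_least_squares(X, Y):
        """Reverse prediction: reconstruct the inputs X from the labels Y,
        i.e. ordinary least squares with inputs and targets swapped."""
        U, *_ = np.linalg.lstsq(Y, X, rcond=None)
        return U

    # Supervised case: Y holds the given labels. In the unsupervised case,
    # treating Y as a free variable and optimizing over both Y and U
    # recovers methods such as PCA or k-means, depending on the
    # constraints placed on Y.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))                         # inputs
    Y = rng.integers(0, 2, size=(100, 1)).astype(float)   # toy labels
    U = reverse_least_squares(X, Y)
    reconstruction = Y @ U                                # inputs predicted from labels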
Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps. Recent works by Sutton, Mahmood and White (2015), and Yu (2015) show that by varying the emphasis in a particular way, these algorithms become stable and …
Imagine that you have made the world's best poker agent. You've played millions of games against other bots and won! Now you want to pit the agent against the world's best human players... Problem: poker has a lot of luck. In two-player limit Texas hold'em, the standard deviation of winnings is 6.0 sb, while the precision required to distinguish a professional from an amateur is 0.05 …
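Reading the truncated figures as small bets per hand, a quick back-of-the-envelope calculation shows why raw win rates need enormous samples; the 95% confidence level (z = 1.96) is an illustrative assumption:

    import math

    def hands_needed(std_dev=6.0, precision=0.05, z=1.96):
        """Hands required before the confidence-interval half-width on the
        mean win rate (sb/hand) shrinks to the desired precision.

        The standard error of the mean is sigma / sqrt(n), so we need
        n = (z * sigma / precision) ** 2.
        """
        return math.ceil((z * std_dev / precision) ** 2)

    print(hands_needed())  # -> 55320 hands at 6.0 sb std dev, 0.05 sb precision

At roughly 55,000 hands just to resolve a 0.05 sb/hand difference, direct match play is far too slow, which is what motivates variance-reduced evaluation of poker agents.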