In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps. In particular, we show that varying the emphasis of linear TD(λ)'s updates in a particular way causes its expected update to become stable under …
One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms. In many large-scale applications, online computation and function approximation represent key strategies in scaling up reinforcement learning algorithms. In this setting, we have effective and reasonably well …
BACKGROUND Shared decision-making has been advocated; however, there are relatively few studies on physician preferences for, and experiences of, different styles of clinical decision-making, as most research has focused on patient preferences and experiences. The objectives of this study were to determine 1) physician preferences for different styles of …
Automated feature discovery is a fundamental problem in machine learning. Although classical feature discovery methods do not guarantee optimal solutions in general, it has been recently noted that certain subspace learning and sparse coding problems can be solved efficiently, provided the number of features is not restricted a priori. We provide an …
Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps. Recent works by Sutton, Mahmood and White (2015), and Yu (2015) show that by varying the emphasis in a particular way, these algorithms become stable and …
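The emphasis-weighted update these two abstracts describe can be sketched in a few lines. Below is a minimal on-policy emphatic TD(λ) with linear function approximation, fixed interest i_t = 1, and a hypothetical `(phi, reward, phi_next, done)` episode format; it is an illustrative sketch, not the authors' reference implementation.

```python
import numpy as np

def emphatic_td_lambda(episodes, n_features, alpha=0.01, gamma=0.9, lam=0.8):
    """On-policy emphatic TD(lambda) with linear function approximation.

    `episodes` is a list of trajectories, each a list of
    (phi, reward, phi_next, done) tuples (hypothetical format);
    the interest i_t is fixed at 1 for every step.
    """
    theta = np.zeros(n_features)
    for episode in episodes:
        F = 0.0                    # follow-on trace
        e = np.zeros(n_features)   # eligibility trace
        for phi, r, phi_next, done in episode:
            F = gamma * F + 1.0            # accumulate interest (i_t = 1)
            M = lam * 1.0 + (1 - lam) * F  # emphasis for this step
            e = gamma * lam * e + M * phi  # emphasis-weighted trace
            v_next = 0.0 if done else theta @ phi_next
            delta = r + gamma * v_next - theta @ phi
            theta += alpha * delta * e
    return theta
```

With a tabular (one-hot) feature representation this converges to the true values; the emphasis M only reweights how strongly each step's update counts, which is what gives the stability result mentioned above.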
Imagine that you have made the world's best poker agent. You've played millions of games against other bots and won! Now you want to pit the agent against the world's best human players... Problem: poker has a lot of luck. In two-player limit Texas hold'em, the standard deviation of winnings is 6.0 sb per game, while the precision required to distinguish a pro from an amateur is 0.05 …
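The numbers above imply a back-of-the-envelope sample-size calculation: if per-game winnings have standard deviation σ = 6.0 sb and we want to resolve a win-rate difference of 0.05 sb/game, the standard error σ/√n must shrink below that edge. This is my own illustrative calculation from the quoted figures, not one taken from the paper.

```python
import math

def games_needed(sigma, edge, z=1.96):
    """Games required so that z standard errors of the mean win rate
    (z * sigma / sqrt(n)) fall below the edge we want to detect."""
    return math.ceil((z * sigma / edge) ** 2)
```

At z = 1 (standard error merely equal to the edge) this already demands (6.0 / 0.05)^2 = 14,400 games; at a conventional 95% level (z ≈ 1.96) it is over 55,000, which is why variance-reduction matters for evaluating poker agents.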
Robust regression and classification are often thought to require non-convex loss functions that prevent scalable, global training. However, such a view neglects the possibility of reformulated training methods that can yield practically solvable alternatives. A natural way to make a loss function more robust to outliers is to truncate loss values that …
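The truncation idea can be made concrete with a small sketch: a linear regression trained under the truncated squared loss min(r², τ²), where residuals beyond τ contribute nothing to the gradient, so outliers are effectively ignored. Note this uses plain (non-convex) gradient descent for illustration, not the convex reformulation the abstract refers to.

```python
import numpy as np

def fit_truncated_ls(X, y, tau=1.0, lr=0.1, n_iters=2000):
    """Linear regression under the truncated squared loss min(r^2, tau^2).

    Residuals with |r| > tau are clipped to a constant loss, so they
    receive zero gradient; plain gradient descent, illustrative only.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        r = X @ w - y
        mask = np.abs(r) <= tau           # only inliers get a gradient
        grad = 2 * X[mask].T @ r[mask] / len(y)
        w -= lr * grad
    return w
```

On data with a single gross outlier, ordinary least squares is dragged toward the outlier, while the truncated fit recovers the clean slope, since the outlier's residual always exceeds τ and is masked out.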