• Corpus ID: 238407912

Nested Policy Reinforcement Learning

  • Aishwarya Mandyam, Andrew Jones, Krzysztof Laudanski, Barbara E. Engelhardt
Off-policy reinforcement learning (RL) has proven to be a powerful framework for guiding agents’ actions in environments with stochastic rewards and unknown or noisy state dynamics. In many real-world settings, these agents must operate in multiple environments, each with slightly different dynamics. For example, we may be interested in developing policies to guide medical treatment for patients with and without a given disease, or policies to navigate curriculum design for students with and… 
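As a minimal illustration of the off-policy setting the abstract describes (not the paper's nested method), tabular Q-learning learns values for the greedy target policy from a batch of transitions generated by any behavior policy; the environment, batch, and names below are illustrative toys:

```python
def q_learning(transitions, n_states, n_actions, alpha=0.5, gamma=0.9):
    """Tabular Q-learning. Off-policy: the max over next-state actions is
    the update target regardless of which policy generated the data."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(50):                       # sweep the fixed batch repeatedly
        for s, a, r, s_next in transitions:
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
    return Q

# Toy two-state chain: action 1 moves state 0 -> 1; state 1 pays 1 and resets.
batch = [(0, 0, 0.0, 0), (0, 1, 0.0, 1), (1, 0, 1.0, 0), (1, 1, 1.0, 0)]
Q = q_learning(batch, n_states=2, n_actions=2)
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
```

In state 0 the learned greedy action is 1, since moving toward the rewarding state has higher long-run value even though the batch itself contains no reward for that step.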

Apprenticeship learning via inverse reinforcement learning
This work models the expert as maximizing a reward function expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, using inverse reinforcement learning to recover the unknown reward function.
Tree-Based Batch Mode Reinforcement Learning
Within this framework, several classical tree-based supervised learning methods and two newly proposed ensemble algorithms, extremely randomized trees and totally randomized trees, are described; the ensemble methods based on regression trees are found to perform well at extracting relevant information about the optimal control policy from sets of four-tuples.
Methods for Reinforcement Learning in Clinical Decision Support
A framework for clinician-in-loop decision support for critical care interventions is developed, and methods for Pareto-optimal reinforcement learning are integrated with known procedural constraints in order to consolidate multiple, often conflicting, clinical goals and produce a flexible optimized ordering policy.
Multitask Learning
  • R. Caruana
  • Computer Science
    Encyclopedia of Machine Learning and Data Mining
  • 1998
Suggestions for getting the most out of multitask learning in artificial neural nets are presented, an algorithm for multitask learning with case-based methods such as k-nearest neighbor and kernel regression is presented, and algorithms for multitask learning in decision trees are sketched.
Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method
NFQ, an algorithm for efficient and effective training of a Q-value function represented by a multi-layer perceptron, is introduced and it is shown empirically, that reasonably few interactions with the plant are needed to generate control policies of high quality.
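The batch fitted-Q loop underlying NFQ can be sketched with a trivial per-(state, action) averaging fit standing in for the multi-layer perceptron; the toy batch and names below are illustrative assumptions, not from the paper:

```python
from collections import defaultdict

def fitted_q_iteration(batch, gamma=0.95, iterations=30):
    """Batch fitted Q iteration: each iteration recomputes regression
    targets r + gamma * max_a' Q(s', a') from the current Q, then refits
    Q to those targets (here by simple averaging per state-action pair)."""
    actions = sorted({a for _, a, _, _ in batch})
    Q = defaultdict(float)
    for _ in range(iterations):
        targets = defaultdict(list)
        for s, a, r, s_next in batch:
            targets[(s, a)].append(r + gamma * max(Q[(s_next, b)] for b in actions))
        Q = defaultdict(float, {k: sum(v) / len(v) for k, v in targets.items()})
    return Q

# Same toy chain: action 1 moves state 0 -> 1; state 1 pays 1 and resets.
batch = [(0, 0, 0.0, 0), (0, 1, 0.0, 1), (1, 0, 1.0, 0), (1, 1, 1.0, 0)]
Q = fitted_q_iteration(batch)
```

NFQ's data efficiency comes from exactly this structure: the same fixed set of transitions is reused at every iteration, so few interactions with the plant are needed.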
A Unified Approach to Interpreting Model Predictions
A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
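Shapley additive explanations attribute a prediction to features by averaging each feature's marginal contribution over all subsets of the remaining features. A small exact computation on a toy additive value function (not the SHAP library's optimized estimators; all names here are illustrative) looks like:

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values: phi_i = sum over subsets S excluding i of
    |S|! * (n - |S| - 1)! / n! * (value(S + {i}) - value(S))."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(S) | {i}) - value(set(S)))
        phi[i] = total
    return phi

# Additive toy model: each included feature contributes its weight.
weights = {"age": 2.0, "dose": 3.0, "stage": -1.0}
v = lambda S: sum(weights[f] for f in S)
phi = shapley_values(list(weights), v)
```

For an additive model each feature's Shapley value equals its weight, and the values sum to the gap between the full prediction and the empty baseline, which is the additivity property SHAP builds on.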
PyTorch: An Imperative Style, High-Performance Deep Learning Library
This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.
Hierarchical Bayes Models: A Practitioners Guide
The promise of HB models is illustrated, and an introduction to their computation is provided to show how these models are implemented.
On Line Learning In Neural Networks
A database to support development and evaluation of intelligent intensive care monitoring
  • G. Moody, R. Mark
  • Medicine, Computer Science
    Computers in Cardiology 1996
  • 1996
The MIMIC (Multi-parameter Intelligent Monitoring for Intensive Care) Database is intended to meet the needs of automated decision support systems, and the database will be made available to other researchers.