Corpus ID: 237605525

Reinforcement Learning Under Algorithmic Triage

@article{Straitouri2021ReinforcementLU,
  title={Reinforcement Learning Under Algorithmic Triage},
  author={Eleni Straitouri and Adish Kumar Singla and Vahid Balazadeh Meresht and Manuel Gomez-Rodriguez},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.11328}
}
Methods to learn under algorithmic triage have predominantly focused on supervised learning settings where each decision, or prediction, is independent of the others. Under algorithmic triage, a supervised learning model predicts a fraction of the instances and humans predict the remaining ones. In this work, we take a first step towards developing reinforcement learning models that are optimized to operate under algorithmic triage. To this end, we look at the problem through the framework of…
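As an illustration only, the supervised triage setting described in the abstract (a model predicts a fraction of the instances, humans predict the rest) could be sketched as follows. This is not the paper's reinforcement learning method. The function `triage` and its parameters `model_predict`, `human_predict`, `defer_score`, and `machine_fraction` are hypothetical names introduced for this sketch, and ranking instances by a deferral score is an assumed routing rule.

```python
from typing import Any, Callable, List, Sequence, Tuple


def triage(
    instances: Sequence[Any],
    model_predict: Callable[[Any], Any],
    human_predict: Callable[[Any], Any],
    defer_score: Callable[[Any], float],
    machine_fraction: float,
) -> List[Tuple[Any, str]]:
    """Route a fixed fraction of instances to the model and the rest to a human.

    All names here are hypothetical. Instances with the lowest deferral
    score (an assumed proxy for how much the model should defer) go to
    the model; the remaining ones go to the human.
    """
    n = len(instances)
    n_machine = int(machine_fraction * n)  # floor of the requested fraction
    # Sort indices by deferral score, lowest first (stable for ties).
    order = sorted(range(n), key=lambda i: defer_score(instances[i]))
    machine_idx = set(order[:n_machine])

    results = []
    for i, x in enumerate(instances):
        if i in machine_idx:
            results.append((model_predict(x), "model"))
        else:
            results.append((human_predict(x), "human"))
    return results
```

In this sketch each routing decision is made independently per instance, which is the assumption the abstract says no longer holds in the reinforcement learning setting.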

References

SHOWING 1-10 OF 58 REFERENCES
Towards Deployment of Robust Cooperative AI Agents: An Algorithmic Framework for Learning Adaptive Policies
TLDR
This work develops an algorithmic framework for learning adaptive policies that relies on observing the user’s actions to make inferences about the user’s type and adapting the policy to facilitate efficient cooperation, and proposes two concrete algorithms for computing policies that automatically adapt to the user in the test phase.
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
TLDR
It is shown that varying the emphasis of linear TD(λ)'s updates in a particular way causes its expected update to become stable under off-policy training.
Learning Policy Representations in Multiagent Systems
TLDR
A general learning framework for modeling agent behavior in any multiagent system using only a handful of interaction data is proposed, a novel objective inspired by imitation learning and agent identification is constructed, and an algorithm for unsupervised learning of representations of agent policies is designed.
Safe option-critic: learning safety in the option-critic architecture
TLDR
This work considers a behaviour safe if it avoids regions of state space with high uncertainty in the outcomes of actions and proposes an optimization objective that learns safe options by encouraging the agent to visit states with higher behavioural consistency.
Efficient Model Learning from Joint-Action Demonstrations for Human-Robot Collaborative Tasks
TLDR
Results indicate that learning human user models from joint-action demonstrations and encoding them in a MOMDP formalism can support effective teaming in human-robot collaborative tasks.
Shared Autonomy via Deep Reinforcement Learning
TLDR
This paper uses human-in-the-loop reinforcement learning with neural network function approximation to learn an end-to-end mapping from environmental observation and user input to agent action, with task reward as the only form of supervision.
Temporal abstraction in reinforcement learning
TLDR
A general framework for prediction, control and learning at multiple temporal scales, and the way in which multi-time models can be used to produce plans of behavior very quickly, using classical dynamic programming or reinforcement learning techniques is developed.
Teaching Inverse Reinforcement Learners via Features and Demonstrations
TLDR
A teaching scheme is suggested in which the expert can decrease the teaching risk by updating the learner's worldview, and thus ultimately enable her to find a near-optimal policy.
Learning to Collaborate in Markov Decision Processes
TLDR
It is shown that sub-linear regret of agent A1 further implies near-optimality of the agents' joint return for MDPs that manifest the properties of a smooth game.
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
TLDR
It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.