Interaction-Grounded Learning with Action-inclusive Feedback

@article{Xie2022InteractionGroundedLW,
  title={Interaction-Grounded Learning with Action-inclusive Feedback},
  author={Tengyang Xie and Akanksha Saran and Dylan J. Foster and Lekan Molu and Ida Momennejad and Nan Jiang and Paul Mineiro and John Langford},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.08364}
}
Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner’s goal is to optimally interact with the environment with no explicit reward to ground its policies. The agent observes a context vector, takes an action, and receives a feedback vector, using this information to effectively optimize a policy with respect to a latent reward function. Prior analyzed approaches fail when the feedback vector contains the action, which significantly limits IGL’s success in many… 
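To make the interaction protocol in the abstract concrete, the sketch below walks through one round of the IGL loop: the learner observes a context, takes an action, and receives only a feedback vector, while the latent reward stays hidden. Everything in it (the dimensions, the toy latent reward, and the feedback construction with a reward-correlated and an action-inclusive component) is an illustrative assumption, not the paper's construction.

import numpy as np

# Minimal sketch of the IGL interaction loop described above.
# All names, dimensions, and distributions are illustrative placeholders.

rng = np.random.default_rng(0)
d_context, d_feedback, n_actions = 8, 8, 3

def latent_reward(x, a):
    # Binary latent reward; never revealed to the learner.
    return int(a == int(abs(x[0]) * 10) % n_actions)

def feedback_vector(x, a, r):
    # Feedback y depends on the latent reward and, in the harder
    # "action-inclusive" setting this paper studies, also on the action.
    y = rng.normal(size=d_feedback)
    y[0] += 2.0 * r      # reward-correlated component
    y[1] += 0.5 * a      # action-inclusive component
    return y

def policy(x):
    # Placeholder policy; an IGL learner improves this mapping
    # using only (context, action, feedback) triples.
    return int(rng.integers(n_actions))

for t in range(5):
    x = rng.normal(size=d_context)   # observe context
    a = policy(x)                    # take an action
    r = latent_reward(x, a)          # hidden from the learner
    y = feedback_vector(x, a, r)     # only x, a, y are observed
    # An IGL algorithm would update the policy from (x, a, y) here.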


References

Showing 1-10 of 53 references

Interaction-Grounded Learning

TLDR
It is shown that in an Interaction-Grounded Learning setting, with certain natural assumptions, a learner can discover the latent reward and ground its policy for successful interaction.

Interactive Learning from Activity Description

TLDR
A novel interactive learning protocol that trains request-fulfilling agents by having a teacher verbally describe their activities, achieving competitive success rates without requiring the teaching agent to demonstrate the desired behavior using the learning agent's actions.

The EMPATHIC Framework for Task Learning from Implicit Human Feedback

TLDR
This work demonstrates EMPATHIC, a novel data-driven framework for learning from implicit human feedback: a deep neural network trained on live human facial reactions improves an agent's policy in the training task and transfers to a novel domain, in which it evaluates robot manipulation trajectories.

Accelerated Robot Learning via Human Brain Signals

TLDR
This work proposes a method that uses evaluative feedback obtained from human brain signals, measured via scalp EEG, to accelerate RL for robotic agents in sparse-reward settings, achieving a stable obstacle-avoidance policy with a high success rate.

Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning

TLDR
The algorithm provably explores the environment with sample complexity scaling polynomially in the number of latent states and the time horizon, with no dependence on the size of the observation space (which could be infinitely large); this enables sample-efficient global policy optimization for any reward function.

Provably efficient RL with Rich Observations via Latent State Decoding

TLDR
This work demonstrates how to inductively estimate a mapping from observations to latent states through a sequence of regression and clustering steps, and uses it to construct good exploration policies.

Calibration-Free BCI Based Control

TLDR
A method is proposed that removes the calibration phase and allows a user to control an agent to solve a sequential task; it infers both the interpretation of the EEG signals and the task by selecting the hypothesis that best explains the history of interaction.

Understanding Teacher Gaze Patterns for Robot Learning

TLDR
This work studies the gaze patterns of human teachers demonstrating tasks to robots and proposes ways in which such patterns can be used to enhance robot learning, providing a foundation for a model of natural human gaze in robot learning-from-demonstration settings.

Efficient Optimal Learning for Contextual Bandits

TLDR
This work provides the first efficient algorithm with optimal regret; it uses a cost-sensitive classification learner as an oracle and has a running time of polylog(N), where N is the number of classification rules among which the oracle might choose.

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that action
...
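For contrast with the IGL protocol sketched earlier, the following sketch shows the standard contextual bandit loop this reference addresses, in which the learner does observe the reward of the action it took; all names and dimensions are illustrative assumptions, not this reference's algorithm.

import numpy as np

# Minimal contextual bandit loop: unlike IGL, the realized reward of
# the chosen action IS observed by the learner.

rng = np.random.default_rng(1)
K, d = 3, 8

def realized_reward(x, a):
    # Unknown reward function; only its value at the chosen action is seen.
    return float(a == int(abs(x[0]) * 10) % K)

for t in range(5):
    x = rng.normal(size=d)        # observe context
    a = int(rng.integers(K))      # take one of K actions (placeholder policy)
    r = realized_reward(x, a)     # reward observed only for the chosen action
    # A contextual bandit algorithm would update its policy with (x, a, r).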