The Concept of Criticality in Reinforcement Learning

  • Yitzhak Spielberg, Amos Azaria
  • Published 16 October 2018
  • Computer Science
  • 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI)
This paper introduces a novel idea in human-aided reinforcement learning: the concept of criticality. The criticality of a state indicates how much the choice of action in that particular state influences the expected return. To develop an intuition for the concept, we present examples of plausible criticality functions in multiple environments. Furthermore, we formulate a practical application of criticality in reinforcement learning: the criticality-based varying stepnumber…
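As a rough illustration of the definition above, the following sketch scores a state by how much better the best action is than a uniformly random one. The function name `criticality_from_q`, the `scale` parameter, and the use of action-value estimates are assumptions made for this example; the abstract describes criticality as something a human can provide and does not prescribe this formula.

```python
def criticality_from_q(q_values, scale):
    """Hypothetical criticality proxy in [0, 1] (not taken from the paper).

    q_values: estimated returns for each action available in one state.
    scale:    assumed positive constant that maps the raw gap into [0, 1].
    """
    if not q_values:
        raise ValueError("q_values must contain at least one action value")
    if scale <= 0:
        raise ValueError("scale must be positive")
    best = max(q_values)
    mean = sum(q_values) / len(q_values)
    # Gap between acting optimally and acting uniformly at random.
    gap = best - mean
    return min(1.0, gap / scale)


# All actions are equally good, so the choice does not matter.
low = criticality_from_q([1.0, 1.0, 1.0], scale=10.0)
# One action is far better than the rest, so the choice matters a lot.
high = criticality_from_q([0.0, 0.0, 9.0], scale=10.0)
```

Under this proxy, a state where every action leads to the same return scores 0, and the score grows as a single action pulls away from the others.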

Criticality-based Varying Step-number Algorithm for Reinforcement Learning

A criticality-based varying step number algorithm (CVS) is formulated: a flexible step-number algorithm that uses a criticality function provided by a human or learned directly from the environment.
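One way a criticality function could drive the step number of an n-step return is sketched below. The rule used here (take the smallest n at which the accumulated criticality along the trajectory reaches a threshold) is an assumption for illustration, as are the names `choose_step_number` and `cvs_target`; the summary above does not spell out the selection rule.

```python
def choose_step_number(criticalities, threshold=1.0):
    """Smallest n whose first n criticalities sum to at least threshold.

    criticalities[k] is the criticality of the state visited k steps ahead.
    Falls back to the full trajectory length if the threshold is never met.
    """
    total = 0.0
    for n, c in enumerate(criticalities, start=1):
        total += c
        if total >= threshold:
            return n
    return len(criticalities)


def cvs_target(rewards, values, criticalities, gamma, threshold=1.0):
    """n-step return whose n is chosen by accumulated criticality.

    rewards[k] is the reward received on step k; values[k] is the value
    estimate of the state reached after k steps (values[0] is the current
    state), so values needs at least n + 1 entries.
    """
    n = choose_step_number(criticalities[: len(rewards)], threshold)
    discounted = sum((gamma ** k) * rewards[k] for k in range(n))
    return discounted + (gamma ** n) * values[n]


target = cvs_target(
    rewards=[1.0, 1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0, 5.0, 0.0],
    criticalities=[0.1, 0.2, 0.9, 0.1],
    gamma=0.5,
)
```

With this rule, a run of low-criticality states produces a long lookahead, while a single highly critical state triggers bootstrapping almost immediately.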

Criticality-Based Advice in Reinforcement Learning (Student Abstract)

An approach to advice-based RL is presented, in which the human’s role is not limited to giving advice in chosen states, but also includes hinting a priori, before the learning procedure, in which sub-domains of the state space the agent might require more advice.

An agent for learning new natural language commands

The Learning by Instruction Agent (LIA) is introduced: the first virtual assistant for an email domain that is capable of learning how to perform new commands taught by end users in natural language.



Deep Reinforcement Learning with Double Q-Learning

This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
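The adaptation is commonly described as decoupling action selection from action evaluation in the bootstrap target: the online network picks the next action and the target network scores it. A minimal sketch of that target follows, with plain Python lists standing in for network outputs; the function name and argument names are hypothetical.

```python
def double_q_target(reward, next_q_online, next_q_target, gamma, done):
    """Double Q-learning style target for a single transition.

    next_q_online: action values for the next state from the online estimator.
    next_q_target: action values for the next state from the target estimator.
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Select the action with the online estimator...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ...but evaluate it with the target estimator.
    return reward + gamma * next_q_target[best_action]


y = double_q_target(
    reward=1.0,
    next_q_online=[1.0, 3.0],
    next_q_target=[2.0, 0.5],
    gamma=0.5,
    done=False,
)
```

For comparison, a standard DQN target on the same inputs would take the maximum of `next_q_target` directly, giving 1.0 + 0.5 * 2.0 = 2.0 rather than 1.25.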

Multi-step Reinforcement Learning: A Unifying Algorithm

A new multi-step action-value algorithm called Q(σ) is studied that unifies and generalizes existing multi-step algorithms, subsuming them as special cases. An intermediate value of σ is also introduced, which results in a mixture of the existing algorithms.

Reinforcement Learning: An Introduction

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces

An extension of the TAMER framework that leverages the representational power of deep neural networks in order to learn complex tasks in just a short amount of time with a human trainer, and demonstrates its success by using it and just 15 minutes of human-provided feedback to train an agent that performs better than humans on the Atari game of Bowling.

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

It is shown that varying the emphasis of linear TD(λ)'s updates in a particular way causes its expected update to become stable under off-policy training.

Reinforcement Learning from Demonstration through Shaping

This paper investigates the intersection of reinforcement learning and expert demonstrations, leveraging the theoretical guarantees provided by reinforcement learning, and using expert demonstrations to speed up this learning by biasing exploration through a process called reward shaping.

Per-decision Multi-step Temporal Difference Learning with Control Variates

The results show that including the control variates can greatly improve performance on both on- and off-policy multi-step temporal difference learning tasks.

Policy Shaping: Integrating Human Feedback with Reinforcement Learning

This paper introduces Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels, and shows that it can outperform state-of-the-art approaches and is robust to infrequent and inconsistent human feedback.

Integrating reinforcement learning with human demonstrations of varying ability

This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

This work extends the doubly robust estimator for bandits to sequential decision-making problems, which gets the best of both worlds: it is guaranteed to be unbiased and can have a much lower variance than the popular importance sampling estimators.