# Efficient Dialog Policy Learning via Positive Memory Retention

@article{Zhao2018EfficientDP,
  title   = {Efficient Dialog Policy Learning via Positive Memory Retention},
  author  = {Rui Zhao and Volker Tresp},
  journal = {2018 IEEE Spoken Language Technology Workshop (SLT)},
  year    = {2018},
  pages   = {823-830}
}
• Published 2 October 2018
• Computer Science
• 2018 IEEE Spoken Language Technology Workshop (SLT)
This paper is concerned with training recurrent neural networks as goal-oriented dialog agents using reinforcement learning. Training such agents with policy gradients typically requires a large number of samples. However, collecting the required data in the form of conversations between chatbots and human agents is time-consuming and expensive. To mitigate this problem, we describe an efficient policy gradient method using positive memory retention, which significantly increases the…
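
The abstract describes replaying retained positive (high-return) episodes during policy-gradient training. As a hedged sketch only (the paper's actual retention criterion and the importance weighting needed to correct for off-policy replay are not reproduced here), a toy REINFORCE update for a Bernoulli policy that mixes fresh rollouts with a buffer of past successful episodes might look like:

```python
import math
import random

def reinforce_update(theta, episodes, lr=0.1):
    """One REINFORCE-style policy-gradient step for a 2-action
    Bernoulli policy pi(a=1) = sigmoid(theta).
    Each episode is an (actions, total_return) pair."""
    p1 = 1.0 / (1.0 + math.exp(-theta))  # probability of action 1
    grad = 0.0
    for actions, ret in episodes:
        for a in actions:
            # d/d theta of log pi(a | theta) for a Bernoulli policy
            grad += (a - p1) * ret
    return theta + lr * grad / max(len(episodes), 1)

def train_with_positive_memory(num_iters=50, seed=0):
    """Sketch of positive memory retention: fresh rollouts are mixed
    with a buffer of previously successful (positive-return) episodes."""
    rng = random.Random(seed)
    theta = 0.0
    positive_memory = []  # retained successful episodes
    for _ in range(num_iters):
        # roll out one fresh episode: reward 1 only if all actions are 1
        p1 = 1.0 / (1.0 + math.exp(-theta))
        actions = [1 if rng.random() < p1 else 0 for _ in range(3)]
        ret = 1.0 if all(actions) else 0.0
        if ret > 0:
            positive_memory.append((actions, ret))
        # replay up to 4 retained positive episodes alongside the fresh one
        batch = [(actions, ret)] + positive_memory[-4:]
        theta = reinforce_update(theta, batch)
    return theta
```

Note that replaying stored episodes under a changed policy is off-policy; the paper addresses this (e.g., via reweighting), which this sketch omits for brevity.
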
9 Citations

## Citations

### Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient

• Computer Science
2018 IEEE Spoken Language Technology Workshop (SLT)
• 2018
A class of novel temperature-based extensions for policy gradient methods, referred to as Tempered Policy Gradients (TPGs), is proposed; they improve the performance of commonly used policy-based dialogue agents by around 5% and help produce more convincing utterances.
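
Temperature-based policy gradients rest on a standard idea: sampling actions from a temperature-scaled softmax, where a temperature above 1 flattens the distribution (more exploration) and a temperature below 1 sharpens it (more exploitation). A generic sketch of the sampling distribution (not the paper's specific TPG schedules):

```python
import math

def tempered_softmax(logits, temperature=1.0):
    """Softmax over logits scaled by a temperature.
    temperature > 1 flattens the distribution (more exploration);
    temperature < 1 sharpens it (more exploitation)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]
```

For logits `[2.0, 1.0, 0.0]`, lowering the temperature concentrates probability mass on the highest-scoring action, while raising it spreads the mass out.
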

### Curiosity-Driven Experience Prioritization via Density Estimation

• Computer Science
ArXiv
• 2019
A novel Curiosity-Driven Prioritization (CDP) framework is proposed to encourage the agent to over-sample trajectories with rarely achieved goal states; experimental results show that CDP improves both the performance and the sample-efficiency of reinforcement learning agents compared to state-of-the-art methods.
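
The core of curiosity-driven prioritization is to replay trajectories whose achieved goal states are rare. As a hedged illustration only, with a simple count-based frequency estimate standing in for the paper's density model:

```python
from collections import Counter

def curiosity_priorities(achieved_goals):
    """Sketch of density-based prioritization: trajectories whose
    achieved goal is rare get proportionally higher replay probability.
    A count-based estimate stands in for a learned density model
    (an assumption for illustration, not the paper's estimator)."""
    counts = Counter(achieved_goals)
    # inverse-frequency weight per trajectory, normalized to sum to 1
    weights = [1.0 / counts[g] for g in achieved_goals]
    total = sum(weights)
    return [w / total for w in weights]
```

For goals `["A", "A", "A", "B"]`, the lone `"B"` trajectory receives a higher replay probability than any of the common `"A"` trajectories.
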

### Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning

• Computer Science
ArXiv
• 2020
This work proposes to formulate an intrinsic objective as the mutual information between the goal states and the controllable states, which encourages the agent to take control of its environment.

### Guessing State Tracking for Visual Dialogue

• Computer Science
ECCV
• 2020
A guessing-state-tracking-based guess model for the Guesser is proposed; it significantly outperforms previous models and achieves a new state of the art, with a guessing success rate of 83.3% that approaches the human-level accuracy of 84.4%.

### Related Work to Neural Natural-Language Template Matching

A novel method is proposed that learns to match a natural-language template to utterances, extracting information from the utterance in the process; the approach is contrasted with existing work in neural semantic parsing and sentence matching.

### AutoScale: Energy Efficiency Optimization for Stochastic Edge Inference Using Reinforcement Learning

• Computer Science
2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
• 2020
This paper proposes AutoScale, an adaptive and lightweight execution-scaling engine built on a custom-designed reinforcement learning algorithm that continuously learns and selects the most energy-efficient inference execution target, considering the characteristics of neural networks and available systems in a collaborative cloud-edge execution environment while adapting to stochastic runtime variance.

### Learning Individualized Treatment Rules with Estimated Translated Inverse Propensity Score

• Computer Science
2020 IEEE International Conference on Healthcare Informatics (ICHI)
• 2020
This paper focuses on learning individualized treatment rules (ITRs) to derive a treatment policy that is expected to generate a better outcome for an individual patient; it casts ITR learning as a contextual bandit problem and minimizes the expected risk of the treatment policy.

### Maximum Entropy-Regularized Multi-Goal Reinforcement Learning

• Computer Science
ICML
• 2019
A novel multi-goal RL objective based on weighted entropy is proposed, which encourages the agent to maximize the expected return as well as to achieve more diverse goals; a maximum-entropy-based prioritization framework is developed to optimize the proposed objective.

### Energy-Based Hindsight Experience Prioritization

• Computer Science
CoRL
• 2018
An energy-based framework for prioritizing hindsight experience in robotic manipulation tasks, inspired by the work-energy principle in physics; it hypothesizes that replaying episodes with high trajectory energy is more effective for reinforcement learning in robotics.

## References

Showing 1–10 of 56 references.

### Sample-efficient Deep Reinforcement Learning for Dialog Control

• Computer Science
ArXiv
• 2016
This paper presents three methods for reducing the number of dialogs required to optimize an RNN-based dialog policy with RL, including maintaining a second RNN that predicts the value of the current policy and applying experience replay to both networks.

### Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient

• Computer Science
2018 IEEE Spoken Language Technology Workshop (SLT)
• 2018
A class of novel temperature-based extensions for policy gradient methods, referred to as Tempered Policy Gradients (TPGs), is proposed; they improve the performance of commonly used policy-based dialogue agents by around 5% and help produce more convincing utterances.

### Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking

• Computer Science
• 2016
This work introduces an exploration technique based on Thompson sampling, drawing Monte Carlo samples from a Bayes-by-backprop neural network, and demonstrates marked improvement over common approaches such as ε-greedy and Boltzmann exploration.
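
The ε-greedy and Boltzmann baselines mentioned here are standard action-selection rules; a generic sketch of both (not this paper's Bayes-by-backprop Thompson sampling):

```python
import math
import random

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the greedy (highest-Q) action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def boltzmann(q_values, temperature, rng):
    """Sample an action from a softmax over Q-values;
    higher temperature means more uniform (exploratory) sampling."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    r, acc = rng.random() * z, 0.0
    for a, e in enumerate(exps):
        acc += e
        if r <= acc:
            return a
    return len(q_values) - 1
```

Thompson sampling replaces both rules by sampling a plausible Q-function from a posterior (here, via Monte Carlo dropout over a Bayes-by-backprop network) and acting greedily with respect to that sample.
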

### Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management

• Computer Science
SIGDIAL Conference
• 2017
A practical approach to learning deep RL-based dialogue policies is presented, demonstrating their effectiveness in a task-oriented information-seeking domain.

### BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

• Computer Science
AAAI
• 2018
A new algorithm is presented that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems, showing that spiking the replay buffer with experiences from just a few successful episodes can make Q-learning feasible when it might otherwise fail.
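
Replay buffer spiking, as described here, amounts to pre-filling the replay buffer with transitions from a few successful episodes before on-policy experience is collected, so early Q-learning updates see at least some rewarded transitions. A minimal generic sketch (the transition format and helper name are illustrative, not the paper's API):

```python
from collections import deque

def spike_replay_buffer(buffer, successful_episodes):
    """Pre-fill the replay buffer with transitions from a few
    successful demonstration episodes before training starts."""
    for episode in successful_episodes:
        for transition in episode:  # (state, action, reward, next_state)
            buffer.append(transition)
    return buffer

buffer = deque(maxlen=1000)
demo = [[("s0", "a1", 1.0, "s1"), ("s1", "a0", 1.0, "s2")]]
buffer = spike_replay_buffer(buffer, demo)
```

After spiking, training proceeds as usual: fresh transitions are appended to the same buffer and minibatches are sampled from the mixture.
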

### Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy Learning

• Computer Science
EMNLP
• 2017
A novel agent-aware dropout Deep Q-Network (AAD-DQN) is proposed to address when to consult the teacher and how to learn from the teacher's experiences; it can significantly improve both the safety and the efficiency of on-line policy optimization compared to other companion-learning approaches.

### The Reactor: A Sample-Efficient Actor-Critic Architecture

• Computer Science
ArXiv
• 2017
A new reinforcement learning agent, called Reactor (for Retrace-Actor), based on an off-policy multi-step return actor-critic architecture; it is sample-efficient thanks to the use of memory replay and numerically efficient since it uses multi-step returns.

### Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning

• Computer Science
SIGDIAL Conference
• 2016
This paper presents an end-to-end framework for task-oriented dialog systems using a variant of Deep Recurrent Q-Networks (DRQN). The model is able to interface with a relational database and jointly learn policies for both language understanding and dialog strategy.

### Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

• Computer Science
2017 IEEE International Conference on Computer Vision (ICCV)
• 2017
This work poses a cooperative ‘image guessing’ game between two agents who communicate in natural language dialog so that Q-BOT can select an unseen image from a lineup of images and shows the emergence of grounded language and communication among ‘visual’ dialog agents with no human supervision.