AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning

@article{chen2019agentgraph,
  title={AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning},
  author={Lu Chen and Z. Chen and Bowen Tan and Sishan Long and Milica Ga{\v{s}}i{\'c} and Kai Yu},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year={2019}
}

Published 27 May 2019 in the IEEE/ACM Transactions on Audio, Speech, and Language Processing.
Dialogue policy plays an important role in task-oriented spoken dialogue systems: it determines how the system responds to users. Recently proposed deep reinforcement learning (DRL) approaches have been used for policy optimization. However, these deep models still face two challenges: first, many DRL-based policies are not sample-efficient; and second, most models lack the capability of policy transfer between different domains. In this paper, we propose a universal framework…

Dialogue Strategy Adaptation to New Action Sets Using Multi-Dimensional Modelling

This work exploits pre-trained task-independent policies to speed up training for an extended task-specific action set, in which the single summary action for requesting a slot is replaced by multiple slot-specific request actions.

Memory Attention Neural Network for Multi-domain Dialogue State Tracking

A novel Memory Attention State Tracker that treats ontologies as prior knowledge and utilizes a Memory Network to store such information, achieving competitive joint accuracy on the MultiWOZ 2.0 dataset.

Variational Denoising Autoencoders and Least-Squares Policy Iteration for Statistical Dialogue Managers

This work proposes a novel approach based on the incremental, sample-efficient Least-Squares Policy Iteration (LSPI) algorithm, which is trained on compact, fixed-size dialogue state encodings, obtained from deep Variational Denoising Autoencoders (VDAE).
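The LSPI inner loop on fixed-size state encodings can be sketched in a few lines. The snippet below is a minimal numpy illustration under assumed features, discount, and toy data, not the paper's implementation; it shows the least-squares policy-evaluation step (LSTD-Q) that LSPI iterates.

```python
import numpy as np

def lstdq(transitions, policy, n_features, gamma=0.99):
    """One Least-Squares TD-Q evaluation step: solve A w = b for the
    Q-function weights of the current policy (the LSPI inner loop)."""
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for phi_sa, reward, phi_next_all in transitions:
        # Feature vector of the greedy next action under the evaluated policy.
        phi_next = phi_next_all[policy(phi_next_all)]
        A += np.outer(phi_sa, phi_sa - gamma * phi_next)
        b += reward * phi_sa
    # Small ridge term keeps A invertible on few samples.
    return np.linalg.solve(A + 1e-6 * np.eye(n_features), b)

# Toy usage: 2-dim encodings (standing in for VDAE outputs), 2 actions.
rng = np.random.default_rng(0)
transitions = [(rng.random(2), 1.0, rng.random((2, 2))) for _ in range(20)]
w = lstdq(transitions, policy=lambda phis: int(np.argmax(phis @ np.ones(2))),
          n_features=2)
print(w.shape)
```

LSPI then alternates this evaluation step with greedy policy improvement until the weights stabilize.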

Coherent Dialog Generation with Query Graph

This article proposes to leverage a new knowledge source, web search session data, to facilitate hierarchical knowledge-sequence planning, which determines the sketch of a multi-turn dialog. It also devises a heterogeneous graph neural network that incorporates neighbouring-vertex information, or possible future RL-action information, into the representation of each vertex (treated as an RL action).

Efficient Context and Schema Fusion Networks for Multi-Domain Dialogue State Tracking

A novel context and schema fusion network is proposed to encode the dialogue context and the schema graph using internal and external attention mechanisms; results show that this approach outperforms strong baselines.

Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management

A multi-level reward modeling approach is proposed that factorizes a reward into a three-level hierarchy: domain, act, and slot. It can provide more accurate and explainable reward signals for state-action pairs in dialogs.
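The three-level factorization can be illustrated with a toy scoring function. The weights and the [0, 1] score ranges below are assumptions for illustration, not the paper's learned reward model:

```python
def multilevel_reward(domain_score, act_score, slot_scores,
                      weights=(0.5, 0.3, 0.2)):
    """Compose a turn-level reward from domain-, act-, and slot-level
    scores, each assumed to lie in [0, 1]; weights are illustrative."""
    w_d, w_a, w_s = weights
    # Slot level aggregates per-slot correctness into one score.
    slot_level = sum(slot_scores) / len(slot_scores) if slot_scores else 0.0
    return w_d * domain_score + w_a * act_score + w_s * slot_level

# Correct domain and act, one of two slots right: partial credit, not zero.
r = multilevel_reward(1.0, 1.0, [1.0, 0.0])
print(r)
```

The point of the hierarchy is visible in the example: an imperfect turn still earns a graded, interpretable reward instead of a flat failure signal.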

OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue

An ontology-aware pretrained language model (OPAL) for end-to-end task-oriented dialogue (TOD) is presented; it yields a clear boost on the CamRest676 and MultiWOZ benchmarks and remains competitive even without any TOD training data.

How to Evaluate Single-Round Dialogues Like Humans: An Information-Oriented Metric

An information-oriented framework to simulate human subjective evaluation is designed and implemented; it aligns more closely with human subjective judgment than existing dialogue-evaluation methods and is effective in dialogue selection and model evaluation.

Dialogue Management in Conversational Systems: A Review of Approaches, Challenges, and Opportunities

This article studies dialogue management from an in-depth design perspective, discusses the state-of-the-art approaches, identifies their recent advances and challenges, and provides an outlook on future research directions.

Diversifying Task-oriented Dialogue Response Generation with Prototype Guided Paraphrasing.

A prototype-based paraphrasing neural network, called P2-Net, is introduced to enhance the quality of responses in terms of both precision and diversity; it achieves a significant improvement in diversity while preserving the semantics of responses.

Policy Adaptation for Deep Reinforcement Learning-Based Dialogue Management

Simulation experiments showed that the proposed multi-agent dialogue policy (MADP) can significantly speed up policy learning and facilitate policy adaptation.

Feudal Reinforcement Learning for Dialogue Management in Large Domains

A novel dialogue management architecture based on Feudal RL is proposed, which decomposes the decision into two steps: a first step where a master policy selects a subset of primitive actions, and a second step where a primitive action is chosen from the selected subset.
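The two-step decision can be sketched as follows. The action names, subsets, and the master's decision rule are illustrative placeholders; in the paper both policies are learned:

```python
import random

# Step 1: the master policy picks a subset of primitive actions
# (e.g. slot-independent vs. slot-dependent); step 2: a primitive
# policy picks one concrete action from that subset.
ACTION_SUBSETS = {
    "slot_independent": ["hello", "inform", "bye"],
    "slot_dependent": ["request_food", "request_area", "confirm_food"],
}

def master_policy(belief_state):
    # Placeholder rule; a real master policy is a learned network.
    return "slot_dependent" if belief_state.get("slots_unfilled") else "slot_independent"

def primitive_policy(belief_state, subset):
    # Placeholder: choose uniformly within the selected subset.
    return random.choice(ACTION_SUBSETS[subset])

state = {"slots_unfilled": True}
subset = master_policy(state)
action = primitive_policy(state, subset)
print(subset, action)
```

The decomposition keeps each decision small: the master never ranks all primitive actions, and each primitive policy only discriminates within its own subset.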

Policy Networks with Two-Stage Training for Dialogue Systems

This paper shows that, on summary state and action spaces, deep reinforcement learning (RL) outperforms Gaussian process methods, and that a deep RL method based on an actor-critic architecture can exploit a small amount of data very efficiently.

Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning

This paper addresses the travel planning task by formulating the task in the mathematical framework of options over Markov Decision Processes (MDPs), and proposing a hierarchical deep reinforcement learning approach to learning a dialogue manager that operates at different temporal scales.
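The options formulation can be sketched with a toy subtask whose policy runs until its termination condition fires. The slot-filling option below is a hypothetical example, not the paper's travel-planning agent:

```python
# In the options framework, a top-level policy picks a subtask (an option);
# the option's own policy then acts until its termination condition holds.
def run_option(option, state):
    trajectory = []
    while not option["terminate"](state):
        action = option["policy"](state)
        trajectory.append(action)
        state = state + 1  # stand-in for an environment transition
    return state, trajectory

# Hypothetical option: fill three flight slots, then hand control back.
book_flight = {
    "policy": lambda s: f"ask_flight_slot_{s}",
    "terminate": lambda s: s >= 3,
}

state, actions = run_option(book_flight, 0)
print(state, actions)  # 3 ['ask_flight_slot_0', 'ask_flight_slot_1', 'ask_flight_slot_2']
```

This is what "different temporal scales" means in practice: the top level decides once per subtask, while the option decides once per turn.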

A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management

A set of challenging simulated environments for dialogue model development and evaluation is proposed, and a number of representative parametric algorithms, namely the deep reinforcement learning algorithms DQN, A2C, and Natural Actor-Critic, are investigated and compared to a non-parametric model, GP-SARSA.

Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning

A new method for hierarchical reinforcement learning using the option framework is proposed and it is shown that the proposed architecture learns faster and arrives at a better policy than the existing flat ones do.

Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy Learning

A novel agent-aware dropout Deep Q-Network (AAD-DQN) is proposed to address the problems of when to consult the teacher and how to learn from the teacher's experiences; it can significantly improve both the safety and efficiency of on-line policy optimization compared to other companion-learning approaches.
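One way dropout can support a consult-or-not decision is via repeated stochastic forward passes: if the greedy action is unstable across passes, the agent is uncertain and should ask the teacher. The sketch below, with a toy linear Q-network and an assumed agreement threshold, illustrates that idea rather than the paper's AAD-DQN:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 3))  # toy Q-"network": one linear layer, 3 actions

def q_values(state, drop_p=0.2):
    """One stochastic forward pass: randomly drop input units (MC dropout)."""
    mask = rng.random(state.shape) >= drop_p
    return (state * mask) @ W

def should_consult_teacher(state, n_samples=30, threshold=0.5):
    """Estimate uncertainty over the greedy action from repeated dropout
    passes; consult the teacher when the agent's choice is unstable."""
    greedy = [int(np.argmax(q_values(state))) for _ in range(n_samples)]
    agreement = max(greedy.count(a) for a in set(greedy)) / n_samples
    return agreement < threshold

state = rng.normal(size=8)
print(should_consult_teacher(state))
```

Consulting only under high uncertainty is what buys safety without querying the teacher on every turn.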

Strategic Dialogue Management via Deep Reinforcement Learning

A successful application of Deep Reinforcement Learning with a high-dimensional state space to the strategic board game of Settlers of Catan is described, which supports the claim that DRL is a promising framework for training dialogue systems, and strategic agents with negotiation abilities.

Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management

A practical approach to learning deep RL-based dialogue policies is presented, and its effectiveness is demonstrated in a task-oriented information-seeking domain.

Sample Efficient On-Line Learning of Optimal Dialogue Policies with Kalman Temporal Differences

In this contribution, a sample-efficient, online, and off-policy reinforcement learning algorithm is proposed to learn an optimal policy from a few hundred dialogues generated with a very simple handcrafted policy.