CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning

Siddharth Verma, Justin Fu, Mengjiao Yang, Sergey Levine
Conventionally, generation of natural language for dialogue agents may be viewed as a statistical learning problem: determine the patterns in human-provided data and generate appropriate responses with similar statistical properties. However, dialogue can also be regarded as a goal-directed process, in which speakers attempt to accomplish a specific task. Reinforcement learning (RL) algorithms are designed specifically for solving such goal-directed problems, but the most direct way to apply RL…
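To make the goal-directed framing concrete, here is a minimal illustrative sketch (not the paper's method): tabular Q-learning run over a fixed log of dialogue transitions, where toy states stand in for dialogue contexts and toy actions for candidate utterances. All names and the toy reward structure are assumptions for illustration only.

```python
# Illustrative only: tabular Q-learning from a fixed (offline) log of
# dialogue transitions, each recorded as (state, action, reward, next_state).
# Toy states/actions stand in for real dialogue contexts and utterances.
from collections import defaultdict

def offline_q_learning(log, alpha=0.5, gamma=0.9, sweeps=100):
    """Fit Q(s, a) by repeatedly replaying a fixed transition log."""
    q = defaultdict(float)
    actions = {a for _, a, _, _ in log}
    for _ in range(sweeps):
        for s, a, r, s_next in log:
            # Bellman backup toward the best action in the next state.
            target = r + gamma * max((q[(s_next, b)] for b in actions), default=0.0)
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q

# Toy log: asking a clarifying question ("clarify") before answering
# eventually leads to task success (reward 1.0); answering blindly does not.
log = [
    ("start", "answer", 0.0, "end"),
    ("start", "clarify", 0.0, "informed"),
    ("informed", "answer", 1.0, "end"),
]
q = offline_q_learning(log)
best = max(["answer", "clarify"], key=lambda a: q[("start", a)])
```

The learned values favor the clarifying question at the start state, because the Bellman backups propagate the delayed task-success reward backwards through the logged transitions.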


Dialogue Evaluation with Offline Reinforcement Learning

This paper shows that offline RL critics can be trained as external evaluators for any dialogue system, allowing performance comparisons across various types of systems; the approach has the benefit of being corpus- and model-independent while attaining strong correlation with human judgements.

A Mixture-of-Expert Approach to RL-based Dialogue Management

A novel mixture-of-experts language model (MoE-LM) is developed, consisting of an LM capable of learning diverse semantics for conversation histories, a number of specialized LMs that generate utterances corresponding to a particular attribute or personality, and an RL-based dialogue manager (DM) that performs dialogue planning with the utterances generated by the experts.

Offline RL for Natural Language Generation with Implicit Language Q Learning

This work proposes a novel offline-RL-motivated method, implicit language Q-learning (ILQL), designed for use on language models, which combines the flexible utility-optimization framework of traditional RL algorithms with supervised learning's ability to leverage existing data, along with its simplicity and stability.

Reinforcement Learning for Spoken Dialogue Systems: Comparing Strengths and Weaknesses for Practical Deployment

This work assesses to what extent the action-selection process in dialogue management can be automated by current state-of-the-art reinforcement learning methods.

On-line policy optimisation of spoken dialogue systems via live interaction with human subjects

An experiment learns a policy for a real-world task directly from human interaction, using rewards provided by users; it shows that a usable policy can be learnt in just a few hundred dialogues, without a user simulator, using a learning strategy that reduces the risk of taking bad actions.

Sample-efficient batch reinforcement learning for dialogue management optimization

Experimental results show that a set of approximate dynamic programming algorithms, combined with a method for learning a sparse representation of the value function, can learn good dialogue policies directly from data, avoiding user-modeling errors.

Deep Reinforcement Learning for Dialogue Generation

This work simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering.

Reinforcement Learning for Spoken Dialogue Systems

A general software tool (RLDS, for Reinforcement Learning for Dialogue Systems) based on the MDP framework is built and applied to dialogue corpora gathered from two dialogue systems built at AT&T Labs, demonstrating that RLDS holds promise as a tool for "browsing" and understanding correlations in complex, temporally dependent dialogue corpora.

Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System

This work proposes modelling the hierarchical structure between the dialogue policy and the natural language generator (NLG) with the option framework, called HDNO, in which a latent dialogue act avoids the need to design specific dialogue-act representations; the semantic meanings of the latent dialogue acts are demonstrated to show the model's explainability.

Hierarchical Text Generation and Planning for Strategic Dialogue

An approach to learning representations of messages in dialogues by maximizing the likelihood of subsequent sentences and actions; this decouples the semantics of a dialogue utterance from its linguistic realization and outperforms previous work both linguistically and strategically.

Policy Networks with Two-Stage Training for Dialogue Systems

This paper shows that, on summary state and action spaces, deep reinforcement learning (RL) outperforms Gaussian process methods, and that a deep RL method based on an actor-critic architecture can exploit a small amount of data very efficiently.

Continuously Learning Neural Dialogue Management

A unified neural network framework is proposed to enable the system to first learn by supervision from a set of dialogue data and then continuously improve its behaviour via reinforcement learning, all using gradient-based algorithms on one single model.

Representing the Reinforcement Learning state in a negotiation dialogue

  • P. Heeman, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009
This paper explores a task that requires negotiation, in which conversants need to exchange information in order to decide on a good solution, and investigates what information should be included in the system's RL state so that an optimal policy can be learned while the state space stays reasonably sized.