Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models

@inproceedings{Zhao2019RethinkingAS,
  title={Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models},
  author={Tiancheng Zhao and Kaige Xie and Maxine Esk{\'e}nazi},
  booktitle={NAACL},
  year={2019}
}
Defining action spaces for conversational agents and optimizing their decision-making process with reinforcement learning is an enduring challenge. [...] Comprehensive experiments examine both continuous and discrete action types and two different optimization methods based on stochastic variational inference. Results show that the proposed latent actions outperform previous word-level policy gradient methods on both DealOrNoDeal and…
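The central move in the abstract, applying policy gradients to a latent action rather than to individual words, can be illustrated with REINFORCE over a categorical latent variable. This is a minimal sketch under stated assumptions, not the authors' implementation: the dialog-context encoder is replaced by random vectors, the utterance decoder is omitted entirely, and `reinforce_latent_policy` and `toy_reward` are hypothetical names, with a made-up reward standing in for task success. It covers only the discrete case; continuous latent actions would need a different (for example Gaussian) policy.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of logits.
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce_latent_policy(reward_fn, num_latents=5, ctx_dim=4,
                            steps=3000, lr=0.1, seed=0):
    """Train a linear softmax policy pi(z | c) over discrete latent actions z.

    Hypothetical sketch: reward_fn(c, z) stands in for the episode reward obtained
    after decoding latent action z into an utterance and finishing the dialog.
    """
    rng = np.random.default_rng(seed)
    W = np.zeros((num_latents, ctx_dim))  # policy parameters: logits = W @ c
    baseline = 0.0                        # running-average reward baseline (assumed choice)
    for _ in range(steps):
        c = rng.normal(size=ctx_dim)           # toy stand-in for an encoded dialog context
        probs = softmax(W @ c)
        z = rng.choice(num_latents, p=probs)   # sample one latent action, not a word sequence
        r = reward_fn(c, z)
        # d log pi(z|c) / d logits = onehot(z) - probs; chain rule through logits = W @ c
        grad_logits = -probs
        grad_logits[z] += 1.0
        W += lr * (r - baseline) * np.outer(grad_logits, c)
        baseline = 0.9 * baseline + 0.1 * r
    return W

def toy_reward(c, z):
    # Hypothetical task: latent 0 is "correct" when c[0] > 0, latent 1 otherwise.
    return 1.0 if z == (0 if c[0] > 0 else 1) else 0.0

W = reinforce_latent_policy(toy_reward)
```

Each update here assigns credit to a single categorical choice per turn instead of to a long sequence of word choices. The running-average baseline is a common variance-reduction choice, not something the abstract specifies.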
Phrase-Level Action Reinforcement Learning for Neural Dialog Response Generation
TLDR
This paper proposes phrase-level action reinforcement learning (PHRASERL), which allows the model to flexibly alter sentence structure and content through sequential action selection, and achieves results competitive with state-of-the-art models on automatic evaluation metrics.
Semi-Supervised Dialogue Policy Learning via Stochastic Reward Estimation
TLDR
This work proposes a novel reward learning approach for semi-supervised policy learning that learns a dynamics model as the reward function, modeling dialogue progress from expert demonstrations; it outperforms competitive policy learning baselines on MultiWOZ, a benchmark multi-domain dataset.
End-to-End latent-variable task-oriented dialogue system with exact log-likelihood optimization
TLDR
An end-to-end dialogue model based on a hierarchical encoder-decoder that employs a discrete latent variable to learn underlying dialogue intentions; the authors argue that this discrete latent variable captures the intentions that guide machine response generation.
SUMBT+LaRL: Effective Multi-Domain End-to-End Neural Task-Oriented Dialog System
  • Hwaran Lee, Seokhwan Jo, Hyungjun Kim, Sangkeun Jung, Tae-Yoon Kim
  • Computer Science
  • IEEE Access
  • 2021
The recent advent of neural approaches has remarkably improved each dialog component in task-oriented dialog systems, yet optimizing the overall system performance remains a challenge.
SUMBT+LaRL: End-to-end Neural Task-oriented Dialog System with Reinforcement Learning
TLDR
This paper proposes new success criteria for reinforcement learning in an end-to-end dialog system, named SUMBT+LaRL, and analyzes experimentally how the results differ depending on the success criteria and evaluation methods.
Generalizable and Explainable Dialogue Generation via Explicit Action Learning
TLDR
This work proposes to learn natural language actions that represent utterances as spans of words; this representation promotes generalization through the compositional structure of language, and the model outperforms latent action baselines on MultiWOZ, a benchmark multi-domain dataset.
Causal-aware Safe Policy Improvement for Task-oriented dialogue
TLDR
A batch RL framework for task-oriented dialogue policy learning, causal-aware safe policy improvement (CASPI), which gives guarantees on the dialogue policy’s performance and also learns to shape rewards according to the intentions behind human responses, rather than just mimicking demonstration data.
Conversational Graph Grounded Policy Learning for Open-Domain Conversation Generation
TLDR
This work presents a novel conversational graph (CG) grounded policy learning framework that plans dialog flow by graph traversal, learning to identify a what-vertex and a how-vertex from the CG at each turn to guide response generation.
Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition
TLDR
This work proposes Multi-Agent Dialog Policy Learning, which regards both the system and the user as dialog agents and uses the actor-critic framework to facilitate pretraining and improve scalability.
DORA: Toward Policy Optimization for Task-oriented Dialogue System with Efficient Context
TLDR
A multi-domain task-oriented dialogue system optimized with supervised learning (SL) followed by reinforcement learning (RL), using a recurrent dialogue policy that considers an efficient context instead of the entire dialogue history; this improves the success rate.

References

Showing 1-10 of 49 references
Latent Intention Dialogue Models
TLDR
The experimental evaluation of the proposed Latent Intention Dialogue Model shows that the model outperforms published benchmarks in both corpus-based and human evaluation, demonstrating the effectiveness of discrete latent variable models for learning goal-oriented dialogues.
Deep Reinforcement Learning for Dialogue Generation
TLDR
This work simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering.
End-to-End Reinforcement Learning of Dialogue Agents for Information Access
This paper proposes KB-InfoBot, a multi-turn dialogue agent that helps users search Knowledge Bases (KBs) without composing complicated queries. Such goal-oriented dialogue agents typically need…
Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning
TLDR
This work introduces Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates; HCNs considerably reduce the amount of training data required while retaining the key benefit of inferring a latent representation of dialog state.
Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning
This paper presents an end-to-end framework for task-oriented dialog systems using a variant of Deep Recurrent Q-Networks (DRQN). The model is able to interface with a relational database and jointly…
Hierarchical Text Generation and Planning for Strategic Dialogue
TLDR
An approach to learning representations of messages in dialogues by maximizing the likelihood of subsequent sentences and actions; it decouples the semantics of a dialogue utterance from its linguistic realization and outperforms previous work both linguistically and strategically.
End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning
TLDR
The main component of the model is a recurrent neural network (an LSTM) that maps raw dialog history directly to a distribution over system actions; this relieves the system developer of much of the manual feature engineering of dialog state.
Zero-Shot Dialog Generation with Cross-Domain Latent Actions
TLDR
Experimental results show that the proposed zero-shot dialog generation method achieves superior performance in learning dialog models that rapidly adapt their behavior to new domains, suggesting promising directions for future research.
Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management
TLDR
A practical approach to learning deep RL-based dialogue policies is presented, and its effectiveness is demonstrated in a task-oriented information-seeking domain.
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
TLDR
This work poses a cooperative ‘image guessing’ game between two agents who communicate in natural language dialog so that Q-BOT can select an unseen image from a lineup of images, and shows the emergence of grounded language and communication among ‘visual’ dialog agents with no human supervision.