WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue

@inproceedings{Khandelwal2021WeaSuLWS,
  title={WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue},
  author={Anant Khandelwal},
  booktitle={DIALDOC},
  year={2021}
}
An intelligent dialogue system in a multi-turn setting should not only generate responses of good quality, but should also generate responses that can lead to the long-term success of the dialogue. Although current approaches have improved response quality, they overlook the training signals present in the dialogue data. We can leverage these signals to generate weakly supervised training data for learning dialog policy and reward estimator, and make the policy…
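The abstract is truncated, but its core recipe is: mine supervision signals already present in dialogue data to build weak labels, then train a reward estimator from them. A minimal, hypothetical sketch of one such weak-labeling scheme follows — treating observed next utterances as positive reward-model examples and randomly sampled utterances as negatives is an illustrative distant-supervision assumption, not the paper's exact labeling procedure:

```python
import random

def make_weak_reward_data(dialogues, num_negatives=1, seed=0):
    """Build weakly supervised (context, response, label) triples.

    Positives pair a context with its true next utterance; negatives pair
    it with a randomly sampled utterance. This is an illustrative
    distant-supervision scheme, not the paper's actual estimator.
    """
    rng = random.Random(seed)
    all_utts = [u for d in dialogues for u in d]
    data = []
    for d in dialogues:
        for t in range(1, len(d)):
            context, gold = tuple(d[:t]), d[t]
            data.append((context, gold, 1))      # observed continuation
            for _ in range(num_negatives):
                neg = rng.choice(all_utts)
                while neg == gold:               # avoid accidental positives
                    neg = rng.choice(all_utts)
                data.append((context, neg, 0))   # mismatched continuation
    return data

dialogues = [
    ["hi", "hello, how can I help?", "book a table", "for how many?"],
    ["what's the weather?", "sunny all day", "great, thanks"],
]
data = make_weak_reward_data(dialogues)
```

A reward estimator trained on such triples can then score candidate responses without any human reward annotation, which is the sense in which the supervision is "weak."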

Citations

Towards Robust Online Dialogue Response Generation

A hierarchical sampling-based method consisting of both utterance-level sampling and semi-utterance-level sampling to alleviate the discrepancy, which implicitly increases the dialogue coherence.

concept2code: Deep Reinforcement Learning for Conversational AI

This tutorial covers the application of Deep Reinforcement Learning to Conversational AI, using the best of both Reinforcement Learning and Deep Learning to solve problems that cannot be addressed by either individually.

References

Showing 1-10 of 69 references

Deep Reinforcement Learning for Dialogue Generation

This work simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering.
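The policy-gradient setup described above can be sketched as a toy REINFORCE loop: a softmax policy over a fixed candidate set, rewarded by a hand-crafted composite of the three properties. The scoring heuristics and candidate set here are illustrative assumptions, not the cited paper's actual reward functions:

```python
import math
import random

CANDIDATES = ["i don't know", "tell me more about that", "yes"]

def reward(context, response):
    # Crude proxies for the three properties (illustrative only).
    informativity = len(set(response.split())) / 10.0   # lexical variety
    non_repetitive = 0.0 if response == context else 1.0
    ease_of_answering = 1.0 if response.endswith("that") else 0.2
    return informativity + non_repetitive + ease_of_answering

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train(steps=500, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(len(CANDIDATES)), weights=probs)[0]
        r = reward("i don't know", CANDIDATES[a])
        # REINFORCE update: grad of log pi(a) is onehot(a) - probs.
        for i in range(len(logits)):
            logits[i] += lr * r * ((1.0 if i == a else 0.0) - probs[i])
    return logits

logits = train()
```

Because every action's reward is compared implicitly against the policy's average return, probability mass drifts toward the candidate with the highest composite score.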

Policy Networks with Two-Stage Training for Dialogue Systems

This paper shows that, on summary state and action spaces, deep Reinforcement Learning (RL) outperforms Gaussian Process methods, and that a deep RL method based on an actor-critic architecture can exploit a small amount of data very efficiently.

End-to-End Reinforcement Learning of Dialogue Agents for Information Access

This paper proposes KB-InfoBot - a multi-turn dialogue agent which helps users search Knowledge Bases without composing complicated queries by replacing symbolic queries with an induced “soft” posterior distribution over the KB that indicates which entities the user is interested in.

Bootstrapping a Neural Conversational Agent with Dialogue Self-Play, Crowdsourcing and On-Line Reinforcement Learning

This paper discusses the advantages of this approach for industry applications of conversational agents, wherein an agent can be rapidly bootstrapped to deploy in front of users and further optimized via interactive learning from actual users of the system.

Self-Supervised Dialogue Learning

A self-supervised learning task, inconsistent order detection, to explicitly capture the flow of conversation in dialogues via a self-supervised network (SSN), and a joint learning framework where the SSN can guide dialogue systems towards more coherent and relevant dialogue learning through adversarial training.
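The inconsistent-order-detection pretext task above can be illustrated with a small data-construction sketch: sample a window of consecutive utterances and label it 1 if it keeps the original order, 0 if it was shuffled. Window size and sampling details are illustrative assumptions, not the cited paper's exact setup:

```python
import random

def make_order_detection_examples(dialogue, num_examples=4, window=3, seed=0):
    """Generate (segment, label) pairs for order detection.

    label 1: segment keeps the dialogue's original utterance order.
    label 0: segment was shuffled into an inconsistent order.
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(num_examples):
        start = rng.randrange(len(dialogue) - window + 1)
        segment = dialogue[start:start + window]
        if rng.random() < 0.5:
            examples.append((tuple(segment), 1))      # consistent order
        else:
            shuffled = segment[:]
            while shuffled == segment:                # force a real permutation
                rng.shuffle(shuffled)
            examples.append((tuple(shuffled), 0))     # inconsistent order
    return examples

dialogue = ["hi", "hello", "can you book a cab?", "sure, where to?", "the airport"]
examples = make_order_detection_examples(dialogue)
```

A classifier trained on such pairs needs no manual labels, which is what makes the task self-supervised; its learned representation of conversational flow can then guide the generator.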

Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning

This paper addresses the travel planning task by formulating the task in the mathematical framework of options over Markov Decision Processes (MDPs), and proposing a hierarchical deep reinforcement learning approach to learning a dialogue manager that operates at different temporal scales.

Guided Dialogue Policy Learning without Adversarial Learning in the Loop

The proposed method decomposes adversarial training into two steps; it achieves a remarkable task success rate with both on-policy and off-policy reinforcement learning methods and has the potential to transfer knowledge from existing domains to a new domain.

DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances

Experiments show that this approach remarkably outperforms three baselines, including BART and DialoGPT, in quantitative evaluation, and human evaluation suggests that DialogBERT generates more coherent, informative, and human-like responses than the baselines by significant margins.

Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues

This paper proposes learning a context-response matching model with auxiliary self-supervised tasks designed for dialogue data based on pre-trained language models (PLMs), and jointly trains the PLM-based response selection model with these auxiliary tasks in a multi-task manner.

Hierarchical Text Generation and Planning for Strategic Dialogue

An approach to learning representations of messages in dialogues by maximizing the likelihood of subsequent sentences and actions; it decouples the semantics of a dialogue utterance from its linguistic realization and outperforms previous work both linguistically and strategically.
...