• Publications
  • Influence
Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
The recently proposed hierarchical recurrent encoder-decoder neural network is extended to the dialogue domain, and it is demonstrated that this model is competitive with state-of-the-art neural language models and back-off n-gram models. Expand
The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems
This paper introduces the Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This provides a uniqueExpand
Point-based value iteration: An anytime algorithm for POMDPs
This paper introduces the Point-Based Value Iteration (PBVI) algorithm for POMDP planning, and presents results on a robotic laser tag problem as well as three test domains from the literature. Expand
A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues
A neural network-based generative architecture, with latent stochastic variables that span a variable number of time steps, that improves upon recently proposed models and that the latent variables facilitate the generation of long outputs and maintain the context. Expand
How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation
This work investigates evaluation metrics for dialogue response generation systems where supervised labels, such as task completion, are not available and shows that these metrics correlate very weakly with human judgements in the non-technical Twitter domain, and not at all in the technical Ubuntu domain. Expand
Deep Reinforcement Learning that Matters
Challenges posed by reproducibility, proper experimental techniques, and reporting procedures are investigated and guidelines to make future results in deep RL more reproducible are suggested. Expand
Online Planning Algorithms for POMDPs
The objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics. Expand
An Actor-Critic Algorithm for Sequence Prediction
An approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL) that condition the critic network on the ground-truth output, and shows that this method leads to improved performance on both a synthetic task, and for German-English machine translation. Expand
Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses
An evaluation model (ADEM) that learns to predict human-like scores to input responses, using a new dataset of human response scores and it is shown that the ADEM model's predictions correlate significantly, and at a level much higher than word-overlap metrics such as BLEU, with human judgements at both the utterance and system-level. Expand
Anytime Point-Based Approximations for Large POMDPs
The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI), and a theoretical analysis justifying the choice of belief selection technique is presented. Expand