• Publications
Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings
TLDR
This work explores using contextualized word embeddings to compute more accurate relatedness scores and thus better evaluation metrics; experiments show that these metrics outperform RUBER, which is trained on static embeddings.
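A minimal sketch of the underlying idea: an unreferenced relatedness score computed as the cosine similarity of mean-pooled contextualized token embeddings. The function name, pooling choice, and array shapes are illustrative assumptions, not details from the paper.

```python
import numpy as np

def relatedness_score(context_emb: np.ndarray, response_emb: np.ndarray) -> float:
    """Relatedness between a dialogue context and a response, scored as
    cosine similarity of mean-pooled contextualized token embeddings
    (e.g. BERT hidden states of shape (num_tokens, hidden_dim))."""
    c = context_emb.mean(axis=0)   # pool over context tokens
    r = response_emb.mean(axis=0)  # pool over response tokens
    return float(c @ r / (np.linalg.norm(c) * np.linalg.norm(r)))

# Placeholder arrays standing in for embeddings from a contextual encoder.
rng = np.random.default_rng(0)
ctx = rng.normal(size=(12, 768))
resp = rng.normal(size=(8, 768))
score = relatedness_score(ctx, resp)
```

In a RUBER-style metric this unreferenced score is combined with a referenced score against the ground-truth reply; the gain here comes from the encoder producing context-dependent rather than static token vectors.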
Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems
TLDR
It is demonstrated that human annotators have high agreement on assessing utterance-level engagement scores and that these scores can improve automatic evaluation metrics for open-domain dialogue systems, as shown by correlation with human judgements.
ParsiNLU: A Suite of Language Understanding Challenges for Persian
TLDR
This work introduces ParsiNLU, the first benchmark for the Persian language covering a range of language understanding tasks such as reading comprehension and textual entailment; it presents the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compares them with human performance.
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TLDR
Evaluating OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters, finds that model performance and calibration both improve with scale but remain poor in absolute terms.
DiSCoL: Toward Engaging Dialogue Systems through Conversational Line Guided Response Generation
TLDR
DiSCoL is an open-domain dialogue system that leverages conversational lines (briefly, convlines) as controllable and informative content-planning elements to guide the generation model to produce engaging and informative responses.
Plot-guided Adversarial Example Construction for Evaluating Open-domain Story Generation
TLDR
Experiments show that the evaluation metrics trained on the generated data result in more reliable automatic assessments that correlate remarkably better with human judgments compared to the baselines.
Improving Sparsity Problem in Group Recommendation
TLDR
By enhancing basic memory-based techniques, this paper resolves the data sparsity problem for users in a group and shows that applying these techniques to the group's users yields higher group satisfaction and lower group dissatisfaction.
Modeling Psychotherapy Dialogues with Kernelized Hashcode Representations: A Nonparametric Information-Theoretic Approach.
We propose a novel dialogue modeling framework, the first nonparametric, kernel-function-based approach to dialogue modeling, which learns kernelized hashcodes as compressed text representations.
User Response and Sentiment Prediction for Automatic Dialogue Evaluation
TLDR
This work proposes to use the sentiment of the next user utterance for turn- or dialog-level evaluation, via three methods: one that predicts the next sentiment directly, and two others that predict the next user's utterance with an utterance or feedback generator model and then classify its sentiment.
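A toy sketch of the turn-level variant of this idea: score a system turn by the sentiment of the (predicted) next user utterance. The hand-written word lists stand in for the learned sentiment classifier and generator models; all names and lexicon entries here are illustrative assumptions.

```python
# Toy lexicon standing in for a learned sentiment classifier.
POSITIVE = {"great", "thanks", "helpful", "good"}
NEGATIVE = {"wrong", "useless", "bad", "confusing"}

def sentiment(utterance: str) -> float:
    """Return a sentiment score in [-1, 1] for a user utterance."""
    tokens = [t.strip(".,!?") for t in utterance.lower().split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def turn_score(predicted_next_user_utterance: str) -> float:
    """Turn-level evaluation: sentiment of the predicted next user turn,
    where a generator model would normally supply the utterance."""
    return sentiment(predicted_next_user_utterance)
```

The direct-prediction method in the paper skips the generation step and regresses the sentiment from the dialogue history; this sketch only illustrates the generate-then-classify pipeline.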
...