How “open” are the conversations with open-domain chatbots? A proposal for Speech Event based evaluation

@article{Doruz2022HowA,
  title={How “open” are the conversations with open-domain chatbots? A proposal for Speech Event based evaluation},
  author={A. Seza Doğru{\"o}z and Gabriel Skantze},
  journal={ArXiv},
  year={2022},
  volume={abs/2211.13560}
}
Open-domain chatbots are supposed to converse freely with humans without being restricted to a topic, task or domain. However, the boundaries and/or contents of open-domain conversations are not clear. To clarify the boundaries of “openness”, we conduct two studies: First, we classify the types of “speech events” encountered in a chatbot evaluation data set (i.e., Meena by Google) and find that these conversations mainly cover the “small talk” category and exclude the other speech event… 
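To make the tallying step of such a study concrete, here is a minimal Python sketch: given conversations annotated with speech-event labels, count how much of the data set each category covers. The labels and data below are invented for illustration and are not taken from the paper.

```python
from collections import Counter

# Hypothetical annotated data: each conversation is tagged with one
# speech-event category, loosely in the spirit of Goldsmith & Baxter's taxonomy.
annotated_conversations = [
    {"id": "meena-001", "speech_event": "small talk"},
    {"id": "meena-002", "speech_event": "small talk"},
    {"id": "meena-003", "speech_event": "joking around"},
    {"id": "meena-004", "speech_event": "small talk"},
]

# Tally how much of the data set each speech event covers.
counts = Counter(c["speech_event"] for c in annotated_conversations)
total = sum(counts.values())
for event, n in counts.most_common():
    print(f"{event}: {n} ({n / total:.0%})")
```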

Citations

Evaluating How Users Game and Display Conversation with Human-Like Agents

This study explores dialogues with a social chatbot uploaded to an online community, aiming to understand how users game human-like agents and display their conversations, and it suggests a categorization scheme for the analysis.

Towards Human Evaluation of Mutual Understanding in Human-Computer Spontaneous Conversation: An Empirical Study of Word Sense Disambiguation for Naturalistic Social Dialogs in American English

This work proposes Word Sense Disambiguation (WSD) as an essential component of a valid and reliable human evaluation framework, whose long-term goal is to radically improve the usability of dialog systems in real-life human-computer collaboration.
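As a concrete reference point, WSD has standard off-the-shelf baselines; the sketch below runs NLTK's Lesk implementation on a dialog utterance. This is a generic baseline shown for illustration, not the evaluation framework the paper proposes, and it assumes the NLTK `wordnet` and `punkt` data packages are installed.

```python
# A minimal WSD sketch using NLTK's implementation of the Lesk algorithm.
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

utterance = "I need to stop by the bank to deposit a check."
sense = lesk(word_tokenize(utterance), "bank", pos="n")
print(sense, "->", sense.definition() if sense else "no sense found")
```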

References

Showing 1-10 of 28 references

Personalizing Dialogue Agents: I have a dog, do you have pets too?

This work collects data and trains models to condition on their given profile information and on information about the person they are talking to, resulting in improved dialogues as measured by next-utterance prediction.
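Next-utterance prediction is typically scored as hits@1 over a candidate set: the model must rank the gold response above distractors. A minimal sketch follows, with a toy word-overlap scorer standing in for a trained model; all names and data here are hypothetical.

```python
def hits_at_1(score, context, gold, distractors):
    """score(context, candidate) -> float; higher means more likely.
    Returns True if the gold response outranks all distractors."""
    candidates = [gold] + distractors
    best = max(candidates, key=lambda c: score(context, c))
    return best == gold

# Toy scorer based on word overlap with the context (illustrative only).
def overlap_score(context, candidate):
    return len(set(context.lower().split()) & set(candidate.lower().split()))

print(hits_at_1(
    overlap_score,
    context="I have a dog, do you have pets too?",
    gold="Yes, I have a cat named Whiskers.",
    distractors=["The weather is nice today.", "I work as an engineer."],
))
```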

ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons

A novel procedure is introduced that compares two full dialogues, where a human judge is asked to pay attention to only one speaker within each and make a pairwise judgment, resulting in better tests.
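Aggregating such pairwise judgments usually reduces to a win rate plus a significance test. A minimal sketch, assuming hypothetical counts and SciPy >= 1.7 for `binomtest`; the ACUTE-EVAL paper's exact statistical analysis may differ.

```python
# Two-sided binomial test on pairwise preferences between two systems.
from scipy.stats import binomtest

wins_a = 68   # hypothetical: judges preferred system A in 68 of 100 pairs
trials = 100

result = binomtest(wins_a, trials, p=0.5)
print(f"win rate A: {wins_a / trials:.0%}, p-value: {result.pvalue:.4f}")
```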

Conversational AI: The Science Behind the Alexa Prize

The advances made by the university teams, as well as the Alexa Prize team, toward the common goal of solving the problem of Conversational AI are outlined.

The Second Conversational Intelligence Challenge (ConvAI2)

To improve performance on multi-turn conversations with humans, future systems must go beyond single word metrics like perplexity to measure the performance across sequences of utterances (conversations)—in terms of repetition, consistency and balance of dialogue acts.
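Of the conversation-level measures mentioned, repetition is the easiest to make concrete. Below is a minimal sketch of one plausible formulation, invented here for illustration rather than the ConvAI2 organizers' exact metric: the fraction of a system's n-grams already used in its earlier turns.

```python
def ngrams(tokens, n=3):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def repetition_rate(system_turns, n=3):
    """Fraction of n-grams in a system's turns that repeat earlier n-grams."""
    seen, repeated, total = set(), 0, 0
    for turn in system_turns:
        for gram in ngrams(turn.lower().split(), n):
            total += 1
            if gram in seen:
                repeated += 1
            seen.add(gram)
    return repeated / total if total else 0.0

turns = [
    "i love to play the guitar on weekends",
    "i love to play the guitar when i am free",
]
print(f"trigram repetition: {repetition_rate(turns):.0%}")
```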

ParlAI: A Dialog Research Software Platform

ParlAI (pronounced "par-lay"), an open-source software platform for dialog research implemented in Python, is introduced to provide a unified framework for sharing, training, and testing dialog models, with integration of Amazon Mechanical Turk for data collection, human evaluation, and online/reinforcement learning.
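For orientation, inspecting a dataset through ParlAI takes a few lines; `DisplayData` is part of ParlAI's scripts API, though option names can shift between releases, so treat this as a sketch against a recent `pip install parlai`.

```python
# Print a handful of examples from the PersonaChat task via ParlAI.
from parlai.scripts.display_data import DisplayData

DisplayData.main(task="personachat", num_examples=5)
```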

Social Dialogue with Embodied Conversational Agents

The functions of social dialogue between people in the context of performing a task are discussed, as well as approaches to modelling such dialogue in embodied conversational agents. A study of an…

"How about this weather?" Social Dialogue with Embodied Conversational Agents

The ongoing development of an embodied conversational agent that is capable of multimodal input understanding and output generation and operates in a limited application domain in which both social and task-oriented dialogue are used is described.

Constituting Relationships in Talk: A Taxonomy of Speech Events in Social and Personal Relationships

In a series of four studies, a descriptive taxonomy of dyadic speech events in everyday relating was developed and employed to explore the constitutive functions of interpersonal communication.