Why We Need New Evaluation Metrics for NLG
The majority of NLG evaluation relies on automatic metrics, such as BLEU. In this paper, we motivate the need for novel, system- and data-independent automatic evaluation methods: We investigate a wide …
  • 130
  • 21
Activity-Based Restorative Therapies after Spinal Cord Injury: Inter-institutional conceptions and perceptions.
This manuscript is a review of the theoretical and clinical concepts provided during an inter-institutional training program on Activity-Based Restorative Therapies (ABRT) and the perceptions of …
  • 10
  • 3
#MeToo Alexa: How Conversational Systems Respond to Sexual Harassment
Conversational AI systems are rapidly developing from purely transactional systems to social chatbots, which can respond to a wide variety of user requests. In this article, we establish how current …
  • 19
  • 2
Alana: Social Dialogue using an Ensemble Model and a Ranker trained on User Feedback
We describe our Alexa prize system (called ‘Alana’) which consists of an ensemble of bots, combining rule-based and machine learning systems, and using a contextual ranking mechanism to choose system …
  • 23
  • 1
Alana v2: Entertaining and Informative Open-domain Social Dialogue using Ontologies and Entity Linking
We describe our 2018 Alexa prize system (called ‘Alana’) which consists of an ensemble of bots, combining rule-based and machine learning systems. This paper reports on the version of the system …
  • 11
  • 1
An Ensemble Model with Ranking for Social Dialogue
Open-domain social dialogue is one of the long-standing goals of Artificial Intelligence. This year, the Amazon Alexa Prize challenge was announced for the first time, where real customers get to …
  • 7
A review of evaluation techniques for social dialogue systems
In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success. Consequently, evaluation of these systems is notoriously hard. In this paper, we review current …
  • 9
A Game-Based Setup for Data Collection and Task-Based Evaluation of Uncertain Information Presentation
Decision-making is often dependent on uncertain data, e.g. data associated with confidence scores, such as probabilities. A concrete example of such data is weather data. We will demo a game-based …
  • 5
Generating and Evaluating Landmark-Based Navigation Instructions in Virtual Environments
Referring to landmarks has been identified to lead to improved navigation instructions. However, a previous corpus study suggests that human “wizards” also choose to refer to street names and …
  • 4
A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents
How should conversational agents respond to verbal abuse through the user? To answer this question, we conduct a large-scale crowd-sourced evaluation of abuse response strategies employed by current …
  • 2