How To Evaluate Your Dialogue System: Probe Tasks as an Alternative for Token-level Evaluation Metrics

@article{Parthasarathi2020HowTE,
  title={How To Evaluate Your Dialogue System: Probe Tasks as an Alternative for Token-level Evaluation Metrics},
  author={Prasanna Parthasarathi and Joelle Pineau and Sarath Chandar},
  journal={ArXiv},
  year={2020},
  volume={abs/2008.10427}
}
Though generative dialogue modeling is widely seen as a language modeling task, it requires an agent to have a complex natural-language understanding of its input text in order to carry on a meaningful interaction with a user. The automatic metrics in use evaluate the quality of the generated text as a proxy for the agent's holistic interaction. Such metrics have previously been shown not to correlate with human judgement. In this work, we observe that human evaluation of dialogue agents can be…
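The token-level metrics the abstract refers to are word-overlap scores (e.g. BLEU) computed between a generated response and a single reference reply. As a minimal sketch (not the authors' code), the snippet below scores a hypothetical reference/response pair with unigram-overlap F1 and smoothed sentence BLEU; the example strings are invented, and the use of NLTK for BLEU is an assumption for illustration.

```python
# Illustrative sketch of token-level response evaluation (not the paper's code).
from collections import Counter

from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu  # assumes nltk is installed


def unigram_f1(reference: str, candidate: str) -> float:
    """Token-overlap F1 between a reference reply and a generated reply."""
    ref_tokens = Counter(reference.lower().split())
    cand_tokens = Counter(candidate.lower().split())
    overlap = sum((ref_tokens & cand_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_tokens.values())
    recall = overlap / sum(ref_tokens.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical pair: the generated reply is perfectly acceptable to a human,
    # yet shares few surface tokens with the reference, so overlap scores are low.
    reference = "i usually go hiking on the weekends"
    candidate = "most weekends you can find me out on a trail"

    print("unigram F1:", round(unigram_f1(reference, candidate), 3))
    print(
        "BLEU:",
        round(
            sentence_bleu(
                [reference.split()],
                candidate.split(),
                smoothing_function=SmoothingFunction().method1,
            ),
            3,
        ),
    )
```

A low overlap score for an acceptable reply is exactly the mismatch with human judgement that motivates the paper's proposal: probe tasks that evaluate what the dialogue agent has understood, rather than the surface tokens it generates.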
