The E2E Dataset: New Challenges For End-to-End Generation
The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection, which promises more natural, varied and less template-like system utterances.
Why We Need New Evaluation Metrics for NLG
A wide range of metrics are investigated, including state-of-the-art word-based and novel grammar-based ones, and it is demonstrated that they only weakly reflect human judgements of system outputs as generated by data-driven, end-to-end NLG.
Benchmarking Natural Language Understanding Services for building Conversational Agents
The results show that on intent classification Watson significantly outperforms the other platforms (Dialogflow, LUIS, and Rasa), though these also perform well; interestingly, on entity type recognition Watson performs significantly worse due to its low precision.
Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge
Findings of the E2E NLG Challenge
This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems, and compares 62 systems submitted by 17 institutions, covering a wide range of approaches, including machine learning architectures.
Semantic Noise Matters for Neural Natural Language Generation
The impact of semantic noise on state-of-the-art NNLG models which implement different semantic control mechanisms is shown and it is found that cleaned data can improve semantic correctness by up to 97%, while maintaining fluency.
Reinforcement Learning for Adaptive Dialogue Systems - A Data-driven Methodology for Dialogue Management and Natural Language Generation
- Verena Rieser, Oliver Lemon
- Computer Science · Theory and Applications of Natural Language…
- 30 November 2011
A new methodology for developing spoken dialogue systems is described in detail, and methods for learning from the data, for building simulation environments for training and testing systems, and for evaluating the results are explored.
RankME: Reliable Human Ratings for Natural Language Generation
This work presents a novel rank-based magnitude estimation method (RankME), which combines the use of continuous scales and relative assessments, and shows that RankME significantly improves the reliability and consistency of human ratings compared to traditional evaluation methods.
Crowd-sourcing NLG Data: Pictures Elicit Better Data.
It is shown that pictorial MRs result in better NL data being collected than logic-based MRs, and are judged as significantly more natural, more informative, and better phrased, with a significant increase in average quality ratings.
An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis
Issues posed by Twitter as a genre, such as the mixture of language varieties and topic shifts, are highlighted in a newly collected data set of 8,868 gold-standard annotated Arabic feeds.