What Makes Data-to-Text Generation Hard for Pretrained Language Models?
@article{Keymanesh2022WhatMD,
  title   = {What Makes Data-to-Text Generation Hard for Pretrained Language Models?},
  author  = {Moniba Keymanesh and Adrian Benton and Mark Dredze},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2205.11505}
}
Expressing natural language descriptions of structured facts or relations – data-to-text generation (D2T) – increases the accessibility of structured knowledge repositories. Previous work (Nan et al., 2020) shows that pre-trained language models (PLMs) perform remarkably well on this task after fine-tuning on a significant amount of task-specific training data. On the other hand, while auto-regressive PLMs can generalize from a few task examples, their efficacy at D2T is largely unexplored. Further…
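The D2T setup the abstract refers to can be pictured with a minimal sketch: a structured record (here, DART-style subject-predicate-object triples) is linearized into a flat prompt and verbalized by a seq2seq PLM. The model name, linearization scheme, and prompt prefix below are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal D2T sketch: linearize triples and generate text with a seq2seq PLM.
# "t5-small" is a stand-in; the paper studies larger pretrained models.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "t5-small"  # assumption, not the paper's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A DART-style record: a set of (subject, predicate, object) triples.
triples = [("Alan Turing", "birthPlace", "London"),
           ("Alan Turing", "field", "computer science")]

# Linearize the triples into a flat string prompt (illustrative scheme).
prompt = "translate triples to text: " + " | ".join(
    f"{s} : {p} : {o}" for s, p, o in triples)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=48, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Fine-tuning on task-specific data would update this model on many such (linearized record, reference text) pairs; the few-shot setting instead conditions an autoregressive PLM on a handful of in-context examples.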
References
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- J. Mach. Learn. Res., 2020
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
A Study of Translation Edit Rate with Targeted Human Annotation
- AMTA, 2006
A new, intuitive measure for evaluating machine-translation output is examined; it avoids the knowledge-intensiveness of more meaning-based approaches and the labor-intensiveness of human judgments. Results indicate that HTER correlates with human judgments better than HMETEOR, and that the four-reference variants of TER and HTER correlate with human judgments as well as, or better than, a second human judgment does.
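For concreteness, a simplified TER can be sketched as word-level edit distance normalized by reference length. Full TER also counts block shifts of phrases; this toy version omits them, so it upper-bounds the true score.

```python
# Simplified Translation Edit Rate (TER): word-level edit distance divided
# by reference length. Real TER also allows block shifts at cost 1, which
# this sketch ignores, so the result is an upper bound on true TER.
def simple_ter(hypothesis: str, reference: str) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    # Standard dynamic-programming edit distance over words.
    dp = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        dp[i][0] = i
    for j in range(len(ref) + 1):
        dp[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(hyp)][len(ref)] / max(len(ref), 1)

print(simple_ter("the cat sat on mat", "the cat sat on the mat"))  # ~0.167
```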
DART: Open-Domain Structured Data Record to Text Generation
- NAACL, 2021
The dataset construction framework effectively merged heterogeneous sources from open-domain semantic parsing and spoken dialogue systems by utilizing techniques including tree ontology annotation, question-answer pair to declarative sentence conversion, and predicate unification, all with minimal post-editing.
Language Models are Unsupervised Multitask Learners
- 2019
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Multitask Prompted Training Enables Zero-Shot Task Generalization
- ArXiv, 2021
A system is developed for easily mapping any natural language task into a human-readable prompted form, and a pretrained encoder-decoder model is fine-tuned on this multitask mixture covering a wide variety of tasks.
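The kind of prompted form this system produces can be illustrated with a toy template. The wording and label verbalizations below are assumptions for illustration, not templates from the paper.

```python
# Toy illustration of rendering a labeled NLI example into a
# human-readable prompt with a natural-language answer.
example = {"premise": "A man is playing a guitar.",
           "hypothesis": "A man is making music.",
           "label": "entailment"}

prompt = (f'Premise: "{example["premise"]}" '
          f'Hypothesis: "{example["hypothesis"]}" '
          "Does the premise entail the hypothesis? Yes, no, or maybe?")
target = {"entailment": "Yes",
          "contradiction": "No",
          "neutral": "Maybe"}[example["label"]]
print(prompt, "->", target)
```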
Investigating Pretrained Language Models for Graph-to-Text Generation
- NLP4CONVAI, 2021
It is suggested that the PLMs benefit from similar facts seen during pretraining or fine-tuning, such that they perform well even when the input graph is reduced to a simple bag of node and edge labels.
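The ablation mentioned above, reducing the input graph to a bag of node and edge labels, can be sketched as follows; the separator token and linearization scheme are illustrative assumptions, not the paper's exact code.

```python
# Sketch of the graph-input ablation: compare a structured linearization
# that preserves triple boundaries against an unordered bag of labels.
triples = [("Alan Turing", "birthPlace", "London"),
           ("London", "country", "United Kingdom")]

# Structured linearization keeps which subject/predicate/object go together.
structured = " [SEP] ".join(f"{s} {p} {o}" for s, p, o in triples)

# The "bag of labels" ablation discards all graph structure.
labels = {x for triple in triples for x in triple}
bag = " ".join(sorted(labels))

print(structured)
print(bag)
```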
Do Massively Pretrained Language Models Make Better Storytellers?
- CoNLL, 2019
It is found that although GPT2-117 conditions more strongly on context, is more sensitive to ordering of events, and uses more unusual words, it is just as likely to produce repetitive and under-diverse text when using likelihood-maximizing decoding algorithms.
OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs
- ACL, 2019
This work proposes the DialKG Walker model, a conversational reasoning model that learns the symbolic transitions of dialog contexts as structured traversals over KG, and predicts natural entities to introduce given previous dialog contexts via a novel domain-agnostic, attention-based graph path decoder.
PASS: A Dutch data-to-text system for soccer, targeted towards specific audiences
- INLG, 2017
Human-based evaluation shows that people are generally positive towards PASS with regard to its clarity and fluency, and that the tailoring is accurately recognized in most cases.
The E2E Dataset: New Challenges For End-to-End Generation
- SIGDIAL, 2017
The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection, which promises more natural, varied and less template-like system utterances.