ToTTo: A Controlled Table-To-Text Generation Dataset

@inproceedings{Parikh2020ToTToAC,
  title={ToTTo: A Controlled Table-To-Text Generation Dataset},
  author={Ankur P. Parikh and Xuezhi Wang and Sebastian Gehrmann and Manaal Faruqui and Bhuwan Dhingra and Diyi Yang and Dipanjan Das},
  booktitle={EMNLP},
  year={2020}
}
We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia. We present systematic analyses of our dataset and…
