Promoting Graph Awareness in Linearized Graph-to-Text Generation

@article{Hoyle2020PromotingGA,
  title={Promoting Graph Awareness in Linearized Graph-to-Text Generation},
  author={Alexander Miserlis Hoyle and Ana Marasovi{\'c} and Noah A. Smith},
  journal={ArXiv},
  year={2020},
  volume={abs/2012.15793}
}
Generating text from structured inputs, such as meaning representations or RDF triples, has often involved the use of specialized graph-encoding neural networks. However, recent applications of pretrained transformers to linearizations of graph inputs have yielded state-of-the-art generation results on graph-to-text tasks. Here, we explore the ability of these linearized models to encode local graph structures, in particular their invariance to the graph linearization strategy and their ability…
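
As a concrete illustration of the linearization strategies the abstract refers to, the sketch below flattens a small set of RDF-style triples into a token sequence that a pretrained seq2seq model could consume. The <H>/<R>/<T> marker tokens and the triple ordering are illustrative assumptions, not the paper's exact scheme.

```python
# Minimal sketch of graph linearization for a seq2seq model.
# The <H>/<R>/<T> markers and the traversal order are illustrative
# assumptions, not the exact scheme used in the paper.

def linearize(triples):
    """Flatten (head, relation, tail) triples into one input string."""
    parts = []
    for head, rel, tail in triples:
        parts.extend(["<H>", head, "<R>", rel, "<T>", tail])
    return " ".join(parts)

graph = [
    ("Alan Bean", "occupation", "astronaut"),
    ("Alan Bean", "mission", "Apollo 12"),
]

print(linearize(graph))
# <H> Alan Bean <R> occupation <T> astronaut <H> Alan Bean <R> mission <T> Apollo 12

# Permuting the triples produces a different surface sequence for the same
# underlying graph -- the kind of variation a model should be invariant to.
print(linearize(list(reversed(graph))))
```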


Structural Adapters in Pretrained Language Models for AMR-to-Text Generation

The benefits of explicitly encoding graph structure into PLMs using StructAdapt are empirically shown, outperforming the state of the art on two AMR-to-text datasets, training only 5.1% of the PLM parameters.

Syntax Controlled Knowledge Graph-to-Text Generation with Order and Semantic Consistency

This paper optimizes the knowledge description order prediction under order supervision extracted from the caption, and further enhances the consistency between the generated sentences and the KG through syntactic and semantic regularization.

GraDA: Graph Generative Data Augmentation for Commonsense Reasoning

GraDA, a graph-generative data augmentation framework, is presented to synthesize factual data samples from knowledge graphs for commonsense reasoning datasets; the work shows improved robustness to semantic adversaries after training with GraDA and provides a human evaluation of the quality of the synthetic datasets in terms of factuality and answerability.

Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation

This work proposes a novel method called Curriculum-Based Self-Training (CBST), which can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.

Rewarding Semantic Similarity under Optimized Alignments for AMR-to-Text Generation

This work proposes metrics that replace the greedy alignments in BERTScore with optimized alignments computed on a model's trained token embeddings to prevent domain mismatch, and finds that this approach enjoys stable training compared to a non-RL setting.

Open Domain Question Answering with A Unified Knowledge Interface

This work proposes a verbalizer-retriever-reader framework for ODQA over data and text, in which verbalized tables from Wikipedia and graphs from Wikidata are used as augmented knowledge sources, and shows that the resulting Unified Data and Text QA (UDT-QA) can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.

Open Domain Question Answering over Virtual Documents: A Unified Approach for Data and Text

This work uses the data-to-text method as a means of encoding structured knowledge for knowledge-intensive applications, i.e., open-domain question answering (ODQA), and proposes a verbalizer-retriever-reader framework for ODQA over data and text in which verbalized tables from Wikipedia and graphs from Wikidata are used as augmented knowledge sources.

Revisiting Generative Commonsense Reasoning: A Pre-Ordering Approach

It is argued that a PTM's inherent ability for generative commonsense reasoning is underestimated due to the order-agnostic property of its input, and a pre-ordering approach is proposed to carefully manipulate the order of the given concepts before generation.

Generating Textual Explanations for Machine Learning Models Performance: A Table-to-Text Task

The evaluation and analysis conducted indicate that exploring pre-trained models for data-to-text generation leads to better generalisation performance and can produce high-quality textual explanations.

References

Showing 1-10 of 47 references

Enhancing AMR-to-Text Generation with Dual Graph Representations

A novel graph-to-sequence model that encodes different but complementary perspectives of the structural information contained in the AMR graph, learning parallel top-down and bottom-up representations of nodes capturing contrasting views of the graph.

Modeling Graph Structure in Transformer for Better AMR-to-Text Generation

This paper proposes a novel structure-aware self-attention approach to better model the relations between indirectly connected concepts in the state-of-the-art seq2seq model, i.e. the Transformer.

Bridging the Structural Gap Between Encoding and Decoding for Data-To-Text Generation

This work proposes DualEnc, a dual encoding model that can not only incorporate the graph structure, but can also cater to the linear structure of the output text, demonstrating that dual encoding can significantly improve the quality of the generated text.

Investigating Pretrained Language Models for Graph-to-Text Generation

It is suggested that the PLMs benefit from similar facts seen during pretraining or fine-tuning, such that they perform well even when the input graph is reduced to a simple bag of node and edge labels.

Generation from Abstract Meaning Representation using Tree Transducers

This paper addresses generating English from the Abstract Meaning Representation (AMR), a formalism of re-entrant graphs whose nodes are concepts and edges are relations; the approach generates an appropriate spanning tree for the AMR and applies tree-to-string transducers to generate English.

Controllable Meaning Representation to Text Generation: Linearization and Data Augmentation Strategies

It is found that properly aligning input sequences during training leads to highly controllable generation, both when training from scratch or when fine-tuning a larger pre-trained model.

GPT-too: A Language-Model-First Approach for AMR-to-Text Generation

An alternative approach that combines a strong pre-trained language model with cycle-consistency-based re-scoring is proposed, which outperforms all previous techniques on the English LDC2017T10 dataset, including the recent use of transformer architectures.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

ToTTo: A Controlled Table-To-Text Generation Dataset

We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description.

Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity

This work presents DataTuner, a neural, end-to-end data-to-text generation system that makes minimal assumptions about the data representation and target domain, combining a fine-tuned language model with a semantic fidelity classifier.