Syntax-informed Question Answering with Heterogeneous Graph Transformer

Fangyi Zhu, Lok You Tan, See-Kiong Ng, Stéphane Bressan
Large neural language models are steadily delivering state-of-the-art performance on question answering and other natural language and information processing tasks. These models are expensive to train. We propose to evaluate whether such pre-trained models can benefit from the addition of explicit linguistic information without requiring retraining from scratch. We present a linguistics-informed question answering approach that extends and fine-tunes a pre-trained transformer-based neural…
1 Citation

Multigranularity Syntax Guidance with Graph Structure for Machine Reading Comprehension

The experimental results show that the proposed “MgSG” module effectively uses the graph structure to learn the internal features of sentences and to address the problem of long-distance semantics, while improving the performance of PrLM in machine reading comprehension.



Syntactic Structure Distillation Pretraining for Bidirectional Encoders

A knowledge distillation strategy for injecting syntactic biases into BERT pretraining, by distilling the syntactically informative predictions of a hierarchical—albeit harder to scale—syntactic language model.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

SG-Net: Syntax-Guided Machine Reading Comprehension

This work uses syntax to guide the text modeling by incorporating explicit syntactic constraints into attention mechanism for better linguistically motivated word representations and shows that the proposed SG-Net design helps achieve substantial performance improvement over strong baselines.

Question Answering by Reasoning Across Documents with Graph Convolutional Networks

A neural model is introduced that integrates and reasons over information spread within documents and across multiple documents; it achieves state-of-the-art results on a multi-document question answering dataset, WikiHop.

Deep Biaffine Attention for Neural Dependency Parsing

This paper uses a larger but more thoroughly regularized parser than other recent BiLSTM-based approaches, with biaffine classifiers to predict arcs and labels, and shows which hyperparameter choices had a significant effect on parsing accuracy, allowing it to achieve large gains over other graph-based approaches.

Constituency Parsing with a Self-Attentive Encoder

It is demonstrated that replacing an LSTM encoder with a self-attentive architecture can lead to improvements to a state-of-the-art discriminative constituency parser, and it is found that separating positional and content information in the encoder can lead to improved parsing accuracy.

Modeling Relational Data with Graph Convolutional Networks

It is shown that factorization models for link prediction such as DistMult can be significantly improved through the use of an R-GCN encoder model to accumulate evidence over multiple inference steps in the graph, demonstrating a large improvement of 29.8% on FB15k-237 over a decoder-only baseline.

What Does BERT Learn about the Structure of Language?

This work provides novel support for the possibility that BERT networks capture structural information about language by performing a series of experiments to unpack the elements of English language structure learned by BERT.

Fast and Accurate Neural CRF Constituency Parsing

A fast and accurate neural CRF constituency parser that batchifies the inside algorithm for loss computation through direct large tensor operations on GPU, while avoiding the outside algorithm for gradient computation via efficient back-propagation.

Constructing Datasets for Multi-hop Reading Comprehension Across Documents

A novel task to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods, in which a model learns to seek and combine evidence, effectively performing multi-hop (also called multi-step) inference.