Corpus ID: 221186730

Do Syntax Trees Help Pre-trained Transformers Extract Information?

@inproceedings{Sachan2021DoST,
  title={Do Syntax Trees Help Pre-trained Transformers Extract Information?},
  author={Devendra Singh Sachan and Yuhao Zhang and Peng Qi and William Hamilton},
  booktitle={EACL},
  year={2021}
}
Much recent work suggests that incorporating syntax information from dependency trees can improve task-specific transformer models. However, the effect of incorporating dependency tree information into pre-trained transformer models (e.g., BERT) remains unclear, especially given recent studies highlighting how these models implicitly encode syntax. In this work, we systematically study the utility of incorporating dependency trees into pre-trained transformers on three representative…
Composing Byte-Pair Encodings for Morphological Sequence Classification
Byte-pair encoding is a method for splitting a word into sub-word tokens; a language model then assigns contextual representations separately to each of these tokens. In this paper, we evaluate four…
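A minimal sketch of the setup this summary describes, assuming the HuggingFace transformers and PyTorch libraries: a byte-level BPE tokenizer (GPT-2's) splits words into sub-word tokens, the model assigns each token a contextual vector, and the pieces of each word are composed back into word-level vectors. Mean pooling is used here purely as an illustrative composition strategy, not necessarily the one the paper evaluates.

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("gpt2")   # GPT-2 uses byte-level BPE
    model = AutoModel.from_pretrained("gpt2")

    words = ["unbelievable", "stories"]
    enc = tokenizer(" ".join(words), return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]      # (num_subwords, hidden_dim)

    # Map each sub-word token back to the word it came from and average its pieces.
    word_ids = enc.word_ids(0)
    word_vecs = [hidden[[i for i, w in enumerate(word_ids) if w == wi]].mean(dim=0)
                 for wi in range(len(words))]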
Syntax-Enhanced Pre-trained Model
TLDR
First, it is demonstrated that infusing automatically produced syntax of text improves pre-trained models; second, global syntactic distances among tokens bring larger performance gains than local head relations between contiguous tokens.
Dependency Parsing with Bottom-up Hierarchical Pointer Networks
TLDR
A bottom-up-oriented Hierarchical Pointer Network for the left-to-right parser is developed, and two novel transition-based alternatives are proposed: an approach that parses a sentence in right-to-left order and a variant that does it from the outside in.
Graph Ensemble Learning over Multiple Dependency Trees for Aspect-level Sentiment Classification
TLDR
This work proposes a simple yet effective graph ensemble technique, GraphMerge, which makes use of the predictions from different parsers to be robust to parse errors and helps avoid over-parameterization and overfitting from GNN layer stacking by introducing more connectivity into the ensemble graph.
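As a rough illustration of the graph-ensemble idea (a sketch under assumptions, not GraphMerge's actual implementation), the snippet below unions the arcs predicted by several parsers into a single adjacency matrix that a GNN layer could then operate on; the head arrays are made up.

    import numpy as np

    def merge_parses(head_lists, num_tokens):
        """Union the arcs of several dependency parses (head index per token, 0 = root)."""
        adj = np.zeros((num_tokens, num_tokens), dtype=np.float32)
        for heads in head_lists:
            for dep, head in enumerate(heads):
                if head > 0:                      # skip the artificial root arc
                    adj[head - 1, dep] = 1.0
                    adj[dep, head - 1] = 1.0      # treat arcs as undirected edges
        return adj

    parser_a = [2, 0, 2, 3]                       # heads for a toy 4-token sentence
    parser_b = [2, 0, 2, 2]                       # a second parser disagrees on token 4
    adjacency = merge_parses([parser_a, parser_b], num_tokens=4)

Because the merged graph keeps every parser's edges, a mistake by one parser is less likely to disconnect two related tokens than it would be with a single parse.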
HYDRA - Hyper Dependency Representation Attentions
Attention is all we need as long as we have enough data. Even so, it is sometimes not easy to determine how much data is enough while the models are becoming larger and larger. In this paper, we…
On the Use of Parsing for Named Entity Recognition
TLDR
The characteristics of NER, a task that is far from being solved despite its long history, are studied; the latest advances in parsing are analyzed; the different approaches to NER that make use of syntactic information are reviewed; and a new way of using parsing in NER is proposed, based on casting parsing itself as a sequence labeling task.
Privacy-Preserving Graph Convolutional Networks for Text Classification
TLDR
A simple yet efficient method based on random graph splits that not only improves the baseline privacy bounds by a factor of 2.7 while retaining competitive F1 scores, but also provides strong privacy guarantees of ε = 1.0.
Semantic Representation for Dialogue Modeling
TLDR
To our knowledge, this work is the first to leverage a formal semantic representation in neural dialogue modeling; compared with textual input, AMR explicitly provides core semantic knowledge and reduces data sparsity.
Structural Guidance for Transformer Language Models
TLDR
Converging evidence suggests that generative structural supervision can induce more robust and human-like linguistic generalization in Transformer language models without the need for data-intensive pre-training.
Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees
TLDR
This paper proposes a novel framework named Syntax-BERT, which works in a plug-and-play mode and is applicable to an arbitrary pre-trained checkpoint based on the Transformer architecture, achieving consistent improvements over multiple pre-trained models, including BERT, RoBERTa, and T5.

References

SHOWING 1-10 OF 50 REFERENCES
Graph Convolution over Pruned Dependency Trees Improves Relation Extraction
TLDR
An extension of graph convolutional networks tailored for relation extraction is proposed, which pools information over arbitrary dependency structures efficiently in parallel; a novel pruning strategy is applied to the input trees by keeping only the words immediately around the shortest path between the two entities among which a relation might hold.
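The pruning strategy the summary describes is easy to picture with a small sketch, assuming the networkx library; the toy arcs, entity positions, and cutoff k=1 below are illustrative, and the actual model runs a GCN over the pruned tree rather than just printing the kept tokens.

    import networkx as nx

    def prune_tree(arcs, subj, obj, k=1):
        """Keep tokens within k hops of the shortest dependency path between the entities."""
        tree = nx.Graph(arcs)                                  # undirected dependency tree
        path = nx.shortest_path(tree, subj, obj)               # shortest dependency path
        keep = set()
        for node in path:
            reachable = nx.single_source_shortest_path_length(tree, node, cutoff=k)
            keep.update(reachable)                             # nodes within k hops of the path
        return keep

    arcs = [(0, 1), (1, 2), (2, 3), (2, 6), (6, 7), (3, 4), (4, 5)]   # toy dependency arcs
    print(prune_tree(arcs, subj=0, obj=4, k=1))                       # token 7 is pruned away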
Linguistically-Informed Self-Attention for Semantic Role Labeling
TLDR
LISA is a neural network model that combines multi-head self-attention with multi-task learning across dependency parsing, part-of-speech tagging, predicate detection and SRL, and can incorporate syntax using merely raw tokens as input.
Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling
TLDR
A version of graph convolutional networks (GCNs), a recent class of neural networks operating on graphs, suited to modeling syntactic dependency graphs is proposed; the authors observe that GCN layers are complementary to LSTM ones.
Efficient Dependency-Guided Named Entity Recognition
TLDR
This work investigates how to better utilize the structured information conveyed by dependency trees to improve the performance of NER, and shows that certain global structured information of the dependency trees can be exploited when building NER models, where such information can provide guided learning and inference.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
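A minimal sketch of the "one additional output layer" fine-tuning recipe this summary mentions, assuming the HuggingFace transformers library; the sentences, labels, and two-class setup are illustrative only.

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                               num_labels=2)

    batch = tokenizer(["syntax trees help", "they might not"],
                      return_tensors="pt", padding=True)
    outputs = model(**batch, labels=torch.tensor([1, 0]))   # classification head on [CLS]
    outputs.loss.backward()                                  # one fine-tuning step (optimizer omitted)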
Analyzing the Structure of Attention in a Transformer Language Model
TLDR
It is found that attention targets different parts of speech at different layer depths within the model, that attention aligns with dependency relations most strongly in the middle layers, and that the deepest layers of the model capture the most distant relationships.
Deep Biaffine Attention for Neural Dependency Parsing
TLDR
This paper uses a larger but more thoroughly regularized parser than other recent BiLSTM-based approaches, with biaffine classifiers to predict arcs and labels, and shows which hyperparameter choices had a significant effect on parsing accuracy, allowing it to achieve large gains over other graph-based approaches.
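The biaffine arc scorer at the heart of this parser is compact enough to sketch, assuming PyTorch; the dimensions, names, and greedy decoding at the end are illustrative rather than the paper's full training setup.

    import torch
    import torch.nn as nn

    class BiaffineArcScorer(nn.Module):
        """score[i, j] = h_dep[i]^T U h_head[j] + b^T h_head[j]."""
        def __init__(self, dim):
            super().__init__()
            self.U = nn.Parameter(torch.randn(dim, dim) * 0.01)
            self.b = nn.Parameter(torch.zeros(dim))

        def forward(self, h_dep, h_head):
            # h_dep, h_head: (seq_len, dim) projections of BiLSTM token states
            bilinear = h_dep @ self.U @ h_head.T       # (seq_len, seq_len) arc scores
            head_bias = h_head @ self.b                # prior score for each candidate head
            return bilinear + head_bias.unsqueeze(0)

    scores = BiaffineArcScorer(dim=8)(torch.randn(5, 8), torch.randn(5, 8))
    predicted_heads = scores.argmax(dim=-1)            # greedy head choice per token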
End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures
TLDR
A novel end-to-end neural model is presented to extract entities and the relations between them; it compares favorably to the state-of-the-art CNN-based model (in F1 score) on nominal relation classification (SemEval-2010 Task 8).
What Does BERT Look at? An Analysis of BERT’s Attention
TLDR
It is shown that certain attention heads correspond well to linguistic notions of syntax and coreference, and an attention-based probing classifier is proposed and used to demonstrate that substantial syntactic information is captured in BERT’s attention.
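A rough sketch of an attention-as-syntax check in the spirit of this analysis (not the paper's exact probing classifier), assuming the HuggingFace transformers library; the layer and head indices are arbitrary, and a real evaluation would align wordpieces to words and compare against gold dependency heads.

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

    enc = tokenizer("the cat sat on the mat", return_tensors="pt")
    with torch.no_grad():
        attentions = model(**enc).attentions       # tuple of (1, heads, seq, seq) per layer

    layer, head = 7, 9                             # arbitrary layer/head to inspect
    attn = attentions[layer][0, head]              # (seq, seq) attention matrix
    predicted_heads = attn.argmax(dim=-1)          # most-attended-to token per position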
A Structural Probe for Finding Syntax in Word Representations
TLDR
A structural probe is proposed, which evaluates whether syntax trees are embedded in a linear transformation of a neural network’s word representation space, and shows that such transformations exist for both ELMo and BERT but not in baselines, providing evidence that entire syntax trees are embedded implicitly in deep models’ vector geometry.
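The probe itself reduces to a learned linear map B under which squared L2 distances between word vectors are trained to match parse-tree distances. Below is a simplified sketch in PyTorch; the probe rank, the toy chain-shaped tree, and the single gradient step are illustrative, not the paper's full training procedure.

    import torch
    import torch.nn as nn

    class StructuralProbe(nn.Module):
        def __init__(self, model_dim, probe_rank):
            super().__init__()
            self.B = nn.Parameter(torch.randn(model_dim, probe_rank) * 0.01)

        def forward(self, h):                       # h: (seq_len, model_dim)
            t = h @ self.B                          # project into the probe space
            diff = t.unsqueeze(1) - t.unsqueeze(0)  # pairwise differences
            return (diff ** 2).sum(-1)              # predicted squared distances

    probe = StructuralProbe(model_dim=768, probe_rank=64)
    h = torch.randn(6, 768)                         # contextual vectors for one sentence
    idx = torch.arange(6, dtype=torch.float)
    tree_dist = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs()   # toy chain-shaped tree distances
    loss = (probe(h) - tree_dist).abs().mean()      # L1 gap to gold tree distances
    loss.backward()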