BertGCN: Transductive Text Classification by Combining GNN and BERT

@article{Lin2021BertGCNTT,
  title={BertGCN: Transductive Text Classification by Combining GNN and BERT},
  author={Yuxiao Lin and Yuxian Meng and Xiaofei Sun and Qinghong Han and Kun Kuang and Jiwei Li and Fei Wu},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.05727}
}
In this work, we propose BertGCN, a model that combines large-scale pretraining and transductive learning for text classification. BertGCN constructs a heterogeneous graph over the dataset and represents documents as nodes using BERT representations. By jointly training the BERT and GCN modules within BertGCN, the proposed model is able to leverage the advantages of both worlds: large-scale pretraining, which takes advantage of the massive amount of raw data, and transductive learning which…
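
To make the joint setup concrete, here is a minimal PyTorch sketch of the BertGCN idea, assuming a two-layer GCN over a precomputed normalized adjacency matrix and an interpolation weight (`lam`) between the GCN and BERT predictions; the class names, the zero initialization of word-node features, and the default hyper-parameters are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel


class SimpleGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj: dense, symmetrically normalized adjacency of the document-word graph
        return self.linear(adj @ x)


class BertGCNSketch(nn.Module):
    def __init__(self, num_classes, hidden=256, lam=0.7, bert_name="bert-base-uncased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        dim = self.bert.config.hidden_size
        self.bert_head = nn.Linear(dim, num_classes)   # BERT-only classifier head
        self.gcn1 = SimpleGCNLayer(dim, hidden)
        self.gcn2 = SimpleGCNLayer(hidden, num_classes)
        self.lam = lam                                  # weight of the GCN prediction

    def forward(self, input_ids, attention_mask, adj, doc_idx):
        # Document nodes are initialized with BERT [CLS] embeddings;
        # word nodes are left as zeros in this simplified sketch.
        cls = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state[:, 0]
        feats = torch.zeros(adj.size(0), cls.size(1), device=cls.device)
        feats[doc_idx] = cls

        gcn_logits = self.gcn2(F.relu(self.gcn1(feats, adj)), adj)[doc_idx]
        bert_logits = self.bert_head(cls)
        # Interpolate the two predictions; lam = 0 falls back to plain BERT.
        return (self.lam * F.softmax(gcn_logits, dim=-1)
                + (1 - self.lam) * F.softmax(bert_logits, dim=-1))
```

In this reading, increasing `lam` shifts the decision toward the graph module, while `lam = 0` reduces the model to ordinary BERT fine-tuning.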

TextRGNN: Residual Graph Neural Networks for Text Classification

An improved GNN structure that introduces residual connections to deepen the convolutional network and integrates a probabilistic language model into the initialization of graph node embeddings, so that non-graph semantic information can be better extracted.
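
As a rough illustration of the residual idea (not the TextRGNN implementation), a GCN layer with a skip connection can look like this in PyTorch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualGCNLayer(nn.Module):
    """Generic GCN layer with a residual (skip) connection, which makes deeper
    stacks less prone to over-smoothing and vanishing gradients."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # adj: normalized adjacency matrix; the input x is added back as a skip term
        return F.relu(self.linear(adj @ x)) + x
```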

InducT-GCN: Inductive Graph Convolutional Networks for Text Classification

This paper introduces a novel inductive graph-based text classification framework, InducT-GCN (InducTive Graph Convolutional Networks for Text classification), which outperforms state-of-the-art methods that are either transductive in nature or rely on additional pretrained resources.

Graph Convolutional Network based on Multihead Pooling for Short Text Classification

A Multi-head-Pooling-based Graph Convolutional Network (MP-GCN) for semi-supervised short text classification is proposed, and its three architectures, which focus on node representation learning over 1-order and 1&2-order isomorphic graphs and 1-order heterogeneous graphs, are introduced.

CNN-Trans-Enc: A CNN-Enhanced Transformer-Encoder On Top Of Static BERT representations for Document Classification

This work proposes a CNN-Enhanced Transformer-Encoder model which is trained on top of BERT [CLS] representations from all layers, employing Convolutional Neural Networks to generate QKV feature maps inside the Transformer-Encoder, instead of linear projections of the input into the embedding space.
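
A generic reading of this design, with convolutional instead of linear Q/K/V projections, might look like the following single-head sketch; `kernel_size=3` and the manual attention are illustrative choices, not the paper's code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvQKVAttention(nn.Module):
    """Single-head self-attention where Q, K, V come from 1D convolutions over
    the token axis instead of pointwise linear projections."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.q = nn.Conv1d(dim, dim, kernel_size, padding=pad)
        self.k = nn.Conv1d(dim, dim, kernel_size, padding=pad)
        self.v = nn.Conv1d(dim, dim, kernel_size, padding=pad)

    def forward(self, x):                      # x: (batch, seq_len, dim)
        xc = x.transpose(1, 2)                 # Conv1d expects (batch, dim, seq_len)
        q = self.q(xc).transpose(1, 2)
        k = self.k(xc).transpose(1, 2)
        v = self.v(xc).transpose(1, 2)
        scores = q @ k.transpose(1, 2) / math.sqrt(q.size(-1))
        return F.softmax(scores, dim=-1) @ v   # (batch, seq_len, dim)
```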

Simplified-Boosting Ensemble Convolutional Network for Text Classification

An ensemble convolutional network is proposed by combining GCN and CNN, in which the GCN captures global information and the CNN extracts local features; it achieves better performance than other state-of-the-art methods with less memory.

BHGAttN: A Feature-Enhanced Hierarchical Graph Attention Network for Sentiment Analysis

A BERT-based hierarchical graph attention network model (BHGAttN) that builds on a large-scale pretrained model and a graph attention network to model the hierarchical relationships of texts, and exhibits significant competitive advantages over current state-of-the-art baseline models.

ConTextING: Granting Document-Wise Contextual Embeddings to Graph Neural Networks for Inductive Text Classification

This work proposes a simple yet effective unified model, coined ConTextING, with a joint training mechanism to learn from both document embeddings and contextual word interactions simultaneously, which outperforms pure inductive GNNs and BERT-style models.

GNNer: Reducing Overlapping in Span-based NER Using Graph Neural Networks

This work proposes GNNer, a framework that uses Graph Neural Networks to enrich span representations, reducing the number of overlapping spans during prediction while maintaining competitive metric performance.

Active Learning Strategy for COVID-19 Annotated Dataset

A novel discriminative batch-mode active learning method (DS3) is proposed to allow faster and more effective COVID-19 data annotation, and significance testing verifies the effectiveness of DS3 and its superiority over baseline active learning algorithms.

Large Sequence Representation Learning via Multi-Stage Latent Transformers

This work presents LANTERN, a multi-stage transformer architecture for named-entity recognition (NER) designed to operate on indefinitely large text sequences (i.e., > 512 elements), and RADAR, a character-level LSTM classifier that predicts the relevance of a word with respect to the entity-recognition task.

References

VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification

This paper proposes the VGCN-BERT model, which combines the capability of BERT with a Vocabulary Graph Convolutional Network (VGCN); it outperforms BERT and GCN alone and achieves higher effectiveness than that reported in previous studies.

PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks

A semi-supervised representation learning method for text data, called predictive text embedding (PTE), which is comparably or more effective, much more efficient, and has fewer parameters to tune.

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity, is presented.
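
A minimal inference-time sketch of the SBERT-style comparison, assuming mean pooling over BERT token embeddings and cosine similarity; the siamese/triplet fine-tuning itself is omitted, and the model name is only an example.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")


def embed(sentence):
    batch = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state     # (1, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)        # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)         # mean pooling


a = embed("A man is playing guitar.")
b = embed("Someone plays an instrument.")
print(F.cosine_similarity(a, b).item())
```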

GILE: A Generalized Input-Label Embedding for Text Classification

This paper proposes a new input-label model that generalizes over previous such models, addresses their limitations, and does not compromise performance on seen labels; it outperforms monolingual and multilingual models that do not leverage label semantics, as well as previous joint input-label space models, in both scenarios.

Uncertainty-aware Self-training for Text Classification with Few Labels

This work proposes an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network, leveraging recent advances in Bayesian deep learning, and proposes acquisition functions that use Monte Carlo (MC) Dropout to select instances from the unlabeled pool.
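
The MC Dropout part can be illustrated with a small helper that keeps dropout active at inference time and scores unlabeled examples by predictive variance; the function name and the selection step are assumptions, not the paper's code.

```python
import torch


def mc_dropout_uncertainty(model, x, n_samples=20):
    """Estimate predictive uncertainty by keeping dropout layers active
    (Monte Carlo Dropout) and measuring variance across stochastic passes."""
    model.train()                      # train mode keeps dropout active
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    # mean prediction, and a per-example uncertainty score (variance summed over classes)
    return probs.mean(0), probs.var(0).sum(-1)


# Usage sketch: pick the most uncertain unlabeled examples for annotation / pseudo-labeling
# mean_p, unc = mc_dropout_uncertainty(classifier, unlabeled_batch)
# selected = unc.topk(k=32).indices
```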

Enhancing Generalization in Natural Language Inference by Syntax

This work investigates the use of dependency trees to enhance the generalization of BERT in the NLI task, leveraging a graph convolutional network to represent a syntax-based matching graph with heterogeneous matching patterns.

Zero-shot Text Classification via Reinforced Self-training

This paper proposes a reinforcement learning framework to learn a data selection strategy automatically and provide more reliable selection; it significantly outperforms previous methods in zero-shot text classification.

Joint Embedding of Words and Labels for Text Classification

This work proposes to view text classification as a label-word joint embedding problem: each label is embedded in the same space as the word vectors, and an attention framework is introduced that measures the compatibility of embeddings between text sequences and labels.
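
A rough sketch of such a label-word compatibility attention, assuming word and label embeddings already live in a shared space; the function name and pooling choices are illustrative.

```python
import torch
import torch.nn.functional as F


def label_attentive_pooling(word_emb, label_emb):
    """word_emb: (seq_len, dim), label_emb: (num_labels, dim) in a shared space.
    Cosine compatibility between words and labels drives an attention-weighted
    pooling of the text into a single vector."""
    g = F.normalize(word_emb, dim=-1) @ F.normalize(label_emb, dim=-1).T  # (seq_len, num_labels)
    attn = torch.softmax(g.max(dim=-1).values, dim=0)                     # per-word weight
    return (attn.unsqueeze(-1) * word_emb).sum(dim=0)                     # pooled text vector
```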

Graph Attention Networks

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations.
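
For reference, a minimal single-head graph attention layer (dense adjacency with self-loops assumed; not the authors' implementation) can be sketched as follows.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SingleHeadGATLayer(nn.Module):
    """Attention coefficients are computed only over existing edges
    (masked self-attention); adj is assumed to include self-loops."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):                        # adj: (N, N) 0/1 adjacency
        h = self.W(x)                                 # (N, out_dim)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))   # raw attention scores (N, N)
        e = e.masked_fill(adj == 0, float("-inf"))    # mask out non-edges
        alpha = torch.softmax(e, dim=-1)
        return F.elu(alpha @ h)
```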

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
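
A minimal fine-tuning sketch along these lines, with a single linear output layer over the [CLS] representation; the model name and hidden-size lookup are illustrative, not a prescribed recipe.

```python
import torch.nn as nn
from transformers import AutoModel


class BertForClassificationSketch(nn.Module):
    """Pretrained BERT encoder plus one additional linear output layer,
    fine-tuned end to end for classification."""
    def __init__(self, num_classes, bert_name="bert-base-uncased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        cls = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state[:, 0]
        return self.classifier(cls)
```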