# GraphiT: Encoding Graph Structure in Transformers

@article{Mialon2021GraphiTEG, title={GraphiT: Encoding Graph Structure in Transformers}, author={Gr{\'e}goire Mialon and Dexiong Chen and Margot Selosse and Julien Mairal}, journal={ArXiv}, year={2021}, volume={abs/2106.05667} }

We show that viewing graphs as sets of node features and incorporating structural and positional information into a transformer architecture is able to outperform representations learned with classical graph neural networks (GNNs). Our model, GraphiT, encodes such information by (i) leveraging relative positional encoding strategies in self-attention scores based on positive definite kernels on graphs, and (ii) enumerating and encoding local sub-structures such as paths of short length. We…
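To make point (i) concrete, here is a minimal NumPy sketch of self-attention whose scores are modulated by a positive definite kernel on the graph. The abstract only states that the relative positional encoding is kernel-based; the specific choices below are assumptions of this sketch rather than details confirmed by the text above: a diffusion kernel exp(-beta * L) built from the unnormalized Laplacian, an elementwise product with the exponentiated scores, and a single projection shared by queries and keys. The function names `diffusion_kernel` and `kernel_modulated_attention` are hypothetical.

```python
import numpy as np


def diffusion_kernel(adj, beta=1.0):
    """Diffusion (heat) kernel exp(-beta * L) of an undirected graph.

    Assumption of this sketch: L is the unnormalized Laplacian D - A.
    """
    lap = np.diag(adj.sum(axis=1)) - adj
    eigvals, eigvecs = np.linalg.eigh(lap)  # L is symmetric, so eigh applies
    # Rebuild exp(-beta * L) from the spectrum: U diag(exp(-beta * lambda)) U^T
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


def kernel_modulated_attention(x, w_q, w_v, kernel):
    """Self-attention whose unnormalized weights are multiplied by a graph kernel.

    x: (n, d_in) node features; w_q: (d_in, d_k); w_v: (d_in, d_out);
    kernel: (n, n) matrix with nonnegative entries.
    Assumption of this sketch: queries and keys share the projection w_q.
    """
    q = x @ w_q
    scores = q @ q.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores) * kernel            # inject graph structure
    weights /= weights.sum(axis=1, keepdims=True)  # normalize each row
    return weights @ (x @ w_v), weights


# Usage on a 4-node path graph 0-1-2-3 with random features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
k = diffusion_kernel(adj, beta=0.5)
out, weights = kernel_modulated_attention(
    x, rng.normal(size=(3, 2)), rng.normal(size=(3, 2)), k
)
```

Because the kernel decays with distance in the graph, pairs of nodes that are far apart receive smaller attention weights than they would from feature similarity alone, which is one way a transformer that otherwise treats nodes as a set can be made aware of structure.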

## 16 Citations

Graph Neural Networks with Learnable Structural and Positional Representations

- Computer Science
- ArXiv
- 2021

This work proposes to decouple structural and positional representations to make it easy for the network to learn these two essential properties, and introduces a novel generic architecture called LSPE (Learnable Structural and Positional Encodings).

Investigating Expressiveness of Transformer in Spectral Domain for Graphs

- Computer Science
- ArXiv
- 2022

FeTA is proposed, a framework that aims to perform attention over the entire graph spectrum, analogous to attention in the spatial domain. It provides a homogeneous performance gain over the vanilla transformer across all tasks on standard benchmarks and can easily be extended to GNN-based models with low-pass characteristics.

Recipe for a General, Powerful, Scalable Graph Transformer

- Computer Science
- ArXiv
- 2022

A recipe on how to build a general, powerful, scalable (GPS) graph Transformer with linear complexity and state-of-the-art results on a diverse set of benchmarks is proposed and a modular framework that supports multiple types of encodings and that provides scalability both in small and large graphs is built.

Transformer for Graphs: An Overview from Architecture Perspective

- Computer Science
- 2022

This survey provides a comprehensive review of various Graph Transformer models from the architectural design perspective, confirms the benefits of current graph-specific modules on Transformers, and reveals their advantages on different kinds of graph tasks.

Structure-Enhanced Heterogeneous Graph Contrastive Learning

- Computer Science
- 2022

This work proposes a novel method to generate multiple semantic views for heterogeneous graphs (HGs) based on metapaths and advocates the explicit use of structure embedding, which enriches the model with local structural patterns of the underlying HGs, so as to better mine true and hard negatives for graph contrastive learning (GCL).

Benchmarking Graph Neural Networks

- Computer Science
- ArXiv
- 2020

A reproducible GNN benchmarking framework is introduced, with the facility for researchers to add new models conveniently for arbitrary datasets, together with a principled investigation of recent Weisfeiler-Lehman GNNs (WL-GNNs) compared with message passing-based graph convolutional networks (GCNs).

PACE: A Parallelizable Computation Encoder for Directed Acyclic Graphs

- Computer Science
- ICML
- 2022

A Parallelizable Attention-based Computation structure Encoder (PACE) that processes nodes simultaneously and encodes DAGs in parallel is proposed and demonstrated through encoder-dependent optimization subroutines that search for the optimal DAG structure based on the learned DAG embeddings.

Deformable Graph Transformer

- Computer Science
- ArXiv
- 2022

This paper proposes the Deformable Graph Transformer (DGT), which performs sparse attention with dynamically sampled key and value pairs. DGT consistently outperforms existing Transformer-based models and shows competitive performance compared to state-of-the-art models on 8 graph benchmark datasets, including large-scale graphs.

Sign and Basis Invariant Networks for Spectral Graph Representation Learning

- Computer Science
- ArXiv
- 2022

SignNet and BasisNet are introduced: new neural architectures that are invariant to all requisite symmetries and hence process collections of eigenspaces in a principled manner, and that can approximate any continuous function of eigenvectors with the proper invariances.

KPGT: Knowledge-Guided Pre-training of Graph Transformer for Molecular Property Prediction

- Computer Science
- KDD
- 2022

Knowledge-guided Pre-training of Graph Transformer (KPGT) is introduced, a novel self-supervised learning framework for molecular graph representation learning that can offer superior performance over current state-of-the-art methods on several molecular property prediction tasks.

## References

SHOWING 1-10 OF 53 REFERENCES

Graph Transformer Networks

- Computer Science
- NeurIPS
- 2019

This paper proposes Graph Transformer Networks (GTNs) that are capable of generating new graph structures, which involve identifying useful connections between unconnected nodes on the original graph, while learning effective node representation on the new graphs in an end-to-end fashion.

Graph Attention Networks

- Computer Science
- ICLR
- 2018

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior…

How Powerful are Graph Neural Networks?

- Computer Science
- ICLR
- 2019

This work characterizes the discriminative power of popular GNN variants, such as Graph Convolutional Networks and GraphSAGE, shows that they cannot learn to distinguish certain simple graph structures, and develops a simple architecture that is provably the most expressive among the class of GNNs.

Convolutional Kernel Networks for Graph-Structured Data

- Computer Science
- ICML
- 2020

This work introduces a family of multilayer graph kernels and establishes new links between graph convolutional neural networks and kernel methods, by representing graphs as a sequence of kernel feature maps, where each node carries information about local graph substructures.

Semi-Supervised Classification with Graph Convolutional Networks

- Computer Science
- ICLR
- 2017

A scalable approach for semi-supervised learning on graph-structured data, based on an efficient variant of convolutional neural networks that operate directly on graphs; it outperforms related methods by a significant margin.

Weisfeiler-Lehman Graph Kernels

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2011

A family of efficient kernels for large graphs with discrete node labels based on the Weisfeiler-Lehman test of isomorphism on graphs that outperform state-of-the-art graph kernels on several graph classification benchmark data sets in terms of accuracy and runtime.

Diffusion Improves Graph Learning

- Computer Science
- NeurIPS
- 2019

This work removes the restriction of using only the direct neighbors by introducing a powerful, yet spatially localized graph convolution: Graph diffusion convolution (GDC), which leverages generalized graph diffusion and alleviates the problem of noisy and often arbitrarily defined edges in real graphs.

Graph kernels based on tree patterns for molecules

- Computer Science
- Machine Learning
- 2008

New kernels with a parameter to control the complexity of the subtrees used as features to represent the graphs are proposed, which makes it possible to interpolate smoothly between classical graph kernels based on the count of common walks and kernels that emphasize the detection of large common subtrees.

Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels

- Computer Science
- NeurIPS
- 2019

A new class of graph kernels, Graph Neural Tangent Kernels (GNTKs), is presented. GNTKs correspond to infinitely wide multi-layer GNNs trained by gradient descent; they enjoy the full expressive power of GNNs and inherit the advantages of graph kernels (GKs).

A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention

- Computer Science
- ICLR
- 2021

A parametrized embedding that aggregates the features from a given set according to the optimal transport plan between the set and a trainable reference, which scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.