• Corpus ID: 239998288

Transformers Generalize DeepSets and Can be Extended to Graphs and Hypergraphs

  • Jinwoo Kim, Saeyoon Oh, Seunghoon Hong
  • Published 27 October 2021
  • Computer Science
  • ArXiv
We present a generalization of Transformers to any-order permutation invariant data (sets, graphs, and hypergraphs). We begin by observing that Transformers generalize DeepSets, or first-order (set-input) permutation invariant MLPs. Then, based on recently characterized higher-order invariant MLPs, we extend the concept of self-attention to higher orders and propose higher-order Transformers for order-k data (k = 2 for graphs and k > 2 for hypergraphs). Unfortunately, higher-order Transformers… 
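The abstract's starting observation can be checked directly: a self-attention layer without positional encodings is permutation equivariant, so composing it with sum pooling (as in DeepSets) gives a permutation-invariant set function. Below is a minimal NumPy sketch of this property, not the paper's implementation; all names and shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (n, d) set of n elements; note: no positional encoding
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)
    return A @ V

rng = np.random.default_rng(0)
n, d = 5, 4
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

perm = rng.permutation(n)
Y = self_attention(X, Wq, Wk, Wv)
Y_perm = self_attention(X[perm], Wq, Wk, Wv)

# Equivariance: permuting the input rows permutes the output rows identically
assert np.allclose(Y[perm], Y_perm)
# Invariance after sum pooling, as in DeepSets
assert np.allclose(Y.sum(axis=0), Y_perm.sum(axis=0))
```

The same argument applies per layer, so a full positional-encoding-free Transformer encoder followed by pooling remains permutation invariant; the paper's higher-order extension generalizes this from first-order (set) inputs to order-k tensors.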

Are Transformers universal approximators of sequence-to-sequence functions?
It is established that Transformer models are universal approximators of continuous permutation-equivariant sequence-to-sequence functions with compact support, which is quite surprising given the extent of parameter sharing in these models.
Provably Powerful Graph Networks
This paper proposes a simple model that interleaves standard multilayer perceptrons (MLPs) applied to the feature dimension with matrix multiplication, and shows that a reduced second-order network containing just a scaled identity operator, augmented with a single quadratic operation (matrix multiplication), has provable 3-WL expressive power.
Invariant and Equivariant Graph Networks
This paper provides a characterization of all permutation-invariant and -equivariant linear layers for (hyper-)graph data, and shows that their dimensions, in the case of edge-value graph data, are 2 and 15, respectively.
Hyper-SAGNN: a self-attention based graph neural network for hypergraphs
This work develops Hyper-SAGNN, a new self-attention-based graph neural network applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes; it significantly outperforms state-of-the-art methods on traditional tasks while also achieving strong performance on a new task called outsider identification.
Universal Invariant and Equivariant Graph Neural Networks
The results show that a GNN defined by a single set of parameters can approximate uniformly well a function defined on graphs of varying size.
A Note on Over-Smoothing for Graph Neural Networks
It is shown that when the weight matrix satisfies the conditions determined by the spectrum of augmented normalized Laplacian, the Dirichlet energy of embeddings will converge to zero, resulting in the loss of discriminative power.
Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks
This work presents an attention-based neural network module, the Set Transformer, specifically designed to model interactions among elements in an input set, and reduces the computation time of self-attention from quadratic to linear in the number of elements in the set.
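The quadratic-to-linear reduction in the Set Transformer comes from routing attention through a small set of m learnable inducing points, so each attention map costs O(nm) rather than O(n²). A hedged NumPy sketch of this induced-attention pattern follows (shapes and names are illustrative; the actual model adds layer norms, feed-forward blocks, and multiple heads):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, Wq, Wk, Wv):
    # queries: (a, d), keys: (b, d) -> output (a, d); cost O(a*b)
    Q, K, V = queries @ Wq, keys @ Wk, keys @ Wv
    return softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1) @ V

rng = np.random.default_rng(1)
n, m, d = 1000, 8, 16          # n set elements, m << n inducing points
X = rng.normal(size=(n, d))
I = rng.normal(size=(m, d))    # learnable inducing points (random here)
Ws = [rng.normal(size=(d, d)) for _ in range(6)]

# Two cross-attentions of cost O(n*m) each, instead of one O(n^2) self-attention
H = cross_attention(I, X, *Ws[:3])   # (m, d): inducing points summarize the set
Y = cross_attention(X, H, *Ws[3:])   # (n, d): elements attend to the summary
assert Y.shape == (n, d)
```

Because the inducing points attend over the whole set before the set attends back, the output is still permutation equivariant in the input elements, which is what makes the block a drop-in replacement for full self-attention on sets.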
Random walks on hypergraphs
This work contributes to unraveling the effect of higher-order interactions on diffusive processes in higher-order networks, shedding light on mechanisms at the heart of biased information spreading in complex networked systems.
Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning
It is shown that the graph convolution of the GCN model is actually a special form of Laplacian smoothing, which is the key reason why GCNs work, but it also brings potential concerns of over-smoothing with many convolutional layers.
Deep Learning on Graphs: A Survey
This survey comprehensively reviews the different types of deep learning methods on graphs by dividing the existing methods into five categories based on their model architectures and training strategies: graph recurrent neural networks, graph convolutional networks, graph autoencoders, graph reinforcement learning, and graph adversarial methods.