# On the Turing Completeness of Modern Neural Network Architectures

@article{Perez2019OnTT,
  title={On the Turing Completeness of Modern Neural Network Architectures},
  author={Jorge P{\'e}rez and Javier Marinkovi{\'c} and Pablo Barcel{\'o}},
  journal={ArXiv},
  year={2019},
  volume={abs/1901.03429}
}

Alternatives to recurrent neural networks, in particular architectures based on attention or convolutions, have been gaining momentum for processing input sequences. In spite of their relevance, the computational properties of these alternatives have not yet been fully explored. We study the computational power of two of the most paradigmatic architectures exemplifying these mechanisms: the Transformer (Vaswani et al., 2017) and the Neural GPU (Kaiser & Sutskever, 2016). We show both models to be Turing complete exclusively based on their capacity to compute and access internal dense representations of the data.
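As a point of reference for the mechanism the paper analyzes, a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer, might look as follows. The shapes and random inputs are illustrative only; this is not the paper's Turing-completeness construction.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # each row is a distribution over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))  # 3 query positions, d_k = 4
K = rng.normal(size=(5, 4))  # 5 key positions
V = rng.normal(size=(5, 4))  # one value vector per key
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one weighted average of values per query
```

Each output row is a convex combination of the value rows, which is why the paper's analysis centers on what such weighted averages can compute and store.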

#### 35 Citations

Attention is Turing Complete

- 2021

Alternatives to recurrent neural networks, in particular architectures based on self-attention, are gaining momentum for processing input sequences. In spite of their relevance, the computational properties of these alternatives have not yet been fully explored.

WHAT GRAPH NEURAL NETWORKS CANNOT LEARN: DEPTH

- 2019

This paper studies the expressive power of graph neural networks falling within the message-passing framework (GNNmp). Two results are presented. First, GNNmp are shown to be Turing universal under sufficient conditions on their depth, width, node attributes, and layer expressiveness.

On the Practical Ability of Recurrent Neural Networks to Recognize Hierarchical Languages

- Computer Science
- COLING
- 2020

This work studies the performance of recurrent models on Dyck-n languages, a particularly important and well-studied class of CFLs, and finds that while recurrent models generalize nearly perfectly if the lengths of the training and test strings are from the same range, they perform poorly if the test strings are longer.
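For readers unfamiliar with Dyck-n, it is the language of balanced strings over n types of brackets. A minimal stack-based recognizer (the classical algorithm, shown here only to define the language, not the recurrent models studied above) can be sketched as:

```python
def is_dyck(s, pairs=None):
    """Check membership in Dyck-n: balanced strings over n bracket types."""
    if pairs is None:
        pairs = {'(': ')', '[': ']'}  # Dyck-2 for illustration
    closers = set(pairs.values())
    stack = []
    for c in s:
        if c in pairs:
            stack.append(pairs[c])        # remember which closer we expect
        elif c in closers:
            if not stack or stack.pop() != c:
                return False              # unmatched or wrongly ordered closer
        else:
            return False                  # symbol outside the alphabet
    return not stack                      # every opener must be closed

print(is_dyck("([()])"))  # True
print(is_dyck("([)]"))    # False: brackets close in the wrong order
```

The unbounded stack is exactly what makes Dyck-n a useful probe: a recurrent model must encode this stack in its hidden state, which gets harder as strings grow longer than those seen in training.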

Theoretical Limitations of Self-Attention in Neural Sequence Models

- Computer Science
- TACL
- 2020

Across both soft and hard attention, strong theoretical limitations are shown on the computational abilities of self-attention: it cannot model periodic finite-state languages or hierarchical structure unless the number of layers or heads increases with input length.

How hard is graph isomorphism for graph neural networks?

- Computer Science, Mathematics
- ArXiv
- 2020

This study derives the first hardness results for graph isomorphism in the message-passing model (MPNN), which encompasses the majority of graph neural networks used today and is universal in the limit when nodes are given unique features.

Neural networks learn to detect and emulate sorting algorithms from images of their execution traces

- Computer Science
- Inf. Softw. Technol.
- 2020

It is demonstrated that simple algorithms can be modelled using neural networks, and a method is provided for representing specific classes of programs as either images or sequences of instructions in a domain-specific language, such that a neural network can learn their behavior.

What graph neural networks cannot learn: depth vs width

- Computer Science, Mathematics
- ICLR
- 2020

GNNmp are shown to be Turing universal under sufficient conditions on their depth, width, node attributes, and layer expressiveness, and it is discovered that GNNmp can lose a significant portion of their power when their depth and width are restricted.

Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers

- Computer Science, Mathematics
- ArXiv
- 2021

This work proposes a formal definition of statistically meaningful approximation, which requires the approximating network to exhibit good statistical learnability, and introduces new tools for generalization bounds that provide much tighter sample complexity guarantees than the typical VC-dimension or norm-based bounds.

On the Ability of Self-Attention Networks to Recognize Counter Languages

- Computer Science
- EMNLP
- 2020

This work systematically studies the ability of Transformers to model counter languages, the role of the model's individual components in doing so, and the influence of positional encoding schemes on its learning and generalization ability.

Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth

- Computer Science
- ICML
- 2021

This work proposes a new way to understand self-attention networks: it is shown that their output can be decomposed into a sum of smaller terms, or paths, each involving the operation of a sequence of attention heads across layers, and it is proved that self-attention possesses a strong inductive bias towards "token uniformity".
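The "token uniformity" bias can be illustrated numerically. A hypothetical sketch, assuming identity value and projection matrices for simplicity: iterating a pure attention update, with no skip connections or feed-forward blocks, drives all token representations toward a common vector.

```python
import numpy as np

def softmax(x):
    # Row-wise numerically stable softmax.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))  # 6 tokens, hidden dimension 4

for _ in range(50):
    A = softmax(X @ X.T)     # self-attention weights: each row is a distribution
    X = A @ X                # pure attention update: no skip connection, no MLP

# Every update replaces each token by a convex combination of all tokens,
# so the representations contract toward a single point.
spread = np.abs(X - X.mean(axis=0)).max()
print(f"max deviation from the mean token: {spread:.2e}")
```

After a few dozen iterations the spread is numerically negligible, consistent with the paper's finding that skip connections and MLPs are what prevent this rank collapse in real Transformers.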

#### References

Neural GPUs Learn Algorithms

- Computer Science, Mathematics
- ICLR
- 2016

It is shown that the Neural GPU can be trained on short instances of an algorithmic task and successfully generalize to long instances, and parameter sharing relaxation, a technique for training deep recurrent networks, is introduced.

Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets

- Computer Science
- NIPS
- 2015

The limitations of standard deep learning approaches are discussed, and it is shown that some of these limitations can be overcome by learning how to grow the complexity of a model in a structured way.

On the Practical Computational Power of Finite Precision RNNs for Language Recognition

- Computer Science, Mathematics
- ACL
- 2018

It is shown that the LSTM and the Elman-RNN with ReLU activation are strictly stronger than the RNN with a squashing activation and the GRU.

Learning to Transduce with Unbounded Memory

- Computer Science
- NIPS
- 2015

This paper proposes new memory-based recurrent networks that implement continuously differentiable analogues of traditional data structures such as stacks, queues, and deques, and shows that these architectures exhibit superior generalisation performance to deep RNNs and are often able to learn the underlying generating algorithms in the transduction experiments.

On the Computational Power of Neural Nets

- Computer Science
- J. Comput. Syst. Sci.
- 1995

It is proved that one may simulate all Turing machines by such nets, and any multi-stack Turing machine in real time, and that there is a net made up of 886 processors which computes a universal partial-recursive function.

Universal Transformers

- Computer Science, Mathematics
- ICLR
- 2019

The Universal Transformer (UT) is proposed: a parallel-in-time self-attentive recurrent sequence model that can be cast as a generalization of the Transformer model and that retains its parallelizability and global receptive field.

Extensions and Limitations of the Neural GPU

- Computer Science
- ArXiv
- 2016

It is found that Neural GPUs that correctly generalize to arbitrarily long numbers still fail to compute the correct answer on highly-symmetric, atypical inputs: for example, a Neural GPU that achieves near-perfect generalization on decimal multiplication of up to 100-digit long numbers can fail on $\dots002$.

Neural Turing Machines

- Computer Science
- ArXiv
- 2014

The combined system is analogous to a Turing machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent.

Recurrent Neural Networks as Weighted Language Recognizers

- Computer Science, Mathematics
- NAACL
- 2018

It is shown that approximations and heuristic algorithms are necessary in practical applications of single-layer, ReLU-activation, rational-weight RNNs with softmax, which are commonly used in natural language processing applications.

Attention is All you Need

- Computer Science
- NIPS
- 2017

A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as demonstrated by applying it successfully to English constituency parsing with both large and limited training data.