Corpus ID: 57825721

On the Turing Completeness of Modern Neural Network Architectures

@article{Prez2019OnTT,
  title={On the Turing Completeness of Modern Neural Network Architectures},
  author={Jorge P{\'e}rez and Javier Marinkovic and P. Barcel{\'o}},
  journal={ArXiv},
  year={2019},
  volume={abs/1901.03429}
}
Alternatives to recurrent neural networks, in particular, architectures based on attention or convolutions, have been gaining momentum for processing input sequences. In spite of their relevance, the computational properties of these alternatives have not yet been fully explored. We study the computational power of two of the most paradigmatic architectures exemplifying these mechanisms: the Transformer (Vaswani et al., 2017) and the Neural GPU (Kaiser & Sutskever, 2016). We show both models to be Turing complete.
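
For orientation, the primitive whose computational power the paper studies is self-attention. Below is a minimal NumPy sketch of single-head scaled dot-product self-attention, intended only as an illustration of the mechanism; it is not the construction used in the completeness proofs (which rely on hard attention and specific positional encodings).

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # single-head scaled dot-product self-attention over a length-n sequence X of width d
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise compatibilities (n x n)
    return softmax(scores) @ V                # each position gets a convex combination of values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                   # toy input: 5 positions, width 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 8)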
Attention is Turing Complete
Alternatives to recurrent neural networks, in particular, architectures based on self-attention, are gaining momentum for processing input sequences. In spite of their relevance, the computational properties of these alternatives have not yet been fully explored.
On the Practical Ability of Recurrent Neural Networks to Recognize Hierarchical Languages
This work studies the performance of recurrent models on Dyck-n languages, a particularly important and well-studied class of CFLs, and finds that while recurrent models generalize nearly perfectly if the lengths of the training and test strings are from the same range, they perform poorly if the test strings are longer.
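
For readers unfamiliar with the benchmark, Dyck-n is the language of well-balanced strings over n distinct bracket pairs. The following membership checker is my own small illustration of the task definition, not the authors' experimental setup.

def is_dyck(s, pairs=("()", "[]", "{}")):
    # check membership in Dyck-n for the n bracket pairs given
    openers = {p[0]: p[1] for p in pairs}
    closers = set(p[1] for p in pairs)
    stack = []
    for ch in s:
        if ch in openers:
            stack.append(openers[ch])       # remember the matching closer
        elif ch in closers:
            if not stack or stack.pop() != ch:
                return False
        else:
            return False                    # symbol outside the alphabet
    return not stack

print(is_dyck("([]{})"))   # True
print(is_dyck("([)]"))     # False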
Theoretical Limitations of Self-Attention in Neural Sequence Models
Across both soft and hard attention, strong theoretical limitations are shown on the computational abilities of self-attention: it cannot model periodic finite-state languages or hierarchical structure unless the number of layers or heads increases with input length.
How hard is graph isomorphism for graph neural networks?
This study derives the first hardness results for graph isomorphism in the message-passing model (MPNN), which encompasses the majority of graph neural networks used today and is universal in the limit when nodes are given unique features.
Neural networks learn to detect and emulate sorting algorithms from images of their execution traces
This work demonstrates that simple algorithms can be modelled using neural networks and provides a method for representing specific classes of programs as either images or sequences of instructions in a domain-specific language, such that a neural network can learn their behavior.
What graph neural networks cannot learn: depth vs width
GNNmp are shown to be Turing universal under sufficient conditions on their depth, width, node attributes, and layer expressiveness, and it is discovered that GNNmp can lose a significant portion of their power when their depth and width are restricted.
Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers
This work proposes a formal definition of statistically meaningful approximation, which requires the approximating network to exhibit good statistical learnability, and introduces new tools for generalization bounds that provide much tighter sample complexity guarantees than typical VC-dimension or norm-based bounds.
On the Ability of Self-Attention Networks to Recognize Counter Languages
This work systematically studies the ability of Transformers to model such languages, the role of their individual components in doing so, and the influence of positional encoding schemes on the learning and generalization ability of the model.
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
This work proposes a new way to understand self-attention networks: it shows that their output can be decomposed into a sum of smaller terms, or paths, each involving the operation of a sequence of attention heads across layers, and proves that self-attention possesses a strong inductive bias towards “token uniformity”.
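
A rough numerical illustration of the rank-collapse behaviour described above. This is my own sketch using plain softmax attention with no skip connections or MLP blocks (the "pure attention" setting) and identity query/key/value maps for brevity.

import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def residual_to_rank_one(X):
    # Frobenius distance from X to its best rank-one approximation (via singular values)
    s = np.linalg.svd(X, compute_uv=False)
    return np.sqrt((s[1:] ** 2).sum())

rng = np.random.default_rng(0)
n, d = 16, 32
X = rng.normal(size=(n, d))
for layer in range(12):
    A = softmax(X @ X.T / np.sqrt(d))   # self-attention weights (Wq = Wk = identity)
    X = A @ X                           # no residual connection, no MLP
    if layer % 3 == 2:
        print(layer + 1, residual_to_rank_one(X))

Because each attention matrix is row-stochastic with strictly positive entries, repeated application drives the token representations towards a common row, i.e., towards a rank-one matrix; the exact doubly exponential rate shown in the paper is not reproduced by this toy setting, but the shrinking residual is visible.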

References

Neural GPUs Learn Algorithms
It is shown that the Neural GPU can be trained on short instances of an algorithmic task and successfully generalize to long instances, and a technique for training deep recurrent networks, parameter sharing relaxation, is introduced.
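
The Neural GPU's core computation is a convolutional gated recurrent unit (CGRU) applied repeatedly to a fixed-size state. The following is a heavily simplified 1-D sketch of one such update, my own illustration; the actual model uses multi-channel 2-D/3-D states, learned filter banks, and the relaxation mentioned above.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_same(s, w):
    # 'same'-padded 1-D convolution of state s (length n) with an odd-length kernel w
    k = len(w)
    padded = np.pad(s, (k // 2, k // 2))
    return np.array([padded[i:i + k] @ w for i in range(len(s))])

def cgru_step(s, Wu, Wr, W):
    # one convolutional GRU update: gates and candidate are convolutions of the state
    u = sigmoid(conv1d_same(s, Wu))                 # update gate
    r = sigmoid(conv1d_same(s, Wr))                 # reset gate
    candidate = np.tanh(conv1d_same(r * s, W))
    return u * s + (1.0 - u) * candidate

rng = np.random.default_rng(0)
state = rng.normal(size=12)                          # the tape-like state (input is embedded here)
Wu, Wr, W = (rng.normal(size=3) for _ in range(3))
for _ in range(6):                                   # run the same cell repeatedly
    state = cgru_step(state, Wu, Wr, W)
print(state.round(2))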
Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets
The limitations of standard deep learning approaches are discussed and it is shown that some of these limitations can be overcome by learning how to grow the complexity of a model in a structured way.
On the Practical Computational Power of Finite Precision RNNs for Language Recognition
It is shown that the LSTM and the Elman-RNN with ReLU activation are strictly stronger than the RNN with a squashing activation and the GRU.
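
The separation hinges on the LSTM cell state being able to implement unbounded counting. Below is a toy, hand-set construction of my own (only the cell-state recursion, omitting the hidden state and output gate) that counts occurrences of 'a' minus occurrences of 'b'.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(ch):
    # one-hot inputs: 'a' -> [1, 0], 'b' -> [0, 1]
    return np.array([1.0, 0.0]) if ch == "a" else np.array([0.0, 1.0])

# hand-set weights: gates saturated open, candidate ~ +1 on 'a', ~ -1 on 'b'
W_i = np.array([10.0, 10.0]); b_i = 0.0    # input gate  ~ 1
W_f = np.array([10.0, 10.0]); b_f = 0.0    # forget gate ~ 1
W_g = np.array([10.0, -10.0]); b_g = 0.0   # candidate   ~ tanh(+-10) ~ +-1

def count(string):
    c = 0.0                                 # the cell state acts as the counter
    for ch in string:
        x = encode(ch)
        i = sigmoid(W_i @ x + b_i)
        f = sigmoid(W_f @ x + b_f)
        g = np.tanh(W_g @ x + b_g)
        c = f * c + i * g                   # standard LSTM cell-state update
    return c

print(count("aaabba"))   # ~ 2, i.e. (#a - #b) up to saturation error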
Learning to Transduce with Unbounded Memory
This paper proposes new memory-based recurrent networks that implement continuously differentiable analogues of traditional data structures such as stacks, queues, and deques, and shows that these architectures exhibit superior generalisation performance to deep RNNs and are often able to learn the underlying generating algorithms in the transduction experiments.
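
A simplified sketch of the continuous stack idea: every pushed value carries a strength in [0, 1], and push/pop amounts are fractional, so reading returns a blend of the topmost values. This is my own condensed rendering of the mechanism, not the paper's exact formulation.

import numpy as np

class NeuralStack:
    # continuous stack: each stored value has a strength; push/pop are fractional
    def __init__(self, dim):
        self.values = np.zeros((0, dim))
        self.strengths = np.zeros(0)

    def step(self, value, push, pop):
        # pop: remove up to `pop` total strength, starting from the top
        s = self.strengths.copy()
        remaining = pop
        for i in range(len(s) - 1, -1, -1):
            removed = min(s[i], remaining)
            s[i] -= removed
            remaining -= removed
        # push: append the new value with strength `push`
        self.values = np.vstack([self.values, value])
        self.strengths = np.append(s, push)
        return self.read()

    def read(self):
        # read a total strength of 1 from the top: a weighted blend of the topmost values
        out = np.zeros(self.values.shape[1])
        budget = 1.0
        for i in range(len(self.strengths) - 1, -1, -1):
            take = min(self.strengths[i], budget)
            out += take * self.values[i]
            budget -= take
            if budget <= 0:
                break
        return out

stack = NeuralStack(dim=2)
print(stack.step(np.array([1.0, 0.0]), push=1.0, pop=0.0))   # [1. 0.]
print(stack.step(np.array([0.0, 1.0]), push=0.6, pop=0.0))   # 0.6*[0,1] + 0.4*[1,0]
print(stack.step(np.array([0.0, 0.0]), push=0.0, pop=0.6))   # popping 0.6 exposes [1. 0.] again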
On the Computational Power of Neural Nets
It is proved that such nets can simulate all Turing machines, and any multi-stack Turing machine in real time; in particular, there is a net made up of 886 processors which computes a universal partial-recursive function.
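
The classical construction stores an unbounded binary stack in a single rational-valued neuron. The sketch below illustrates the encoding idea (base-4 digits 1 and 3, manipulated with affine maps and a saturated-linear activation); it is my own illustration of the trick, not the actual 886-processor net.

def sat(x):
    # saturated-linear activation: clamp to [0, 1]
    return min(1.0, max(0.0, x))

def push(q, bit):
    # prepend digit (2*bit + 1) in base 4
    return q / 4.0 + (2 * bit + 1) / 4.0

def top(q):
    # top digit is 3 (bit 1) iff q >= 3/4; digit 1 (bit 0) gives 1/4 <= q < 1/2
    return sat(4.0 * q - 2.0)

def pop(q):
    # strip the leading base-4 digit
    return 4.0 * q - (2 * top(q) + 1)

q = 0.0                       # empty stack encoded as 0
for b in [1, 0, 1, 1]:
    q = push(q, b)            # stack, top first: 1 1 0 1
print(int(top(q)))            # 1
q = pop(q)
print(int(top(q)))            # 1
q = pop(q)
print(int(top(q)))            # 0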
Universal Transformers
The Universal Transformer (UT), a parallel-in-time self-attentive recurrent sequence model which can be cast as a generalization of the Transformer model and which addresses issues of parallelizability and global receptive field, is proposed.
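
The defining change relative to the standard Transformer is recurrence in depth: one shared block (self-attention plus a position-wise transition) is applied repeatedly, optionally with adaptive halting. Below is my own minimal sketch of that weight sharing; layer normalization, multiple heads, and the halting mechanism are omitted.

import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ut_block(X, Wq, Wk, Wv, W1, W2):
    # one shared block: self-attention then a position-wise ReLU transition, both with residuals
    A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(X.shape[1]))
    X = X + A @ (X @ Wv)                        # attention sub-layer
    return X + np.maximum(0.0, X @ W1) @ W2     # transition sub-layer

rng = np.random.default_rng(0)
n, d = 6, 16
X = rng.normal(size=(n, d))
params = [0.1 * rng.normal(size=(d, d)) for _ in range(5)]   # Wq, Wk, Wv, W1, W2
for _ in range(4):                              # recurrence in depth: same weights each step
    X = ut_block(X, *params)
print(X.shape)                                  # (6, 16)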
Extensions and Limitations of the Neural GPU
It is found that Neural GPUs that correctly generalize to arbitrarily long numbers still fail to compute the correct answer on highly symmetric, atypical inputs: for example, a Neural GPU that achieves near-perfect generalization on decimal multiplication of up to 100-digit long numbers can fail on $\dots002$.
Neural Turing Machines
The combined system is analogous to a Turing machine or von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent.
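
The differentiable memory access at the heart of the model is a soft, content-based lookup over a memory matrix. The sketch below shows only the read path with content addressing (no write heads, no location-based shifting), as a rough illustration rather than the full architecture.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def content_read(memory, key, beta):
    # cosine-similarity addressing sharpened by beta, followed by a weighted sum of memory rows
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    weights = softmax(beta * sims)              # attention over memory slots
    return weights @ memory, weights

memory = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.9, 0.1, 0.0]])
read, w = content_read(memory, key=np.array([1.0, 0.0, 0.0]), beta=10.0)
print(w.round(3))                               # mass concentrated on rows 0 and 2
print(read.round(3))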
Recurrent Neural Networks as Weighted Language Recognizers
It is shown that approximations and heuristic algorithms are necessary in practical applications of single-layer, ReLU-activation, rational-weight RNNs with softmax, which are commonly used in natural language processing applications.
Attention is All you Need
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as demonstrated by applying it successfully to English constituency parsing with both large and limited training data.