Corpus ID: 235795251

PonderNet: Learning to Ponder

@article{Banino2021PonderNetLT,
  title={PonderNet: Learning to Ponder},
  author={Andrea Banino and Jan Balaguer and Charles Blundell},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.05407}
}
In standard neural networks the amount of computation used grows with the size of the inputs, but not with the complexity of the problem being learnt. To overcome this limitation we introduce PonderNet, a new algorithm that learns to adapt the amount of computation based on the complexity of the problem at hand. PonderNet learns end-to-end the number of computational steps to achieve an effective compromise between training prediction accuracy, computational cost and generalization. On a… 
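To make the mechanism described in the abstract concrete: at every pondering step n, a recurrent step function produces a prediction y_n and a scalar halting probability lambda_n; the probability of stopping exactly at step n is p_n = lambda_n * prod_{j<n} (1 - lambda_j); and training minimizes the expected prediction loss under this halting distribution plus a KL term against a geometric prior that regularizes the number of steps. The sketch below is a minimal illustration of that loop in PyTorch, assuming a GRU step cell and a classification output; the class name PonderSketch, the hyperparameters, and the two output heads are illustrative choices, not taken from the authors' implementation.

```python
# Minimal PonderNet-style halting loop (illustrative sketch, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PonderSketch(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim,
                 max_steps=20, lambda_prior=0.2, beta=0.01):
        super().__init__()
        self.cell = nn.GRUCell(in_dim, hidden_dim)      # recurrent step function
        self.out_head = nn.Linear(hidden_dim, out_dim)  # per-step prediction y_n
        self.halt_head = nn.Linear(hidden_dim, 1)       # per-step halting prob lambda_n
        self.max_steps = max_steps                      # truncation of the pondering loop
        self.lambda_prior = lambda_prior                # parameter of the geometric prior
        self.beta = beta                                # weight of the KL regularizer

    def forward(self, x):
        h = x.new_zeros(x.size(0), self.cell.hidden_size)
        un_halted = x.new_ones(x.size(0))               # prod_{j<n} (1 - lambda_j)
        ps, ys = [], []
        for n in range(self.max_steps):
            h = self.cell(x, h)
            lam = torch.sigmoid(self.halt_head(h)).squeeze(-1)
            if n == self.max_steps - 1:
                lam = torch.ones_like(lam)              # force halting at the final step
            ps.append(un_halted * lam)                  # p_n = lambda_n * prod_{j<n}(1 - lambda_j)
            ys.append(self.out_head(h))
            un_halted = un_halted * (1.0 - lam)
        return torch.stack(ps), torch.stack(ys)         # shapes [N, B] and [N, B, out_dim]

    def loss(self, ps, ys, target):
        # Reconstruction: expected prediction loss under the halting distribution.
        step_losses = torch.stack(
            [F.cross_entropy(y, target, reduction="none") for y in ys])
        rec = (ps * step_losses).sum(0).mean()
        # Regularization: KL between the halting distribution and a geometric prior.
        steps = torch.arange(ps.size(0), dtype=ps.dtype, device=ps.device)
        prior = self.lambda_prior * (1.0 - self.lambda_prior) ** steps
        prior = prior / prior.sum()
        kl = (ps * (torch.log(ps + 1e-8)
                    - torch.log(prior + 1e-8).unsqueeze(-1))).sum(0).mean()
        return rec + self.beta * kl


# Hypothetical usage on random data, just to show the shapes involved.
model = PonderSketch(in_dim=64, hidden_dim=128, out_dim=10)
ps, ys = model(torch.randn(32, 64))
model.loss(ps, ys, torch.randint(0, 10, (32,))).backward()
```

In this sketch the maximum-step cutoff is a practical truncation of an otherwise unbounded halting process; at evaluation time the halting step can be sampled sequentially from lambda_n, so harder inputs can consume more steps while easy ones stop early.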

Citations

End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking
TLDR
A recall architecture is proposed that keeps an explicit copy of the problem instance in memory so that it cannot be forgotten, and a progressive training routine is employed that prevents the model from learning behaviors specific to iteration number and instead pushes it to learn behaviors that can be repeated indefinitely.
The CLRS Algorithmic Reasoning Benchmark
TLDR
This work proposes the CLRS Algorithmic Reasoning Benchmark, covering classical algorithms from the Introduction to Algorithms textbook, and performs extensive experiments to demonstrate how several popular algorithmic reasoning baselines perform on these tasks, consequently highlighting links to several open challenges.
Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation
TLDR
It is shown that universal transformers equipped with local attention and adaptive halting mechanisms can learn to exploit an external, grid-like memory to carry out multi-digit addition.
Recurrent Vision Transformer for Solving Visual Reasoning Problems
TLDR
The Recurrent Vision Transformer (RViT) model is introduced, which achieves competitive results on the same-different visual reasoning problems from the SVRT dataset while learning with far fewer free parameters and only 28k training samples.
Learning Iterative Reasoning through Energy Minimization
TLDR
This work presents a new framework for iterative reasoning with neural networks and empirically illustrates that the approach yields more accurate and generalizable algorithmic reasoning in both graph and continuous domains.
The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization
TLDR
The novel Neural Data Router (NDR) achieves 100% length generalization accuracy on the classic compositional table lookup task, as well as near-perfect accuracy on a simple arithmetic task and a new variant of ListOps testing for generalization across computational depths.
Spike-inspired Rank Coding for Fast and Accurate Recurrent Neural Networks
TLDR
It is shown that temporal coding such as rank coding (RC), inspired by SNNs, can be applied to conventional ANNs such as LSTMs and leads to computational savings and speedups, as on a temporally-encoded MNIST dataset where the model achieves 99.19% accuracy after the first input time-step.
BE3R: BERT based Early-Exit Using Expert Routing
TLDR
This work proposes a novel routing-based early-exit model called BE3R (BERT based Early-Exit using Expert Routing), in which the model learns to dynamically exit at earlier layers without traversing the entire model.
AdaViT: Adaptive Tokens for Efficient Vision Transformer
TLDR
A-ViT is introduced, a method that adaptively adjusts the inference cost of vision transformers (ViT) for images of different complexity by automatically reducing the number of tokens processed in the network as inference proceeds, resulting in immediate, out-of-the-box inference speedup on off-the-shelf computational platforms.
Dyna-bAbI: unlocking bAbI’s potential with dynamic synthetic benchmarking
TLDR
Dyna-bAbI is developed, a dynamic framework providing fine-grained control over task generation in bAbI, underscoring the importance of highly controllable task generators for creating robust NLU systems through a virtuous cycle of model and data development.
...

References

SHOWING 1-10 OF 17 REFERENCES
Adaptive Computation Time for Recurrent Neural Networks
TLDR
Performance is dramatically improved and insight is provided into the structure of the data, with more computation allocated to harder-to-predict transitions, such as spaces between words and ends of sentences, which suggests that ACT or other adaptive computation methods could provide a generic method for inferring segment boundaries in sequence data.
End-To-End Memory Networks
TLDR
A neural network with a recurrent attention model over a possibly large external memory is introduced; it is trained end-to-end and hence requires significantly less supervision during training, making it more generally applicable in realistic settings.
Learning to Skim Text
TLDR
The proposed model is a modified LSTM with jumping, a recurrent network that learns how far to jump after reading a few words of the input text, which is up to 6 times faster than the standard sequential LSTM, while maintaining the same or even better accuracy.
Neural Execution of Graph Algorithms
TLDR
It is demonstrated how learning in the space of algorithms can yield new opportunities for positive transfer between tasks, showing how learning a shortest-path algorithm can be substantially improved when simultaneously learning a reachability algorithm.
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
TLDR
This work argues for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering, and classifies these tasks into skill sets so that researchers can identify (and then rectify) the failings of their systems.
MEMO: A Deep Network for Flexible Combination of Episodic Memories
TLDR
A novel architecture, MEMO, endowed with the capacity to reason over longer distances, is developed with the addition of two novel components, one of which introduces a separation between memories/facts stored in external memory and the items that comprise these facts.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Simple statistical gradient-following algorithms for connectionist reinforcement learning
TLDR
This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units that are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates.
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context
TLDR
This work proposes a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence, which consists of a segment-level recurrence mechanism and a novel positional encoding scheme.
Hierarchical Multiscale Recurrent Neural Networks
TLDR
A novel multiscale approach, called the hierarchical multiscale recurrent neural network, is proposed; it can capture the latent hierarchical structure in the sequence by encoding temporal dependencies at different timescales using a novel update mechanism.
...