Corpus ID: 210180949

Neural Arithmetic Units

Andreas Madsen and Alexander Rosenberg Johansen
Neural networks can approximate complex functions, but they struggle to perform exact arithmetic operations over real numbers. The lack of inductive bias for arithmetic operations leaves neural networks without the underlying logic necessary to extrapolate on tasks such as addition, subtraction, and multiplication. We present two new neural network components: the Neural Addition Unit (NAU), which can learn exact addition and subtraction; and the Neural Multiplication Unit (NMU) that can… 
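As a rough illustration of the two components named in the abstract, the sketch below shows one way their forward passes could look. It assumes the commonly cited formulation: the NAU as a bias-free linear layer with weights kept in [-1, 1], and the NMU as a product over terms `W_ij * x_i + 1 - W_ij` with weights kept in [0, 1]. The function names, the clipping step, and the hand-set weights are illustrative assumptions, not the authors' implementation, and no training or regularization is shown.

```python
import numpy as np

def nau_forward(x, W):
    """Hypothetical NAU sketch: linear layer, weights clamped to [-1, 1].

    x has shape (batch, n_in), W has shape (n_in, n_out).
    A weight of +1 adds an input, -1 subtracts it, 0 ignores it.
    """
    W = np.clip(W, -1.0, 1.0)
    return x @ W

def nmu_forward(x, W):
    """Hypothetical NMU sketch: out_j = prod_i (W_ij * x_i + 1 - W_ij).

    Weights are clamped to [0, 1]. A weight of 1 includes the input in the
    product; a weight of 0 turns its factor into the multiplicative identity 1.
    """
    W = np.clip(W, 0.0, 1.0)
    # Broadcast to (batch, n_in, n_out), then multiply over the input axis.
    terms = x[:, :, None] * W[None, :, :] + 1.0 - W[None, :, :]
    return terms.prod(axis=1)

# Hand-set (not learned) weights, chosen only to show the exact-arithmetic case.
x = np.array([[2.0, 3.0, 5.0]])
W_add = np.array([[1.0], [-1.0], [0.0]])  # selects x0 - x1
W_mul = np.array([[1.0], [1.0], [0.0]])   # selects x0 * x1

sum_out = nau_forward(x, W_add)    # 2 - 3 = -1
prod_out = nmu_forward(x, W_mul)   # 2 * 3 = 6
```

With weights at these discrete values the outputs are exact rather than approximate, which is the sense in which such units can extrapolate beyond the training range.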

Neural Power Units

The Neural Power Unit (NPU) is introduced; it operates on the full domain of real numbers, can learn arbitrary power functions in a single layer, fixes the shortcomings of existing arithmetic units, and extends their expressivity.

Learning Division with Neural Arithmetic Logic Modules

It is shown that robustly learning division in a systematic manner remains a challenge even at the simplest level of dividing two numbers, and two novel approaches to division are proposed: the Neural Reciprocal Unit (NRU) and the Neural Multiplicative Reciprocal Unit (NMRU).

Neural Status Registers

The Neural Status Register (NSR), inspired by physical status registers, is introduced. At its heart are arithmetic comparisons between inputs that allow end-to-end differentiation, and it learns such comparisons reliably.

Exploring the Learning Mechanisms of Neural Division Modules

It is shown that robustly learning division in a systematic manner remains a challenge even at the simplest level of dividing two numbers, and two novel approaches to division are proposed: the Neural Reciprocal Unit (NRU) and the Neural Multiplicative Reciprocal Unit (NMRU).

A Primer for Neural Arithmetic Logic Modules

Focusing on the shortcomings of the NALU, an in-depth analysis reasons about the design choices of recent units and highlights inconsistencies in a fundamental experiment that prevent direct comparison across papers.

Improving the Robustness of Neural Multiplication Units with Reversible Stochasticity

It is shown that Neural Multiplication Units (NMUs) are unable to reliably learn tasks as simple as multiplying two inputs when given different training ranges, and stochasticity provides improved robustness with the potential to improve learned representations of upstream networks for numerical and image tasks.

Fast Neural Models for Symbolic Regression at Scale

This work introduces OccamNet, a neural network model that finds interpretable, compact, and sparse solutions for fitting data, à la Occam’s razor, and introduces a two-step optimization method that samples functions and updates the weights with backpropagation based on cross-entropy matching in an evolutionary strategy.

Evolutionary Training and Abstraction Yields Algorithmic Generalization of Neural Computers

The Neural Harvard Computer is presented, a memory-augmented, network-based architecture that employs abstraction by decoupling algorithmic operations from data manipulations. This is realized by splitting the information flow across separate modules, which enables the learning of robust and scalable algorithmic solutions.

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

The success of GNNs in extrapolating algorithmic tasks to new data relies on encoding task-specific non-linearities in the architecture or features, and a hypothesis is suggested for which theoretical and empirical evidence is provided.

Learning Arithmetic Operations With A Multistep Deep Learning

It is shown that this mechanism, applied to a simple multilayer perceptron, can significantly improve its performance when learning multi-digit addition or multiplication, which are simple yet challenging operations to learn.



Neural Arithmetic Logic Units

Experiments show that NALU-enhanced neural networks can learn to track time, perform arithmetic over images of numbers, translate numerical language into real-valued scalars, execute computer code, and count objects in images.

Measuring Arithmetic Extrapolation Performance

It is found that consistently learning arithmetic extrapolation is challenging, in particular for multiplication, in the first extensive evaluation with respect to convergence of the NALU and its sub-units.

Neural GPUs Learn Algorithms

It is shown that the Neural GPU can be trained on short instances of an algorithmic task and successfully generalize to long instances, and a technique for training deep recurrent networks, parameter sharing relaxation, is introduced.

Understanding the difficulty of training deep feedforward neural networks

The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.

Neural Arithmetic Expression Calculator

This paper presents a pure neural solver for the arithmetic expression calculation (AEC) problem, which includes addition, subtraction, multiplication, division, and bracketing operations, and treats arithmetic expression calculation as a hierarchical reinforcement learning problem.

Grid Long Short-Term Memory

The Grid LSTM is used to define a novel two-dimensional translation model, the Reencoder, and it is shown that it outperforms a phrase-based reference system on a Chinese-to-English translation task.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

On Evaluating the Generalization of LSTM Models in Formal Languages

This paper empirically evaluates the inductive learning capabilities of Long Short-Term Memory networks, a popular extension of simple RNNs, to learn simple formal languages.

Improving the Neural GPU Architecture for Algorithm Learning

The proposed architecture is the first capable of learning decimal multiplication end-to-end, and a new, generally applicable technique, hard nonlinearities with saturation costs, is introduced that can be applied to active-memory models.

Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks

This paper introduces the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences, and tests the zero-shot generalization capabilities of a variety of recurrent neural networks trained on SCAN with sequence-to-sequence methods.