Neural Network with Unbounded Activations is Universal Approximator

@article{Sonoda2015NeuralNW,
  title={Neural Network with Unbounded Activations is Universal Approximator},
  author={Sho Sonoda and N. Murata},
  journal={ArXiv},
  year={2015},
  volume={abs/1505.03654}
}
This paper investigates the approximation property of neural networks with unbounded activation functions, such as the rectified linear unit (ReLU), which is the new de facto standard of deep learning. The ReLU network can be analyzed by the ridgelet transform with respect to Lizorkin distributions, which is introduced in this paper. By showing two reconstruction formulas using the Fourier slice theorem and the Radon transform, it is shown that the neural network with unbounded activations…
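For orientation only (a schematic paraphrase, not the paper's exact normalization; weighting factors and admissibility constants are omitted), the ridgelet analysis referred to above pairs an analysis transform with a dual (synthesis) transform, and the dual transform is exactly the integral representation of a single-hidden-layer network:

\[
(\mathcal{R}_\psi f)(a,b) = \int_{\mathbb{R}^d} f(x)\,\overline{\psi(\langle a,x\rangle - b)}\,dx,
\qquad
(\mathcal{R}^\dagger_\eta T)(x) = \int_{\mathbb{R}^d\times\mathbb{R}} T(a,b)\,\eta(\langle a,x\rangle - b)\,da\,db.
\]

The reconstruction formulas assert that \(\mathcal{R}^\dagger_\eta \mathcal{R}_\psi f \propto f\) for admissible pairs \((\psi,\eta)\); taking the transform in the sense of Lizorkin distributions is what allows \(\eta\) to be an unbounded activation such as the ReLU. The dual transform is the continuous analogue of a finite network \(\sum_j c_j\,\eta(\langle a_j,x\rangle - b_j)\).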
Integral representation of shallow neural network that attains the global minimum.
TLDR: The modified ridgelet transform has an explicit expression that can be computed by numerical integration, which suggests that the global minimizer of backpropagation (BP) training can be obtained without running BP.
Theory of Deep Convolutional Neural Networks III: Approximating Radial Functions
TLDR: It is proved that a family of deep neural networks consisting of two groups of convolutional layers, a downsampling operator, and a fully connected layer can outperform fully connected shallow networks in approximating radial functions with Q(x) = |x|^2, when the dimension d of data from R^d is large.
Integral representation of the global minimizer
TLDR: The obtained integral representation provides an explicit expression of the global minimizer, without linearity-like assumptions such as partial linearity and monotonicity, and indicates that the ordinary ridgelet transform provides the minimum norm solution to the original training equation.
Deep Convolutional Neural Nets
Neural nets are a class of predictors that have been shown empirically to achieve very good performance on tasks whose inputs are images, speech, or audio signals. They have also been applied to…
Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems
TLDR: It is demonstrated that the separability assumption based on a neural tangent model is more reasonable than the positivity condition of the neural tangent kernel, and a refined convergence analysis of gradient descent for two-layer networks with smooth activations is provided.
The global optimum of shallow neural network is attained by ridgelet transform
TLDR: By introducing a continuous model of neural networks, this work reduces the training problem to a convex optimization in an infinite-dimensional Hilbert space, and obtains the explicit expression of the global optimizer via the ridgelet transform.
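As a generic illustration of why a continuous model linearizes training (a sketch of the underlying idea, not the specific functional studied in that paper): parameterizing the network by a coefficient function \(\gamma\) over hidden parameters makes the output linear in \(\gamma\),
\[
f_\gamma(x) = \int \gamma(a,b)\,\eta(\langle a,x\rangle - b)\,da\,db,
\]
so an objective such as \(\sum_i \big(y_i - f_\gamma(x_i)\big)^2 + \lambda\|\gamma\|^2\) is a convex functional of \(\gamma\), even though the same loss is non-convex in the finite parameters \((a_j, b_j, c_j)\).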
Fast generalization error bound of deep learning without scale invariance of activation functions
TLDR: It is shown that scale invariance of the activation functions is not essential to obtain a fast rate of convergence, and it is concluded that the theoretical framework proposed by Suzuki (2018) can be widely applied to the analysis of deep learning with general activation functions.
A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case
TLDR: This paper characterizes the norm required to realize a function as a single hidden-layer ReLU network with an unbounded number of units but where the Euclidean norm of the weights is bounded, including precisely characterizing which functions can be realized with finite norm.
Greedy Shallow Networks: A New Approach for Constructing and Training Neural Networks
TLDR: A novel greedy approach to obtain a single-layer neural network approximation to a target function, using a ReLU activation function and an integral representation of the network based on the ridgelet transform, that significantly reduces the cardinality of the dictionary and hence promotes feasibility of the proposed method.
Effect of Activation Functions on the Training of Overparametrized Neural Nets
TLDR: This paper provides theoretical results about the effect of the activation function on the training of highly overparametrized 2-layer neural networks and discusses a number of extensions and applications of these results.

References

Showing 1-10 of 65 references
Harmonic Analysis of Neural Networks
It is known that superpositions of ridge functions (single hidden-layer feedforward neural networks) may give good approximations to certain kinds of multivariate functions. It remains…
Universal approximation bounds for superpositions of a sigmoidal function
  • A. Barron
  • Mathematics, Computer Science
  • IEEE Trans. Inf. Theory
  • 1993
TLDR: The approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings, and the integrated squared approximation error cannot be made smaller than order 1/n^{2/d} uniformly for functions satisfying the same smoothness assumption.
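For context, the standard form of Barron's upper bound (paraphrased from memory rather than quoted from the paper) is: if \(C_f = \int_{\mathbb{R}^d} |\omega|\,|\hat f(\omega)|\,d\omega\) is finite, then for every \(n\) there is a sigmoidal network \(f_n\) with \(n\) hidden units such that
\[
\int_{B_r} \big(f(x) - f_n(x)\big)^2\,\mu(dx) \;\le\; \frac{(2 r C_f)^2}{n}
\]
for any probability measure \(\mu\) on the ball \(B_r\) of radius \(r\). The dimension enters only through \(C_f\), in contrast with the \(n^{-2/d}\) lower bound for fixed linear approximation schemes mentioned in the summary.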
On the approximate realization of continuous mappings by neural networks
TLDR: It is proved that any continuous mapping can be approximately realized by Rumelhart-Hinton-Williams' multilayer neural networks with at least one hidden layer whose output functions are sigmoid functions.
Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function
TLDR: It is shown that a standard multilayer feedforward network can approximate any continuous function to any degree of accuracy if and only if the network's activation functions are not polynomial.
An Integral Representation of Functions Using Three-layered Networks and Their Approximation Bounds
  • N. Murata
  • Mathematics, Medicine
  • Neural Networks
  • 1996
TLDR: A new theorem on an integral transform of ridge functions is presented, and an approximation bound is obtained that evaluates the quantitative relationship between the approximation accuracy and the number of elements in the hidden layer.
Representation of functions by superpositions of a step or sigmoid function and their applications to neural network theory
  • Y. Ito
  • Mathematics, Computer Science
  • Neural Networks
  • 1991
The starting point of this article is the inversion formula of the Radon transform; the article aims to contribute to the theory of three-layered neural networks. Let H be the Heaviside…
On rectified linear units for speech processing
TLDR: This work shows that generalization can be improved and training of deep networks made faster and simpler by substituting the logistic units with rectified linear units.
ImageNet classification with deep convolutional neural networks
TLDR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Approximation theory of the MLP model in neural networks
In this survey we discuss various approximation-theoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. The MLP model is one of the more popular and…