Neural Network with Unbounded Activations is Universal Approximator

Sho Sonoda and N. Murata

Integral representation of shallow neural network that attains the global minimum.

The modified ridgelet transform has an explicit expression that can be computed by numerical integration, which suggests that the global minimizer sought by backpropagation (BP) can be obtained without running BP.

Deep Convolutional Neural Nets

This chapter covers neural nets, a class of predictors that have been shown empirically to achieve very good performance on tasks whose inputs are images, speech, or audio signals, and that often generalize better than one would predict.

The global optimum of shallow neural network is attained by ridgelet transform

By introducing a continuous model of neural networks, this work reduces the training problem to a convex optimization in an infinite dimensional Hilbert space, and obtains the explicit expression of the global optimizer via the ridgelet transform.

Nonconvex regularization for sparse neural networks

Tunable Activation Functions for Deep Neural Networks

The performance of artificial neural networks significantly depends on the choice of the nonlinear activation function of the neuron. Usually this choice comes down to an empirical one from a list of

Double Continuum Limit of Deep Neural Networks

This work in progress synthesizes a deep neural network, without backpropagation, from broken-line approximation and numerical integration of a double continuum model. It also develops the ridgelet transform for potential fields and synthesizes an autoencoder without backpropagation.

Numerical Integration Method for Training Neural Network

A generalized kernel quadrature method is developed that applies to signed measures and has a fast convergence guarantee in a function norm, together with a natural choice of kernels.

Neural Networks and Deep Learning




Harmonic Analysis of Neural Networks

A special admissibility condition for neural activation functions is introduced, which requires that the activation function be oscillatory, and linear transforms are constructed that represent quite general functions f as superpositions of ridge functions.

Universal approximation bounds for superpositions of a sigmoidal function

  • A. Barron
  • Computer Science
    IEEE Trans. Inf. Theory
  • 1993
The approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings, and the integrated squared approximation error cannot be made smaller than order 1/n^(2/d) uniformly for functions satisfying the same smoothness assumption.

An Integral Representation of Functions Using Three-layered Networks and Their Approximation Bounds

  • N. Murata
  • Computer Science, Mathematics
    Neural Networks
  • 1996

ImageNet classification with deep convolutional neural networks

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

Approximation theory of the MLP model in neural networks

  • A. Pinkus
  • Computer Science, Mathematics
    Acta Numerica
  • 1999
In this survey we discuss various approximation-theoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. The MLP model is one of the more popular and

Construction of neural nets using the radon transform

The authors present a method for constructing a feedforward neural net implementing an arbitrarily good approximation to any L_2 function over (-1, 1)^n. The net uses n input nodes, a

Improving deep neural networks for LVCSR using rectified linear units and dropout

Modelling deep neural networks with rectified linear unit (ReLU) non-linearities with minimal human hyper-parameter tuning on a 50-hour English Broadcast News task shows a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system.