Extending Universal Approximation Guarantees: A Theoretical Justification for the Continuity of Real-World Learning Tasks

Naveen Durvasula
Universal Approximation Theorems establish the density of various classes of neural network function approximators in $C(K, \mathbb{R}^m)$, where $K \subset \mathbb{R}^n$ is compact. In this paper, we aim to extend these guarantees by establishing conditions on learning tasks that guarantee their continuity. We consider learning tasks given by conditional expectations $x \mapsto \mathbb{E}[Y \mid X = x]$, where the learning target $Y = f \circ L$ is a potentially pathological transformation of some underlying data-generating…
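The smoothing effect of conditioning can be seen in a minimal simulation. Everything below is an illustrative assumption, not the paper's construction: a latent variable $Z$, a discontinuous step function in place of a "pathological" $f$, and a Gaussian observation channel. Even though $f$ is discontinuous, the estimated map $x \mapsto \mathbb{E}[Y \mid X = x]$ comes out smooth.

```python
import random
import math

random.seed(0)

# Illustrative setup (not from the paper): latent Z ~ Uniform(-1, 1),
# discontinuous target f(z) = 1{z > 0}, observation X = Z + Gaussian noise.
# The conditional expectation E[Y | X = x] = P(Z > 0 | X = x) is then a
# smooth function of x even though f itself is a step function.
N, SIGMA = 20000, 0.2
zs = [random.uniform(-1.0, 1.0) for _ in range(N)]
xs = [z + random.gauss(0.0, SIGMA) for z in zs]
ys = [1.0 if z > 0 else 0.0 for z in zs]

def cond_mean(x, bandwidth=0.1):
    """Nadaraya-Watson estimate of E[Y | X = x] with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((xi - x) / bandwidth) ** 2) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

est_low = cond_mean(-0.8)   # far below the step: close to 0
est_high = cond_mean(0.8)   # far above the step: close to 1
print(est_low, est_high)
```

The estimated curve interpolates smoothly between 0 and 1 near $x = 0$; the noise in the channel, not the regularity of $f$, is what makes the target continuous.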


Universal Approximation with Deep Narrow Networks

The classical Universal Approximation Theorem concerns neural networks of arbitrary width and bounded depth; this work establishes a dual result for deep narrow networks, covering nowhere-differentiable activation functions and density on noncompact domains with respect to the $L^p$-norm, and shows that the width may be reduced to just $n + m + 1$ for `most' activation functions.

The Expressive Power of Neural Networks: A View from the Width

It is shown that there exist classes of wide networks which cannot be realized by any narrow network whose depth is no more than a polynomial bound, and that narrow networks whose size exceeds the polynomial bound by a constant factor can approximate wide and shallow networks with high accuracy.

Lipschitz continuity of probability kernels in the optimal transport framework.

General conditions are given for the Lipschitz continuity of probability kernels with respect to metric structures arising within the optimal transport framework, such as the Wasserstein metric.

Minimum Width for Universal Approximation

This work provides the first definitive result in this direction for networks using the ReLU activation function: the minimum width required for the universal approximation of $L^p$ functions is exactly $\max\{d_x+1, d_y\}$.

Approximating Continuous Functions by ReLU Nets of Minimal Width

This article concerns the expressive power of depth in deep feed-forward neural nets with ReLU activations. Specifically, we answer the following question: for a fixed $d\geq 1,$ what is the minimal
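The kind of ReLU expressivity these results formalize can be sketched by hand. The construction below is a standard illustration, not taken from any of the papers above: a one-hidden-layer ReLU network whose weights are chosen so that it reproduces the piecewise-linear interpolant of a continuous function on $[0, 1]$, so the uniform error shrinks as knots are added.

```python
import math

def relu(z):
    return max(z, 0.0)

def relu_interpolant(f, knots):
    """One-hidden-layer ReLU net g(x) = bias + sum_i c_i * relu(x - t_i)
    that coincides with the piecewise-linear interpolant of f at the knots."""
    slopes = [(f(b) - f(a)) / (b - a) for a, b in zip(knots, knots[1:])]
    # First unit turns on the initial slope; each later unit adds the
    # change in slope at its knot.
    coeffs = [slopes[0]] + [s1 - s0 for s0, s1 in zip(slopes, slopes[1:])]
    bias, ts = f(knots[0]), knots[:-1]
    def g(x):
        return bias + sum(c * relu(x - t) for c, t in zip(coeffs, ts))
    return g

f = lambda x: math.sin(2 * math.pi * x)
knots = [i / 64 for i in range(65)]
g = relu_interpolant(f, knots)
err = max(abs(f(x) - g(x)) for x in (i / 1000 for i in range(1001)))
print(err)  # bounded by h^2 * max|f''| / 8, well below 0.01 here
```

With knot spacing $h$, the uniform error of the interpolant is at most $h^2 \max|f''|/8$, so width (number of hidden units) trades off directly against accuracy, which is the quantity the minimal-width results above pin down.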

Bayesian inverse problems for functions and applications to fluid mechanics

In this paper we establish a mathematical framework for a range of inverse problems for functions, given a finite set of noisy observations. The problems are hence underdetermined and are often

The Bayesian Approach to Inverse Problems

These lecture notes highlight the mathematical and computational structure relating to the formulation of, and development of algorithms for, the Bayesian approach to inverse problems in

Approximation by superpositions of a sigmoidal function

  • G. Cybenko, Math. Control. Signals Syst., 1989
In this paper we demonstrate that finite linear combinations of compositions of a fixed, univariate function and a set of affine functionals can uniformly approximate any continuous function of $n$ real

Visualizing Data using t-SNE

A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. It is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.

Multilayer feedforward networks are universal approximators