Andrew R. Barron

Approximation properties of a class of artificial neural networks are established. It is shown that feedforward networks with one layer of sigmoidal nonlinearities achieve integrated squared error of order O(1/n), where n is the number of nodes. The function approximated is assumed to have a bound on the first moment of the magnitude distribution of the Fourier transform…
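For reference, the bound in question is usually stated as follows (a sketch; the domain normalization and the exact constant vary with the formulation, and the spectral quantity $C_f$ is the first moment referred to above):

$$
\int_{B} \bigl(f(x) - f_n(x)\bigr)^2 \,\mu(dx) \;\le\; \frac{(2C_f)^2}{n},
\qquad
C_f \;=\; \int_{\mathbb{R}^d} |\omega|\,\bigl|\tilde f(\omega)\bigr|\,d\omega,
$$

where $f_n(x) = \sum_{k=1}^{n} c_k\,\sigma(a_k\cdot x + b_k) + c_0$ is a one-hidden-layer sigmoidal network with $n$ nodes, $\tilde f$ is the Fourier transform of $f$, and $\mu$ is any probability measure on the ball $B$.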
We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon’s basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to…
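The normalized maximized likelihood coding referred to above is commonly written as follows (notation here follows standard MDL usage rather than the paper’s own):

$$
\hat p_{\mathrm{NML}}(x^n) \;=\; \frac{p\bigl(x^n \mid \hat\theta(x^n)\bigr)}{\sum_{y^n} p\bigl(y^n \mid \hat\theta(y^n)\bigr)},
\qquad
\mathrm{SC}(x^n) \;=\; \log\frac{1}{\hat p_{\mathrm{NML}}(x^n)},
$$

where $\hat\theta(x^n)$ is the maximum-likelihood estimate; the logarithm of the normalizing sum is the parametric complexity of the model class.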
The minimum complexity or minimum description-length criterion developed by Kolmogorov, Rissanen, Wallace, Sorkin, and others leads to consistent probability density estimators. These density estimators are defined to achieve the best compromise between likelihood and simplicity. A related issue is the compromise between accuracy of approximations and…
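The likelihood–simplicity compromise can be made explicit through the two-stage description length and the associated index of resolvability (an illustrative statement; $\Gamma_n$ and $L_n$ denote the candidate list and codelengths, following common usage for this criterion):

$$
\hat q \;=\; \operatorname*{arg\,min}_{q \in \Gamma_n}\Bigl\{ L_n(q) \;+\; \log\frac{1}{q(X_1,\ldots,X_n)} \Bigr\},
\qquad
R_n(f) \;=\; \min_{q \in \Gamma_n}\Bigl\{ \frac{L_n(q)}{n} \;+\; D(f\,\|\,q) \Bigr\},
$$

where $D(f\|q)$ is the relative entropy; the statistical risk of the minimum-complexity estimator is then controlled by the resolvability $R_n(f)$.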
In the absence of knowledge of the true density function, Bayesian models take the joint density function for a sequence of n random variables to be an average of densities with respect to a prior. We examine the relative entropy distance $D_n$ between the true density and the Bayesian density and show that the asymptotic distance is $(d/2)\log n + c$, where $d$ is the dimension of the parameter vector…
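Under the usual smoothness conditions the constant $c$ is explicit, so the expansion reads (in the standard form of this result, with $I(\theta)$ the Fisher information and $w$ the prior density):

$$
D_n \;=\; \frac{d}{2}\,\log\frac{n}{2\pi e} \;+\; \frac{1}{2}\,\log\det I(\theta) \;+\; \log\frac{1}{w(\theta)} \;+\; o(1),
$$

so the relative entropy rate $D_n/n$ vanishes at rate $(\log n)/n$.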
For a common class of artificial neural networks, the mean integrated squared error between the estimated network and a target function $f$ is shown to be bounded by $$O\!\left(\frac{c_f^2}{n}\right) + O\!\left(\frac{nd}{N}\log N\right)$$ where $n$ is the number of nodes, $d$ is the input dimension of the function, $N$ is the number of training observations…
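Balancing the two terms of the displayed bound over the number of nodes $n$ (a routine optimization, with $c_f$ the Fourier first-moment quantity of the preceding approximation result) gives

$$
n \;\asymp\; c_f\Bigl(\frac{N}{d\log N}\Bigr)^{1/2}
\qquad\Longrightarrow\qquad
\text{mean integrated squared error} \;=\; O\!\Bigl(c_f\Bigl(\frac{d}{N}\,\log N\Bigr)^{1/2}\Bigr).
$$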
Gaussian mixtures (or so-called radial basis function networks) for density estimation provide a natural counterpart to sigmoidal neural networks for function fitting and approximation. In both cases, it is possible to give simple…
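As an illustration of the density-estimation side only (not the estimator analyzed in the paper; the function and parameter names below are made up for this sketch), a minimal EM fit of a one-dimensional Gaussian mixture in Python:

import numpy as np

def em_gaussian_mixture(x, k, iters=100, seed=0):
    # Minimal EM for a 1-D Gaussian mixture density estimate (illustrative sketch).
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    w = np.full(k, 1.0 / k)                      # mixing weights
    mu = rng.choice(x, size=k, replace=False)    # initialize means at data points
    var = np.full(k, x.var() + 1e-8)             # common initial variance
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: reweight, recentre, and rescale each component
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-8
    return w, mu, var

Calling em_gaussian_mixture(x, k=3) on a sample x returns the mixing weights, means, and variances of a three-component mixture density estimate.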
We consider the problem of approximating a given element f from a Hilbert space H by means of greedy algorithms and the application of such procedures to the regression problem in statistical learning theory. We improve on the existing theory of convergence rates for both the orthogonal greedy algorithm and the relaxed greedy algorithm, as well as for the…
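A minimal sketch of a relaxed greedy iteration over a finite dictionary, for concreteness (the finite dictionary, unit-norm assumption, and step-size schedule are simplifications chosen for this sketch, not the paper’s exact algorithms):

import numpy as np

def relaxed_greedy(y, dictionary, steps=50):
    # Relaxed greedy approximation of a target vector y (samples of f) by columns
    # of `dictionary`, assumed to have unit Euclidean norm. Illustrative sketch only.
    f_k = np.zeros_like(y, dtype=float)
    chosen = []
    for k in range(1, steps + 1):
        alpha = 2.0 / (k + 1)                    # relaxation weight shrinking the old iterate
        residual = y - (1.0 - alpha) * f_k
        scores = dictionary.T @ residual         # correlations with the shrunken residual
        j = int(np.argmax(np.abs(scores)))       # best-matching dictionary element
        beta = scores[j]                         # least-squares coefficient for a unit-norm atom
        f_k = (1.0 - alpha) * f_k + beta * dictionary[:, j]
        chosen.append(j)
    return f_k, chosen

For targets in (a scaled version of) the convex hull of the dictionary, iterations of this kind achieve squared error of order O(1/k), which is the regime whose rates the paper refines.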
For problems of data compression, gambling, and prediction of individual sequences $x_1, \ldots, x_n$, the following questions arise. Given a target family of probability mass functions $p(x_1, \ldots, x_n \mid \theta)$, how do we choose a probability mass function $q(x_1, \ldots, x_n)$ so that it approximately minimizes the maximum regret $$\max_{x_1,\ldots,x_n}\Bigl(\log\frac{1}{q(x_1,\ldots,x_n)} \;-\; \log\frac{1}{p\bigl(x_1,\ldots,x_n \mid \hat\theta(x_1,\ldots,x_n)\bigr)}\Bigr),$$ with $\hat\theta$ the maximum-likelihood estimate, and so that it…
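For smooth $d$-parameter families this minimax regret has the familiar asymptotic form (as in the line of results this abstract refers to; $I(\theta)$ is the Fisher information matrix):

$$
\min_{q}\;\max_{x_1,\ldots,x_n}\Bigl(\log\frac{1}{q(x_1,\ldots,x_n)} \;-\; \log\frac{1}{p\bigl(x_1,\ldots,x_n \mid \hat\theta(x_1,\ldots,x_n)\bigr)}\Bigr)
\;=\; \frac{d}{2}\,\log\frac{n}{2\pi} \;+\; \log\!\int\!\sqrt{\det I(\theta)}\;d\theta \;+\; o(1).
$$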