Approximation properties of a class of artificial neural networks are established. It is shown that feedforward networks with one layer of sigmoidal nonlinearities achieve integrated squared error of order O(1/n), where n is the number of nodes. The function approximated is assumed to have a bound on the first moment of the magnitude distribution of the …
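The condition on the target is a finite first moment of its Fourier magnitude distribution; a standard statement of the resulting bound (given here in its usual form, since the abstract is truncated, with B_r the ball of radius r and mu any probability measure on it) is

$$
C_f \;=\; \int_{\mathbb{R}^d} |\omega|\,|\tilde F(d\omega)| \;<\; \infty
\quad\Longrightarrow\quad
\inf_{f_n} \int_{B_r} \bigl(f(x)-f_n(x)\bigr)^2 \,\mu(dx) \;\le\; \frac{(2\,r\,C_f)^2}{n},
$$

where the infimum runs over one-hidden-layer networks $f_n(x) = c_0 + \sum_{k=1}^{n} c_k\,\phi(a_k \cdot x + b_k)$ with a fixed sigmoidal nonlinearity $\phi$.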
We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon's basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown …
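To make the normalized maximized likelihood concrete, here is a minimal Python sketch (an illustrative example of mine, not code from the paper) that computes the NML distribution and the parametric complexity for the Bernoulli family; the normalizer sums the maximized likelihood over all binary sequences of length n, grouped by the count of ones:

    import math

    def bernoulli_nml(n):
        """Normalized maximum likelihood for the Bernoulli(theta) family."""
        def max_lik(k):
            # Maximized likelihood p(x^n | theta_hat) with theta_hat = k / n.
            if k == 0 or k == n:
                return 1.0
            p = k / n
            return p ** k * (1 - p) ** (n - k)

        # Normalizer: maximized likelihood summed over all 2^n sequences.
        normalizer = sum(math.comb(n, k) * max_lik(k) for k in range(n + 1))

        # NML probability of any particular sequence containing k ones.
        nml_seq = [max_lik(k) / normalizer for k in range(n + 1)]
        parametric_complexity_bits = math.log2(normalizer)
        return nml_seq, parametric_complexity_bits

    probs, pc = bernoulli_nml(10)
    print(f"parametric complexity (log-normalizer): {pc:.3f} bits")

The stochastic complexity of a given sequence is then its maximized negative log-likelihood plus this log-normalizer.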
The minimum complexity or minimum description length criterion developed by Kolmogorov, Rissanen, Wallace, Sorkin, and others leads to consistent probability density estimators. These density estimators are defined to achieve the best compromise between likelihood and simplicity. A related issue is the compromise between accuracy of approximations and …
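As one concrete instance of the likelihood/simplicity compromise, a two-part MDL histogram density estimator can be sketched as follows (an illustration of mine, not the paper's construction; the (m/2) log n model codelength is one common choice of complexity term and is an assumption here):

    import math

    def mdl_histogram(data, max_bins=30):
        """Pick the number of equal-width histogram bins on [0, 1] by a
        two-part MDL criterion: model codelength + data codelength."""
        n = len(data)
        best = None
        for m in range(1, max_bins + 1):
            counts = [0] * m
            for x in data:
                counts[min(int(x * m), m - 1)] += 1
            # Data codelength: negative log-likelihood under the histogram density.
            neg_log_lik = -sum(c * math.log(c * m / n) for c in counts if c > 0)
            # Model codelength: roughly (m/2) log n nats to describe the cell
            # probabilities to the appropriate precision (an assumed penalty).
            model_len = 0.5 * m * math.log(n)
            total = neg_log_lik + model_len
            if best is None or total < best[0]:
                best = (total, m)
        return best[1]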
In the absence of knowledge of the true density function, Bayesian models take the joint density function for a sequence of n random variables to be an average of densities with respect to a prior. We examine the relative entropy distance D_n between the true density and the Bayesian density and show that the asymptotic distance is (d/2) log n + c, …
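The constant in this expansion is explicit in the Clarke–Barron analysis; under smoothness conditions on a d-parameter family with prior density w, the standard form of the result (stated here since the abstract is cut off) is

$$
D_n(\theta) \;=\; \frac{d}{2}\,\log\frac{n}{2\pi e} \;+\; \frac{1}{2}\,\log\det I(\theta) \;+\; \log\frac{1}{w(\theta)} \;+\; o(1),
$$

where $I(\theta)$ is the Fisher information matrix at the true parameter.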
Gaussian mixtures (or so-called radial basis function networks) for density estimation provide a natural counterpart to sigmoidal neural networks for function fitting and approximation. In both cases, it is possible to give simple expressions for the iterative improvement of performance as components of the network are introduced one at a time. In …
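A minimal Python sketch of the one-component-at-a-time idea for Gaussian mixture density estimation (my own illustration; the 2/(k+1) mixing weight and the use of data points as candidate centers are simple assumed choices, not necessarily the paper's):

    import numpy as np

    def greedy_gaussian_mixture(data, n_components, sigma=0.25):
        """Greedily build a Gaussian mixture density on 1-D float data,
        adding one component per step; the candidate center giving the
        largest log-likelihood after mixing is kept."""
        def component(x, mu):
            return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

        mixture = np.full_like(data, 1e-12)  # current density values at the data
        means = []
        for k in range(1, n_components + 1):
            alpha = 1.0 if k == 1 else 2.0 / (k + 1)  # simple stepsize schedule
            best_mu, best_ll, best_vals = None, -np.inf, None
            for mu in data:  # candidate centers taken from the sample
                vals = (1 - alpha) * mixture + alpha * component(data, mu)
                ll = np.sum(np.log(vals))
                if ll > best_ll:
                    best_mu, best_ll, best_vals = mu, ll, vals
            mixture = best_vals
            means.append(best_mu)
        return means, mixture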
We consider the problem of approximating a given element f from a Hilbert space H by means of greedy algorithms and the application of such procedures to the regression problem in statistical learning theory. We improve on the existing theory of convergence rates for both the orthogonal greedy algorithm and the relaxed greedy algorithm, as well as for the …
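A compact numerical sketch of the orthogonal greedy algorithm over a finite dictionary (an illustrative finite-dimensional example, standing in for the general Hilbert-space setting of the paper):

    import numpy as np

    def orthogonal_greedy(f, dictionary, n_steps):
        """Orthogonal greedy algorithm: at each step pick the dictionary
        element most correlated with the residual, then project f onto the
        span of all elements selected so far.

        f          : target vector in R^m
        dictionary : array of shape (num_atoms, m), rows assumed unit-norm
        """
        selected, residual = [], f.copy()
        for _ in range(n_steps):
            correlations = dictionary @ residual
            selected.append(int(np.argmax(np.abs(correlations))))
            basis = dictionary[selected].T                  # shape (m, k)
            coeffs, *_ = np.linalg.lstsq(basis, f, rcond=None)
            residual = f - basis @ coeffs
        return selected, residual

    # Example: approximate a vector by 5 atoms from a random unit-norm dictionary.
    rng = np.random.default_rng(0)
    D = rng.normal(size=(200, 50))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    target = rng.normal(size=50)
    atoms, res = orthogonal_greedy(target, D, 5)
    print(atoms, np.linalg.norm(res))

The relaxed greedy algorithm replaces the full projection by the convex update f_k = (1 - a_k) f_{k-1} + a_k b g_k, which avoids re-solving a least-squares problem at every step.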
For a common class of artificial neural networks, the mean integrated squared error between the estimated network and a target function f is shown to be bounded by $$O\!\left(\frac{c_f^2}{n}\right) + O\!\left(\frac{nd}{N}\log N\right)$$ where n is the number of nodes, d is the input dimension of the function, N is the number of …
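Balancing the two terms in n gives the familiar rate; equating the approximation and estimation terms (a standard optimization of this bound, spelled out here since the abstract is truncated) yields

$$
n \;\asymp\; c_f \left(\frac{N}{d \log N}\right)^{1/2}
\quad\Longrightarrow\quad
\text{MISE} \;=\; O\!\left( c_f \left(\frac{d \log N}{N}\right)^{1/2} \right).
$$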
For problems of data compression, gambling, and prediction of individual sequences $x_1,\ldots,x_n$, the following questions arise. Given a target family of probability mass functions $p(x_1,\ldots,x_n\mid\theta)$, how do we choose a probability mass function $q(x_1,\ldots,x_n)$ so that it approximately minimizes the maximum regret $\max_{x_1,\ldots,x_n}\bigl(\log 1/q(x_1,\ldots,x_n) - \log 1/p(x_1,\ldots,x_n\mid\hat\theta)\bigr)$ and so that it achieves …
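For smooth d-parameter families the minimax value of this regret has a well-known asymptotic form (quoted here in its standard statement, since the abstract is cut off):

$$
\min_{q}\;\max_{x_1,\ldots,x_n}\Bigl(\log\frac{1}{q(x_1,\ldots,x_n)} \;-\; \log\frac{1}{p(x_1,\ldots,x_n\mid\hat\theta)}\Bigr)
\;=\; \frac{d}{2}\,\log\frac{n}{2\pi} \;+\; \log\!\int\!\sqrt{\det I(\theta)}\;d\theta \;+\; o(1),
$$

where $I(\theta)$ is the Fisher information matrix of the family.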
For Gaussian regression, we develop and analyze methods for combining estimators from various models. For squared-error loss, an unbiased estimator of the risk of the mixture of general estimators is developed. Special attention is given to the case that the component estimators are least-squares projections into arbitrary linear subspaces, such as those …
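For the projection case, a minimal Python sketch of risk-based combination (my own illustration: the Mallows-type unbiased risk estimate is standard, while the exponential-weight temperature beta is an assumed choice, not necessarily the weighting analyzed in the paper):

    import numpy as np

    def combine_projection_estimators(y, projections, sigma2, beta=4.0):
        """Combine least-squares projection estimators by exponential weights
        based on an unbiased risk estimate:
        RSS + 2 * sigma^2 * dim - n * sigma^2 (Mallows-type)."""
        n = len(y)
        fits, risks = [], []
        for P in projections:          # each P: (n, n) orthogonal projection
            fit = P @ y
            dim = np.trace(P)
            risk = np.sum((y - fit) ** 2) + 2 * sigma2 * dim - n * sigma2
            fits.append(fit)
            risks.append(risk)
        risks = np.array(risks)
        w = np.exp(-(risks - risks.min()) / (beta * sigma2))  # assumed weighting
        w /= w.sum()
        return sum(wi * fi for wi, fi in zip(w, fits)), w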
New families of Fisher information and entropy power inequalities for sums of independent random variables are presented. These inequalities relate the information in the sum of n independent random variables to the information contained in sums over subsets of the random variables, for an arbitrary collection of subsets. As a consequence, a simple proof of …
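For orientation, the entropy power and the form of the subset-sum entropy power inequality are (stated from the standard formulation of this type of result; the paper's exact hypotheses are cut off here)

$$
N(X) \;=\; \frac{1}{2\pi e}\,e^{2h(X)}, \qquad
N\!\Bigl(\sum_{i=1}^{n} X_i\Bigr) \;\ge\; \frac{1}{r}\sum_{s\in\mathcal{C}} N\!\Bigl(\sum_{i\in s} X_i\Bigr),
$$

where $\mathcal{C}$ is a collection of subsets of $\{1,\ldots,n\}$ and $r$ is the maximum number of subsets in $\mathcal{C}$ containing any single index. Taking $\mathcal{C}$ to be the singletons recovers the classical entropy power inequality, and taking the leave-one-out subsets gives the monotonicity-of-entropy statement.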