What Size Net Gives Valid Generalization?

@article{Baum1989WhatSN,
  title={What Size Net Gives Valid Generalization?},
  author={Eric B. Baum and David Haussler},
  journal={Neural Computation},
  year={1989},
  volume={1},
  pages={151-160}
}
We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size. Assume 0 < ε ≤ 1/8. We show that if m ≥ O((W/ε) log(N/ε)) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a…
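As a rough illustration of these bounds, the sketch below evaluates m ≈ (W/ε) log₂(N/ε) and the Ω(W/ε) lower bound for a small threshold network; the constant factors and the logarithm base are placeholder assumptions, since the paper states the bounds only up to order of magnitude.

```python
import math

def sample_size_upper_bound(num_weights: int, num_nodes: int, eps: float) -> int:
    """Rough sample size from the O((W/eps) * log(N/eps)) upper bound.

    The bound is stated only up to order of magnitude, so the constant
    factor (1 here) and the base-2 logarithm are assumptions.
    """
    assert 0 < eps <= 1 / 8, "the bound is stated for 0 < eps <= 1/8"
    return math.ceil((num_weights / eps) * math.log2(num_nodes / eps))

def sample_size_lower_bound(num_weights: int, eps: float) -> int:
    """Order-of-magnitude Omega(W/eps) requirement (constant factor assumed 1)."""
    return math.ceil(num_weights / eps)

# Example: a one-hidden-layer threshold net with 10 inputs, 20 hidden units and
# 1 output has N = 21 computation nodes and W = 10*20 + 20 + 21 = 241 weights
# (counting thresholds as weights).
W, N, eps = 241, 21, 0.1
print(sample_size_upper_bound(W, N, eps))  # ~1.9e4 examples, order of magnitude
print(sample_size_lower_bound(W, eps))     # 2410
```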
When Are k-Nearest Neighbor and Back Propagation Accurate for Feasible Sized Sets of Examples?
  • E. Baum
  • Computer Science
  • EURASIP Workshop
  • 1990
TLDR
Experimental tests of generalization versus number of examples are presented for random target networks and examples drawn from a uniform distribution, and seem to indicate that networks with two hidden layers have Vapnik-Chervonenkis dimension roughly equal to their total number of weights.
What size network is good for generalization of a specific task of interest?
TLDR
This work proposes a way to calculate the exact value of the probability that a network trained on a randomly chosen set of examples corresponding to a specific task of interest performs it with no more than a given number of errors, and shows that for some tasks increasing network size leads to worse generalization.
Generalization and Selection of Examples in Feedforward Neural Networks
TLDR
It is shown that for the parity problem, one of the most used problems for testing learning algorithms, only the use of the whole set of examples ensures global learning in a depth-two architecture, and that this difficulty can be overcome by considering a tree-structured network.
Generalization and PAC learning: some new results for the class of generalized single-layer networks
TLDR
It is shown that the use of self-structuring techniques for GSLNs may reduce the number of training examples sufficient to guarantee good generalization performance, and an explanation for the fact that GSLNs can require a relatively large number of weights is provided.
Limiting Network Size within Finite Bounds for Optimization
TLDR
A theoretical justification is given for the required attribute size and the corresponding hidden-layer dimension that yield optimal binary classification with minimum training complexity for a given sample set, in a single-hidden-layer feedforward network framework.
Polynomial time algorithms for learning neural nets
  • E. Baum
  • Mathematics, Computer Science
  • COLT '90
  • 1990
TLDR
A more powerful query learning algorithm is described which is able to very rapidly learn depth-2 nets having n inputs connected to 4 threshold units connected to one or more outputs, or to learn the intersection of k half-spaces in n unknowns.
Improving Generalization with Active Learning
Active learning differs from passive learning from examples in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. …
Agnostic PAC Learning of Functions on Analog Neural Nets
  • W. Maass
  • Computer Science, Mathematics
  • Neural Computation
  • 1995
TLDR
The computation time of LEARN is exponential in the number of weights of the considered network architecture, and therefore only of interest for neural nets of small size.
What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation
One of the most important aspects of any machine learning paradigm is how it scales according to problem size and complexity. Using a task with known optimal training error, and a pre-specified …
On the Generalization Problem
The generalization problem considered in this paper assumes that a limited amount of input and output data from a system is available, and that from this information an estimate of the output …

References

Showing 1–10 of 53 references
On the capabilities of multilayer perceptrons
  • E. Baum
  • Computer Science, Mathematics
  • J. Complex.
  • 1988
TLDR
A construction is presented here for implementing an arbitrary dichotomy with one hidden layer containing ⌈N/d⌉ units, for any set of N points in general position in d dimensions; this is in fact the smallest such net, since there are dichotomies that cannot be implemented by any net with fewer units.
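Reading the hidden-layer size as ⌈N/d⌉, a minimal sketch of how this construction scales (the helper name is hypothetical):

```python
import math

def hidden_units_for_dichotomy(num_points: int, dim: int) -> int:
    """Hidden units used by the one-hidden-layer construction for an arbitrary
    dichotomy of num_points points in general position in dim dimensions,
    assuming the ceil(N/d) reading of the result above."""
    return math.ceil(num_points / dim)

# e.g. 1000 points in general position in 50 dimensions -> 20 hidden units
print(hidden_units_for_dichotomy(1000, 50))  # 20
```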
Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm
  • N. Littlestone
  • Mathematics
  • 28th Annual Symposium on Foundations of Computer Science (sfcs 1987)
  • 1987
Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in …
A statistical approach to learning and generalization in layered neural networks
TLDR
The proposed formalism is applied to the problems of selecting an optimal architecture and predicting learning curves, and the Gibbs distribution on the ensemble of networks with a fixed architecture is derived.
A Proposal for More Powerful Learning Algorithms
  • E. Baum
  • Computer Science
  • Neural Computation
  • 1989
TLDR
Neural nets are universal learners, capable of learning any learnable class of concepts, and standard circuit complexity arguments show that learning algorithms with such freedom can solve in polynomial time any learning problem that can be solved in polynomial time by any algorithm whatever.
Large Automatic Learning, Rule Extraction, and Generalization
Since antiquity, man has dreamed of building a device that would "learn from examples", "form generalizations", and "discover the rules" behind patterns in the data. Recent work has shown that …
A general lower bound on the number of examples needed for learning
TLDR
This paper proves a lower bound on the number of random examples required for distribution-free learning of a concept class C and shows that for many interesting concept classes, including k-CNF and k-DNF, the bound is actually tight to within a constant factor.
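For context, this distribution-free lower bound is commonly stated as Ω(d/ε + (1/ε) log(1/δ)) examples for a class of VC dimension d; the sketch below evaluates that expression with placeholder constants, which are an assumption since only the asymptotic form is given.

```python
import math

def examples_lower_bound(vc_dim: int, eps: float, delta: float) -> int:
    """Order-of-magnitude Omega(d/eps + log(1/delta)/eps) lower bound on the
    number of random examples for distribution-free learning of a class of
    VC dimension d; constant factors are placeholders (set to 1)."""
    return math.ceil(vc_dim / eps + math.log(1 / delta) / eps)

# e.g. VC dimension 100, accuracy eps = 0.05, confidence delta = 0.01
print(examples_lower_bound(100, 0.05, 0.01))  # ~2100 examples, order of magnitude
```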
The Vapnik-Chervonenkis Dimension: Information versus Complexity in Learning
TLDR
The work of Vapnik and Chervonenkis (1971) provides the key tools for dealing with the information issue; the main ideas are developed and it is discussed how complexity fits in.
Connectionist Learning Procedures
TLDR
These relatively simple, gradient-descent learning procedures work well for small tasks, and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.
Quantifying Inductive Bias: AI Learning Algorithms and Valiant's Learning Framework
  • D. Haussler
  • Mathematics, Computer Science
  • Artif. Intell.
  • 1988
We show that the notion of inductive bias in concept learning can be quantified in a way that directly relates to learning performance in the framework recently introduced by Valiant. …
A 'Neural' Network that Learns to Play Backgammon
TLDR
This study touches on some of the most important issues in network learning theory, including the development of efficient coding schemes and training procedures, scaling, generalization, the use of real-valued inputs and outputs, and techniques for escaping from local minima.