Rademacher and Gaussian Complexities: Risk Bounds and Structural Results
This work investigates data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities, and proves general risk bounds in terms of these complexities in a decision-theoretic setting.
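For a small finite function class, the empirical Rademacher complexity described above can be estimated directly by Monte Carlo over random sign vectors. A minimal pure-Python sketch (the function name and interface are illustrative, not from the paper):

```python
import random

def empirical_rademacher(preds, n_trials=2000, seed=0):
    """Monte-Carlo estimate of the empirical Rademacher complexity of a
    finite function class on a fixed sample.

    preds: list of prediction vectors, one per function in the class,
           each giving the function's values on the n sample points.
    Returns an estimate of E_sigma[ sup_f (1/n) * sum_i sigma_i * f(x_i) ],
    where each sigma_i is an independent uniform +/-1 sign.
    """
    rng = random.Random(seed)
    n = len(preds[0])
    total = 0.0
    for _ in range(n_trials):
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        # Supremum over the (finite) class of the signed empirical average.
        total += max(sum(s * p for s, p in zip(sigma, f)) / n for f in preds)
    return total / n_trials
```

For the two-function class {f = +1, f = -1} the estimate is E|sum_i sigma_i|/n, which shrinks like 1/sqrt(n): a simple class looks simpler as the sample grows, which is exactly the data-dependent behavior the risk bounds exploit.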
Learning the Kernel Matrix with Semidefinite Programming
- G. Lanckriet, N. Cristianini, P. Bartlett, L. Ghaoui, Michael I. Jordan
- Computer Science · J. Mach. Learn. Res.
- 8 July 2002
This paper shows how the kernel matrix can be learned from data via semidefinite programming (SDP) techniques and leads directly to a convex method for learning the 2-norm soft margin parameter in support vector machines, solving an important open problem.
Neural Network Learning - Theoretical Foundations
The authors explain the role of scale-sensitive versions of the Vapnik–Chervonenkis dimension in large margin classification and in real-valued prediction, and discuss the computational complexity of neural network learning.
New Support Vector Algorithms
A new class of support vector algorithms for regression and classification is proposed that eliminates one of the free parameters of the algorithm: the accuracy parameter in the regression case, and the regularization constant C in the classification case.
Convexity, Classification, and Risk Bounds
A general quantitative relationship between the risk as assessed using the 0–1 loss and the risk as assessed using any nonnegative surrogate loss function is provided, and it is shown that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function.
Spectrally-normalized margin bounds for neural networks
This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR-10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and that the presented bound is sensitive to this complexity.
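The dominant factor in a spectrally-normalized margin bound is the product of the layers' spectral norms, i.e. their Lipschitz constants. A toy sketch of that product, assuming dense layer matrices given as lists of rows, and omitting the bound's additional correction terms and margin normalization:

```python
def spectral_norm(W, iters=200):
    """Estimate the largest singular value of W (a list of rows)
    by power iteration on W^T W."""
    n = len(W[0])
    v = [1.0] * n
    for _ in range(iters):
        u = [sum(row[j] * v[j] for j in range(n)) for row in W]               # u = W v
        w = [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(n)]    # w = W^T u
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(n)) for row in W]
    return sum(x * x for x in u) ** 0.5

def margin_complexity(layers):
    """Product of per-layer spectral norms: the network's Lipschitz
    constant for linear layers, and the leading factor in the bound."""
    prod = 1.0
    for W in layers:
        prod *= spectral_norm(W)
    return prod
```

The point of the empirical study is that this product, measured on trained networks, tracks the difficulty of the task rather than the raw parameter count.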
Local Rademacher complexities
New bounds on the error of learning algorithms in terms of a data-dependent notion of complexity are proposed, and applications to classification and prediction with convex function classes, and with kernel classes in particular, are presented.
Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates
A main result of this work is a sharp analysis of two robust distributed gradient descent algorithms based on median and trimmed mean operations, respectively, which are shown to achieve order-optimal statistical error rates for strongly convex losses.
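The two aggregation rules analyzed there are simple to state; a minimal pure-Python sketch with illustrative names, assuming each worker reports a flat gradient vector and at most a few workers are Byzantine:

```python
import statistics

def coordinatewise_median(grads):
    """Aggregate worker gradients by taking the median in each coordinate."""
    return [statistics.median(col) for col in zip(*grads)]

def trimmed_mean(grads, b):
    """In each coordinate, drop the b smallest and b largest reported
    values, then average the rest (b should exceed the number of
    Byzantine workers)."""
    out = []
    for col in zip(*grads):
        kept = sorted(col)[b:len(col) - b]
        out.append(sum(kept) / len(kept))
    return out

# Three honest workers report gradients near [1.0, -2.0];
# one Byzantine worker sends an arbitrary vector.
grads = [[1.0, -2.0], [1.1, -1.9], [0.9, -2.1], [100.0, 100.0]]
```

Both rules ignore the outlier in each coordinate, which is why they can achieve the order-optimal error rates the paper establishes for strongly convex losses.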
RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning
- Yan Duan, J. Schulman, Xi Chen, P. Bartlett, Ilya Sutskever, P. Abbeel
- Computer Science · ArXiv
- 4 November 2016
This paper proposes to represent a "fast" reinforcement learning algorithm as a recurrent neural network (RNN) and learn it from data: the fast algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm.
The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network
- P. Bartlett
- Computer Science · IEEE Trans. Inf. Theory
- 1 March 1998
Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights.