
- Gert R. G. Lanckriet, Nello Cristianini, Peter L. Bartlett, Laurent El Ghaoui, Michael I. Jordan
- Journal of Machine Learning Research
- 2002

Kernel-based learning algorithms work by embedding the data into a Euclidean space, and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information is contained in the so-called kernel matrix, a symmetric and…
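As a minimal illustration of the object this paper studies (not its semidefinite-programming approach to learning it), the sketch below builds an RBF Gram matrix and checks the symmetry and positive-semidefiniteness the abstract mentions; the function name and `gamma` value are illustrative assumptions:

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """Gram (kernel) matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
K = rbf_kernel_matrix(X)
# K is symmetric, positive semidefinite, with ones on the diagonal
```

Any valid kernel must produce such a symmetric PSD matrix; that is the constraint the paper's semidefinite-programming formulation exploits.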

- Peter L. Bartlett, Shahar Mendelson
- Journal of Machine Learning Research
- 2001

We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities. In a decision theoretic setting, we prove general risk bounds in terms of these complexities. We consider function classes that can be expressed as combinations of functions from basis classes and show how…
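A hedged sketch of the empirical Rademacher complexity the abstract refers to, specialized to a norm-bounded linear class where the supremum has a closed form; the Monte Carlo setup and the bound `B` are illustrative assumptions, not the paper's construction:

```python
import numpy as np

def empirical_rademacher_linear(X, B=1.0, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of the
    linear class {x -> <w, x> : ||w||_2 <= B} on the sample X.
    For this class, sup_f (1/n) sum_i s_i f(x_i) = (B/n) ||sum_i s_i x_i||_2,
    where the s_i are independent uniform +/-1 (Rademacher) signs."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    signs = rng.choice([-1.0, 1.0], size=(n_draws, n))
    sups = (B / n) * np.linalg.norm(signs @ X, axis=1)
    return sups.mean()

X = np.random.default_rng(1).normal(size=(200, 5))
rad = empirical_rademacher_linear(X)
```

Because the complexity is computed from the sample itself, it can be plugged directly into data-dependent risk bounds of the kind the paper proves.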

- Bernhard Schölkopf, Alexander J. Smola, Robert C. Williamson, Peter L. Bartlett
- Neural Computation
- 2000

We describe a new class of Support Vector algorithms for regression and classification. In these algorithms a parameter ν lets one effectively control the number of Support Vectors. While this can be useful in its own right, the parametrization has the additional benefit of enabling us to eliminate one of the other free parameters of the algorithm: the accuracy…
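The ν-parametrization described here is available in standard libraries. A minimal sketch with scikit-learn's `NuSVC` (assuming scikit-learn is installed; the data are synthetic): ν lower-bounds the fraction of training points that become support vectors and upper-bounds the fraction of margin errors:

```python
import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels from a linear rule

# nu = 0.2 asks for at least ~20% of the points as support vectors
clf = NuSVC(nu=0.2, kernel="rbf").fit(X, y)
frac_sv = clf.support_.size / X.shape[0]
```

Raising ν trades a wider margin (more support vectors) against more margin violations, which is the control knob the abstract describes.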

Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. The convexity makes these algorithms computationally efficient. The use of a surrogate, however, has statistical…
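The surrogate-loss viewpoint can be sketched numerically: each convex surrogate below upper-bounds the 0-1 loss as a function of the margin m = y·f(x). The specific surrogates shown (hinge for the SVM, exponential for AdaBoost) are standard examples, not an exhaustive list:

```python
import numpy as np

def zero_one(m):
    """0-1 loss as a function of the margin m = y * f(x)."""
    return (m <= 0).astype(float)

def hinge(m):
    """Convex surrogate minimized by the support vector machine."""
    return np.maximum(0.0, 1.0 - m)

def exponential(m):
    """Convex surrogate minimized by AdaBoost."""
    return np.exp(-m)

margins = np.linspace(-2.0, 2.0, 9)
# Each surrogate upper-bounds the 0-1 loss pointwise, so minimizing
# it also drives down the (non-convex) classification error.
```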

- Peter L. Bartlett
- IEEE Trans. Information Theory
- 1998

Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural…

- Peter L. Bartlett, Jonathan Baxter
- J. Artif. Intell. Res.
- 2001

Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate…
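A hedged sketch of the GPOMDP-style recursions on a toy two-action bandit (a degenerate POMDP); the sigmoid policy and all constants are illustrative assumptions, not the paper's setting. An eligibility trace z discounted by β accumulates the policy's score function, and a running average Δ of reward-weighted traces forms the biased gradient estimate:

```python
import numpy as np

def gpomdp_bandit(theta, beta=0.9, T=50000, seed=0):
    """GPOMDP-style biased gradient estimate for a two-action bandit
    with a sigmoid policy: P(a=1) = sigmoid(theta).
    Action 1 pays reward 1, action 0 pays reward 0, so the true
    performance gradient at theta is sigmoid'(theta) > 0."""
    rng = np.random.default_rng(seed)
    p = 1.0 / (1.0 + np.exp(-theta))
    z, delta = 0.0, 0.0
    for t in range(T):
        a = 1 if rng.random() < p else 0
        # score function d/dtheta log P(a) = a - p for a sigmoid policy
        z = beta * z + (a - p)       # discounted eligibility trace
        r = float(a)                 # observed reward
        delta += (r * z - delta) / (t + 1)  # running average estimate
    return delta

grad_est = gpomdp_bandit(theta=0.0)  # true gradient here is 0.25
```

The discount β trades bias against variance: β → 1 reduces the bias of the estimate but inflates its variance, which is the central tension the paper analyzes.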

We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present…

- John Shawe-Taylor, Peter L. Bartlett, Robert C. Williamson, Martin Anthony
- IEEE Trans. Information Theory
- 1998

The paper introduces some generalizations of Vapnik’s method of structural risk minimisation (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to trade off errors on the training sample against improved generalization performance. It then considers the more general case when the hierarchy of classes is chosen…

- Alexander J. Smola, Peter L. Bartlett
- NIPS
- 2000

We present a simple sparse greedy technique to approximate the maximum a posteriori estimate of Gaussian Processes with much improved scaling behaviour in the sample size m. In particular, computational requirements are O(n²m), storage is O(nm), the cost for…
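A rough sketch of the kind of reduced-basis GP approximation the abstract describes, using a random subset of n basis points rather than the paper's greedy selection (so this is the generic subset-of-regressors form, not their algorithm; kernel, noise level, and data are illustrative assumptions):

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """RBF kernel matrix between row-sets A and B."""
    d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * d)

def subset_gp_mean(X, y, X_test, n_basis=30, noise=0.1, seed=0):
    """Approximate GP posterior mean using only n_basis of the m training
    points as basis functions (chosen at random here, greedily in the
    paper). Cost is O(n^2 m) instead of the O(m^3) of an exact GP."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_basis, replace=False)
    Knm = rbf(X[idx], X)        # n x m cross-kernel
    Knn = rbf(X[idx], X[idx])   # n x n basis kernel
    # Subset-of-regressors normal equations:
    # (Knm Knm^T + noise * Knn) alpha = Knm y
    A = Knm @ Knm.T + noise * Knn + 1e-8 * np.eye(n_basis)
    alpha = np.linalg.solve(A, Knm @ y)
    return rbf(X_test, X[idx]) @ alpha

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=300)
X_test = np.linspace(-2, 2, 50)[:, None]
pred = subset_gp_mean(X, y, X_test)
```

With n ≪ m the linear solve involves only an n × n system, which is the source of the improved scaling the abstract reports.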