PAC-Bayesian Inequalities for Martingales

Yevgeny Seldin, François Laviolette, Nicolò Cesa-Bianchi, John Shawe-Taylor, Peter Auer. IEEE Transactions on Information Theory.
We present a set of high-probability inequalities that control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. Our results extend the PAC-Bayesian (probably approximately correct) analysis in learning theory from the i.i.d. setting to martingales, opening the way for its application to importance-weighted sampling, reinforcement learning, and other interactive learning domains, as well as many other domains in…
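As background for the abstract above, the classical i.i.d. PAC-Bayes-kl bound that these martingale inequalities generalize can be sketched as follows (this is Maurer's form of Seeger's bound; the notation here is standard but not taken from the paper itself):

```latex
% For a prior \pi over a hypothesis class H, an i.i.d. sample of
% size n, empirical loss \hat{L}, expected loss L, and confidence
% parameter \delta: with probability at least 1 - \delta,
% simultaneously for all posteriors \rho,
\mathrm{kl}\!\left( \mathbb{E}_{h \sim \rho}\bigl[\hat{L}(h)\bigr]
  \,\middle\|\, \mathbb{E}_{h \sim \rho}\bigl[L(h)\bigr] \right)
  \;\le\; \frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln \frac{2\sqrt{n}}{\delta}}{n}
```

Here kl(·‖·) is the KL divergence between Bernoulli distributions. The martingale results replace the i.i.d. sample averages with weighted averages of martingale difference sequences.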


PAC-Bayes-Bernstein Inequality for Martingales and its Application to Multiarmed Bandits

A new tool for data-dependent analysis of the exploration-exploitation trade-off in learning under limited feedback based on a new concentration inequality that makes it possible to control the concentration of weighted averages of multiple simultaneously evolving and interdependent martingales.

PAC-Bayes Analysis Beyond the Usual Bounds

A basic PAC-Bayes inequality for stochastic kernels is presented, from which one may derive extensions of various known PAC-Bayes bounds as well as novel bounds, and a simple bound for a loss function with unbounded range is presented.

Novel Change of Measure Inequalities and PAC-Bayesian Bounds

This work proposes a multiplicative change of measure inequality for $\alpha$-divergences, which leads to tighter bounds under some technical conditions, and presents several PAC-Bayesian bounds for various classes of random variables, by using the novel change of measure inequalities.

PAC-Bayesian Transportation Bound

A new generalization error bound is developed, the PAC-Bayesian transportation bound, which is the first PAC-Bayesian bound that relates the risks of any two predictors according to their distance, and is capable of evaluating the cost of de-randomization of stochastic predictors faced with continuous loss functions.

Novel Change of Measure Inequalities with Applications to PAC-Bayesian Bounds and Monte Carlo Estimation

Several applications are presented, including PAC-Bayesian bounds for various classes of losses, non-asymptotic intervals for Monte Carlo estimates, and a generalized version of the Hammersley–Chapman–Robbins inequality.

Simpler PAC-Bayesian bounds for hostile data

This paper provides PAC-Bayesian learning bounds that hold for dependent, heavy-tailed observations (hereafter referred to as hostile data), proves a general PAC-Bayesian bound, and shows how to use it in various hostile settings.

A Strongly Quasiconvex PAC-Bayesian Bound

It is shown that the PAC-Bayesian bound can be rewritten as a one-dimensional function of the trade-off parameter, and sufficient conditions are provided under which the function has a single global minimum.

A New Family of Generalization Bounds Using Samplewise Evaluated CMI

A new family of information-theoretic generalization bounds is presented, in which the training loss and the population loss are compared through a jointly convex function, and a samplewise, average version of Seeger’s PAC-Bayesian bound is derived.

Tighter PAC-Bayes Generalisation Bounds by Leveraging Example Difficulty

A modified version of the excess risk is introduced, which can be used to obtain tighter, fast-rate PAC-Bayesian generalisation bounds, and a new bound for [−1, 1]-valued signed losses, which is more favourable when they empirically have low variance around 0.05.



A PAC analysis of a Bayesian estimator

The paper uses these techniques to give the first PAC-style analysis of a Bayesian-inspired estimator of generalisation: the size of a ball that can be placed in the consistent region of parameter space. The resulting bounds are independent of the complexity of the function class, though they depend linearly on the dimensionality of the parameter space.

Bayesian Gaussian process models : PAC-Bayesian generalisation error bounds and sparse approximations

The tractability and usefulness of simple greedy forward selection with information-theoretic criteria previously used in active learning is demonstrated and generic schemes for automatic model selection with many (hyper)parameters are developed.

PAC-Bayesian Analysis of Contextual Bandits

The analysis makes it possible to provide the algorithm with a large amount of side information, let the algorithm decide which side information is relevant for the task, and penalize the algorithm only for the side information that it actually uses.

Empirical Bernstein Bounds and Sample-Variance Penalization

Improved constants are given for data-dependent and variance-sensitive confidence bounds, called empirical Bernstein bounds, which are extended to hold uniformly over classes of functions whose growth function is polynomial in the sample size n; sample-variance penalization is also considered.
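To make the summary above concrete, here is a minimal numerical sketch of an empirical Bernstein upper confidence bound for [0, 1]-valued i.i.d. samples, in the Maurer–Pontil form; the Bernoulli parameter, sample sizes, and trial counts are illustrative assumptions, not values from the paper.

```python
import math
import random

def empirical_bernstein_bound(xs, delta):
    """Upper confidence bound on the mean of [0,1]-valued i.i.d. samples:
    sample mean + variance-sensitive term + O(1/n) correction
    (Maurer-Pontil-style empirical Bernstein bound)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance
    return (mean
            + math.sqrt(2 * var * math.log(2 / delta) / n)
            + 7 * math.log(2 / delta) / (3 * (n - 1)))

# Sanity check: over repeated Bernoulli samples, the bound should fail to
# cover the true mean with frequency well below delta.
random.seed(0)
true_mean, delta, trials = 0.3, 0.05, 200
failures = 0
for _ in range(trials):
    xs = [1.0 if random.random() < true_mean else 0.0 for _ in range(500)]
    if empirical_bernstein_bound(xs, delta) < true_mean:
        failures += 1
print(failures / trials)  # empirical failure rate, well below delta
```

The variance-sensitive middle term is what makes the bound tighter than Hoeffding-style bounds when the sample variance is small.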

PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification

M. Seeger. J. Mach. Learn. Res., 2002.
By applying the PAC-Bayesian theorem of McAllester (1999a), this paper proves distribution-free generalisation error bounds for a wide range of approximate Bayesian GP classification techniques, giving a strong learning-theoretical justification for the use of these techniques.

PAC-Bayesian Stochastic Model Selection

A PAC-Bayesian performance guarantee for stochastic model selection is presented that is superior to analogous guarantees for deterministic model selection, and it is shown that the posterior optimizing the performance guarantee is a Gibbs distribution.
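The Gibbs distribution mentioned in the summary above can be sketched over a finite hypothesis class: the posterior reweights the prior by an exponentiated negative empirical loss. The prior, losses, and the trade-off parameter `lam` below are illustrative assumptions, not values from the paper.

```python
import math

def gibbs_posterior(prior, emp_loss, lam):
    """Gibbs posterior rho(h) proportional to prior(h) * exp(-lam * emp_loss(h))
    over a finite hypothesis class; lam trades off fit against the prior."""
    weights = [p * math.exp(-lam * l) for p, l in zip(prior, emp_loss)]
    z = sum(weights)  # normalizing constant
    return [w / z for w in weights]

# Four hypotheses, uniform prior, hypothetical empirical losses.
prior = [0.25, 0.25, 0.25, 0.25]
emp_loss = [0.40, 0.10, 0.25, 0.55]
post = gibbs_posterior(prior, emp_loss, lam=10.0)
```

As `lam` grows, the posterior concentrates on the empirical-loss minimizer; as `lam` approaches 0, it reverts to the prior.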

Distribution-Dependent PAC-Bayes Priors

The idea that the PAC-Bayes prior can be informed by the data-generating distribution is developed: sharp bounds for an existing framework are proved, insights into function-class complexity under this model are obtained, and new algorithms for controlling it are suggested.


Let (Ω, F, P) be a probability space and {F_n} an increasing family of sub-σ-fields of F. Let (x_n), n = 1, 2, …, be a sequence of bounded martingale differences on (Ω, F, P); that is, x_n(ω) is bounded almost surely.

Some PAC-Bayesian Theorems

The PAC-Bayesian theorems given here apply to an arbitrary prior measure on an arbitrary concept space and provide an alternative to the use of VC dimension in proving PAC bounds for parameterized concepts.

Exploration-exploitation tradeoff using variance estimates in multi-armed bandits