• Corpus ID: 53219550

MACRO: A Meta-Algorithm for Conditional Risk Minimization

@article{Zimin2018MACROAM,
  title={MACRO: A Meta-Algorithm for Conditional Risk Minimization},
  author={Alexander Zimin and Christoph H. Lampert},
  journal={arXiv: Machine Learning},
  year={2018}
}
We study conditional risk minimization (CRM), i.e. the problem of learning a hypothesis of minimal risk for prediction at the next step of sequentially arriving dependent data. Despite it being a fundamental problem, successful learning in the CRM sense has so far only been demonstrated using theoretical algorithms that cannot be used for real problems as they would require storing all incoming data. In this work, we introduce MACRO, a meta-algorithm for CRM that does not suffer from this… 

References

Conditional Risk Minimization for Stochastic Processes
TLDR
A practical estimator for the conditional risk based on the theory of non-parametric time-series prediction, and a finite sample concentration bound that establishes uniform convergence of the estimator to the true conditional risk under certain regularity assumptions on the process.
Learning Theory for Conditional Risk Minimization
TLDR
The main results are two theorems that establish criteria for learnability for many classes of stochastic processes, including all special cases studied previously in the literature.
Online Learning with Prior Knowledge
TLDR
Standard experts algorithms are methods for utilizing a given set of "experts" to make good choices in a sequential decision-making problem. This work extends that framework by allowing an experts algorithm to rely on state information, namely, partial information about the cost function, which is revealed to the decision maker before the latter chooses an action.
Predictive PAC Learning and Process Decompositions
TLDR
It is argued that in predictive PAC it is natural to condition not on the past observations but on the mixture component of the sample path, and a novel PAC generalization bound is given for mixtures of learnable processes, with a generalization error that is not worse than that of each mixture component.
Optimal learning with Bernstein online aggregation
TLDR
This work introduces a new recursive aggregation procedure called Bernstein Online Aggregation (BOA), which is optimal for the model selection aggregation problem in the bounded iid setting for the square loss and is the first online algorithm that satisfies the fast rate of convergence.
Time series prediction and online learning
TLDR
The first generalization bounds for a hypothesis derived by online-to-batch conversion of the sequence of hypotheses output by an online algorithm are proved, in the general setting of a non-stationary non-mixing stochastic process.
On the generalization ability of on-line learning algorithms
TLDR
This paper proves tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic $M_n$ associated with the on-line performance of the ensemble, and obtains risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix.
Prediction of time series by statistical learning: general losses and fast rates
Abstract: We establish rates of convergences in statistical learning for time series forecasting. Using the PAC-Bayesian approach, slow rates of convergence $\sqrt{d/n}$ for the Gibbs estimator under the
Predictive PAC Learnability: A Paradigm for Learning from Exchangeable Input Data
  • V. Pestov
  • Mathematics
    2010 IEEE International Conference on Granular Computing
  • 2010
TLDR
Using de Finetti's theorem, it is shown that if a universally separable function class $\mathscr F$ is distribution-free PAC learnable under i.i.d. inputs, then it is distribution-free predictive PAC learnable under exchangeable inputs, with a slightly worse sample complexity.
Stability Bounds for Stationary φ-mixing and β-mixing Processes
TLDR
Novel and distinct stability-based generalization bounds for stationary φ-mixing and β-mixing sequences are proved, which can be viewed as the first theoretical basis for the use of these algorithms in non-i.i.d. scenarios.