# Oracle inequalities for computationally adaptive model selection

@article{Agarwal2012OracleIF, title={Oracle inequalities for computationally adaptive model selection}, author={Alekh Agarwal and Peter L. Bartlett and John C. Duchi}, journal={ArXiv}, year={2012}, volume={abs/1208.0129} }

We analyze general model selection procedures using penalized empirical loss minimization under computational constraints. While classical model selection approaches do not consider computational aspects of performing model selection, we argue that any practical model selection procedure must not only trade off estimation and approximation error, but also the computational effort required to compute empirical minimizers for different function classes. We provide a framework for analyzing such…
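As a rough illustration of penalized empirical loss minimization under a computational budget, here is a minimal sketch. It is not the procedure from the paper. The histogram function classes, the `bins / n` penalty, the even split of the budget across classes, and the cost model (one class with `bins` cells costs `bins` units per sample) are all hypothetical choices made for the example.

```python
def fit_histogram(xs, ys, bins):
    """Least-squares piecewise-constant fit on [0, 1): one mean per bin."""
    sums = [0.0] * bins
    counts = [0] * bins
    for x, y in zip(xs, ys):
        b = min(int(x * bins), bins - 1)
        sums[b] += y
        counts[b] += 1
    # Empty bins predict 0.0 (an arbitrary convention for this sketch).
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def empirical_risk(means, xs, ys):
    """Mean squared error of the fitted histogram on the given sample."""
    bins = len(means)
    total = 0.0
    for x, y in zip(xs, ys):
        b = min(int(x * bins), bins - 1)
        total += (y - means[b]) ** 2
    return total / len(xs)


def select_model(xs, ys, bin_counts, budget, penalty_scale=1.0):
    """Return (score, bins, n) minimizing empirical risk plus penalty.

    Assumed cost model: the budget is split evenly across classes, and a
    class with `bins` cells spends `bins` units per sample it processes.
    """
    share = budget / len(bin_counts)
    best = None
    for bins in bin_counts:
        n = min(len(xs), int(share // bins))  # samples this class can afford
        if n == 0:
            continue  # not enough budget to process a single sample
        means = fit_histogram(xs[:n], ys[:n], bins)
        score = empirical_risk(means, xs[:n], ys[:n]) + penalty_scale * bins / n
        if best is None or score < best[0]:
            best = (score, bins, n)
    return best
```

With a small budget, richer classes see fewer samples, so their penalty term grows. This is one simple way the selection can reflect computational effort as well as estimation and approximation error.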

## 17 Citations

Designing Statistical Estimators That Balance Sample Size, Risk, and Computational Cost

- Computer Science
- IEEE Journal of Selected Topics in Signal Processing
- 2015

This work uses regularized linear regression as a case study to argue for the existence of a tradeoff between computational time, sample complexity, and statistical accuracy that applies to statistical estimators based on convex optimization.

Non-stochastic Best Arm Identification and Hyperparameter Optimization

- Computer Science
- AISTATS
- 2016

This work casts hyperparameter optimization as an instance of non-stochastic best-arm identification, identifies a known algorithm that is well-suited for this setting, and empirically evaluates its behavior.

Generalized Rank-Breaking: Computational and Statistical Tradeoffs

- Computer Science
- J. Mach. Learn. Res.
- 2018

To compute the pseudo-maximum likelihood estimate of the set-wise comparisons, this work provides a generalization of the minorization maximization algorithm and gives guarantees on its convergence.

Multiple Optimality Guarantees in Statistical Learning

- Computer Science
- 2014

This thesis considers the fundamental questions that arise when trading between multiple such criteria--computation, communication, privacy--while maintaining statistical performance, and proves fundamental lower bounds on the statistical performance of any algorithm subject to computational, confidentiality, or communication constraints.

Computation-Risk Tradeoffs for Covariance-Thresholded Regression

- Mathematics, Computer Science
- ICML
- 2013

This analysis shows how the risk of this family of linear regression estimators varies with the sparsity and regularization level, thus establishing a statistical estimation setting for which there is an explicit, smooth tradeoff between risk and computation.

Computational and Statistical Tradeoffs in Learning to Rank

- Computer Science
- NIPS
- 2016

For learning to rank, this work provides a hierarchy of rank-breaking mechanisms ordered by the complexity of the sketch of the data that each one generates. The hierarchy allows the number of data points collected to be traded off gracefully against the available computational resources while guaranteeing the desired level of accuracy.

Computational and statistical tradeoffs via convex relaxation

- Computer Science
- Proceedings of the National Academy of Sciences
- 2013

This paper defines a notion of “algorithmic weakening,” in which a hierarchy of algorithms is ordered by both computational efficiency and statistical efficiency, allowing the growing strength of the data at scale to be traded off against the need for sophisticated processing.

High-dimensional change-point estimation: Combining filtering with convex optimization

- Computer Science
- 2015 IEEE International Symposium on Information Theory (ISIT)
- 2015

The main result of this paper shows that the method performs change-point estimation reliably as long as the product of the smallest-sized change and the smallest distance between change-points is larger than a Gaussian width parameter that characterizes the low-dimensional complexity of the underlying signal sequence.

Automating model search for large scale machine learning

- Computer Science
- SoCC
- 2015

An architecture for automatic machine learning at scale comprised of a cost-based cluster resource allocation estimator, advanced hyper-parameter tuning techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching and optimal resource allocation is proposed.

Time–Data Tradeoffs by Aggressive Smoothing

- Computer Science
- 2014

This work provides theoretical and experimental evidence of a tradeoff between sample complexity and computation time that applies to statistical estimators based on convex optimization for a class of regularized linear inverse problems.

## References

SHOWING 1-10 OF 35 REFERENCES

Risk bounds for model selection via penalization

- Mathematics, Computer Science
- 1999

It is shown that the quadratic risk of the minimum penalized empirical contrast estimator is bounded by an index of the accuracy of the sieve, which quantifies the trade-off among the candidate models between the approximation error and parameter dimension relative to sample size.

Complexity regularization via localized random penalties

- Computer Science, Mathematics
- 2004

This article proposes a new complexity-penalized model selection method based on data-dependent penalties, and considers the binary classification problem where, given a random observation X ∈ R^d, one has to predict Y ∈ {0,1}.

Empirical minimization

- Mathematics
- 2006

We investigate the behavior of the empirical minimization algorithm using various methods. We first analyze it by comparing the empirical, random, structure and the original one on the class, either…

Robust Stochastic Approximation Approach to Stochastic Programming

- Computer Science, Mathematics
- SIAM J. Optim.
- 2009

It is intended to demonstrate that a properly modified SA approach can be competitive and even significantly outperform the SAA method for a certain class of convex stochastic problems.

Convexity, Classification, and Risk Bounds

- Computer Science
- 2006

A general quantitative relationship between the risk as assessed using the 0–1 loss and the risk as assessed using any nonnegative surrogate loss function is provided, and it is shown that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function.

Rademacher and Gaussian Complexities: Risk Bounds and Structural Results

- Computer Science
- J. Mach. Learn. Res.
- 2001

This work investigates the use of certain data-dependent estimates of the complexity of a function class called Rademacher and Gaussian complexities and proves general risk bounds in terms of these complexities in a decision theoretic setting.

Nonparametric Maximum Likelihood Estimation by the Method of Sieves

- Mathematics
- 1982

Maximum likelihood estimation often fails when the parameter takes values in an infinite dimensional space. For example, the maximum likelihood method cannot be applied to the completely…

Finite-time Analysis of the Multiarmed Bandit Problem

- Computer Science
- Machine Learning
- 2004

This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.

A Stochastic Approximation Method

- Mathematics
- 2007

Let M(x) denote the expected value at level x of the response to a certain experiment. M(x) is assumed to be a monotone function of x but is unknown to the experimenter, and it is desired to find the…

Local Rademacher complexities

- Computer Science, Mathematics
- 2005

New bounds on the error of learning algorithms in terms of a data-dependent notion of complexity are proposed and some applications to classification and prediction with convex function classes, and with kernel classes in particular are presented.