Oracle inequalities for computationally adaptive model selection
@article{Agarwal2012OracleIF,
  title   = {Oracle inequalities for computationally adaptive model selection},
  author  = {Alekh Agarwal and Peter L. Bartlett and John C. Duchi},
  journal = {ArXiv},
  volume  = {abs/1208.0129},
  year    = {2012}
}
We analyze general model selection procedures using penalized empirical loss minimization under computational constraints. While classical model selection approaches ignore the computational cost of performing model selection, we argue that any practical procedure must trade off not only estimation and approximation error, but also the computational effort required to compute empirical minimizers for different function classes. We provide a framework for analyzing such…
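A hypothetical sketch of the kind of procedure the abstract describes: penalized empirical risk minimization over a nested hierarchy of function classes, with a crude fit-count budget standing in for the computational constraint. The model hierarchy, penalty weights, and budget rule below are illustrative assumptions, not the paper's construction.

```python
# Illustrative sketch (not the paper's algorithm): penalized ERM over a
# nested model hierarchy, stopping when a compute budget is exhausted.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)

budget, best = 5, None                          # toy computational constraint
for degree in range(1, 20):                     # nested classes F_1 ⊂ F_2 ⊂ ...
    if budget == 0:                             # out of compute: stop the search
        break
    budget -= 1
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=1e-3))
    model.fit(X, y)                             # empirical minimizer over F_degree
    emp_risk = np.mean((model.predict(X) - y) ** 2)
    pen = 0.01 * degree                         # toy complexity penalty pen_n(m)
    if best is None or emp_risk + pen < best[0]:
        best = (emp_risk + pen, degree, model)
print("selected model class: degree", best[1])
```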
17 Citations
Designing Statistical Estimators That Balance Sample Size, Risk, and Computational Cost
- Computer Science · IEEE Journal of Selected Topics in Signal Processing
- 2015
This work uses regularized linear regression as a case study to argue for the existence of a tradeoff between computational time, sample complexity, and statistical accuracy that applies to statistical estimators based on convex optimization.
Non-stochastic Best Arm Identification and Hyperparameter Optimization
- Computer Science · AISTATS
- 2016
This work casts hyperparameter optimization as an instance of non-stochastic best-arm identification, identifies a known algorithm that is well-suited for this setting, and empirically evaluates its behavior.
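The well-suited known algorithm referenced here is successive halving; below is a minimal sketch under assumed interfaces (`evaluate(cfg, budget)` is a hypothetical loss oracle, e.g. validation loss after `budget` training iterations).

```python
# Minimal successive-halving sketch: repeatedly spend budget on surviving
# configurations and discard the worse half each round.
import math, random

def successive_halving(configs, evaluate, total_budget):
    rounds = int(math.log2(len(configs)))
    budget_per_round = total_budget // max(rounds, 1)
    while len(configs) > 1:
        per_arm = budget_per_round // len(configs)
        losses = [(evaluate(c, per_arm), c) for c in configs]
        losses.sort(key=lambda t: t[0])
        configs = [c for _, c in losses[: len(configs) // 2]]  # keep best half
    return configs[0]

# Toy usage: each "configuration" is a number; loss is a noisy distance to 0.3,
# with noise shrinking as the per-arm budget grows.
random.seed(0)
cfgs = [random.random() for _ in range(16)]
best = successive_halving(
    cfgs, lambda c, b: abs(c - 0.3) + random.gauss(0, 1 / (b + 1)), 1000
)
print("best config:", best)
```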
Generalized Rank-Breaking: Computational and Statistical Tradeoffs
- Computer Science · J. Mach. Learn. Res.
- 2018
To compute the pseudo-maximum likelihood estimate from set-wise comparisons, this work provides a generalization of the minorization-maximization algorithm and gives guarantees on its convergence.
An Asymptotically Optimal Multi-Armed Bandit Algorithm and Hyperparameter Optimization
- Computer Science · ArXiv
- 2020
This paper proposes an efficient and robust bandit-based algorithm called Sub-Sampling (SS), which evaluates the potential of hyperparameters using sub-samples of the observations and is proved to be theoretically optimal under the criterion of cumulative regret.
Multiple Optimality Guarantees in Statistical Learning
- Computer Science
- 2014
This thesis considers the fundamental questions that arise when trading off multiple such criteria (computation, communication, privacy) while maintaining statistical performance, and proves fundamental lower bounds on the statistical performance of any algorithm subject to computational, confidentiality, or communication constraints.
Computation-Risk Tradeoffs for Covariance-Thresholded Regression
- Mathematics, Computer Science · ICML
- 2013
This analysis shows how the risk of this family of linear regression estimators varies with the sparsity and regularization level, thus establishing a statistical estimation setting for which there is an explicit, smooth tradeoff between risk and computation.
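A hedged sketch of the covariance-thresholding idea as summarized above: zero out small entries of the sample covariance before solving the (regularized) normal equations, so that sparsity buys cheaper downstream linear algebra at some statistical cost. The threshold rule and problem sizes are illustrative assumptions, not the paper's estimator.

```python
# Illustrative covariance-thresholded regression: sparsify the sample
# covariance, then solve the resulting (ridge-stabilized) normal equations.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.standard_normal((n, d))
w_true = np.zeros(d); w_true[:5] = 1.0
y = X @ w_true + 0.1 * rng.standard_normal(n)

S = X.T @ X / n                                # sample covariance
t = 0.1                                        # thresholding level (tunable)
S_t = np.where(np.abs(S) >= t, S, 0.0)         # sparsify: cheaper algebra downstream
w_hat = np.linalg.solve(S_t + 1e-6 * np.eye(d), X.T @ y / n)
print("estimation error:", np.linalg.norm(w_hat - w_true))
```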
Computational and Statistical Tradeoffs in Learning to Rank
- Computer Science · NIPS
- 2016
In the application of learning to rank, this work provides a hierarchy of rank-breaking mechanisms ordered by the complexity of the resulting sketch of the data, which allows the number of data points collected to be traded off gracefully against the available computational resources, while guaranteeing the desired level of accuracy.
Computational and statistical tradeoffs via convex relaxation
- Computer Science · Proceedings of the National Academy of Sciences
- 2013
This paper defines a notion of “algorithmic weakening,” in which a hierarchy of algorithms is ordered by both computational efficiency and statistical efficiency, allowing the growing strength of the data at scale to be traded off against the need for sophisticated processing.
High-dimensional change-point estimation: Combining filtering with convex optimization
- Computer Science · 2015 IEEE International Symposium on Information Theory (ISIT)
- 2015
The main result of this paper shows that the method performs change-point estimation reliably as long as the product of the smallest-sized change and the smallest distance between change-points is larger than a Gaussian width parameter that characterizes the low-dimensional complexity of the underlying signal sequence.
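Read literally, the stated condition has the following shape (the symbols here are chosen for illustration and are not necessarily the paper's notation):

```latex
% Delta_min: smallest change magnitude; T_min: smallest spacing between
% change-points; omega: the Gaussian width parameter of the signal class.
\Delta_{\min} \cdot T_{\min} \;\gtrsim\; \omega
```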
Automating model search for large scale machine learning
- Computer Science · SoCC
- 2015
An architecture for automatic machine learning at scale is proposed, comprising a cost-based cluster resource allocation estimator, advanced hyperparameter tuning techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching and optimal resource allocation.
References
Showing 1–10 of 36 references
FAST RATES FOR ESTIMATION ERROR AND ORACLE INEQUALITIES FOR MODEL SELECTION
- Computer Science · Econometric Theory
- 2008
It is shown that, provided the sequence of models is ordered by inclusion, tight upper bounds on estimation error can be used as a complexity penalty, even in situations when the difference between the empirical risk and the true risk decreases much more slowly than the complexity penalty.
Risk bounds for model selection via penalization
- Mathematics, Computer Science
- 1999
It is shown that the quadratic risk of the minimum penalized empirical contrast estimator is bounded by an index of the accuracy of the sieve, which quantifies the trade-off among the candidate models between the approximation error and parameter dimension relative to sample size.
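The generic shape of such a bound is the standard oracle inequality (constants and side conditions suppressed; notation assumed): the penalized estimator performs, up to a constant, as well as the best balance of approximation error and penalty over the candidate models.

```latex
% \hat{m} minimizes empirical risk plus penalty; \hat{f}_{\hat{m}} is the
% empirical minimizer over the selected class \mathcal{F}_{\hat{m}}.
\mathbb{E}\,\ell(\hat{f}_{\hat{m}})
  \;\le\; C \,\inf_{m}\Big\{ \inf_{f \in \mathcal{F}_m} \ell(f)
          + \mathrm{pen}_n(m) \Big\}
```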
Model Selection and Error Estimation
- Computer Science, Mathematics · Machine Learning
- 2004
A tight relationship between error estimation and data-based complexity penalization is pointed out: any good error estimate may be converted into a data-based penalty function, and the performance of the estimate is governed by the quality of the error estimate.
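In symbols, the conversion takes the standard form (notation assumed here): if the error estimate guarantees that the true risk exceeds the empirical risk by at most pen_n(m) on each class, select

```latex
% \widehat{L}_n: empirical risk; \hat{f}_m: empirical minimizer over class m.
\hat{m} \;=\; \operatorname*{arg\,min}_{m}
  \Big\{ \widehat{L}_n(\hat{f}_m) + \mathrm{pen}_n(m) \Big\}
```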
Complexity regularization via localized random penalties
- Computer Science, Mathematics
- 2004
This article proposes a new complexity-penalized model selection method based on data-dependent penalties, and considers the binary classification problem where, given a random observation X ∈ R^d, one has to predict Y ∈ {0, 1}.
Complexity Regularization with Application to Artificial Neural Networks
- Computer Science, Mathematics
- 1991
This paper defines general complexity regularization criteria, establishes bounds on the statistical risk of the estimated functions, and proves consistency, rates of convergence, and near-asymptotic optimality of the model selection criterion in both parametric and nonparametric cases.
Empirical minimization
- Mathematics
- 2006
We investigate the behavior of the empirical minimization algorithm using various methods. We first analyze it by comparing the empirical (random) structure and the original one on the class, either…
Robust Stochastic Approximation Approach to Stochastic Programming
- Computer Science, Mathematics · SIAM J. Optim.
- 2009
It is intended to demonstrate that a properly modified SA approach can be competitive with, and even significantly outperform, the SAA method for a certain class of convex stochastic problems.
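A minimal sketch of the robust-SA recipe as it is usually summarized: stochastic gradient steps with a long, non-classical stepsize, returning the averaged iterate rather than the last one. The problem instance, stepsize constant, and horizon below are illustrative assumptions.

```python
# Robust stochastic approximation on a toy least-squares problem:
# constant ~1/sqrt(T) stepsize plus iterate averaging.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 5))
x_true = rng.standard_normal(5)
b = A @ x_true + 0.5 * rng.standard_normal(500)

def stoch_grad(x):
    i = rng.integers(len(A))                   # single-sample gradient
    return (A[i] @ x - b[i]) * A[i]

T = 20000
x, avg = np.zeros(5), np.zeros(5)
for t in range(1, T + 1):
    x -= (1.0 / np.sqrt(T)) * stoch_grad(x)    # long constant stepsize
    avg += (x - avg) / t                       # running average of iterates
print("error of averaged iterate:", np.linalg.norm(avg - x_true))
```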
Convexity, Classification, and Risk Bounds
- Computer Science
- 2006
A general quantitative relationship between the risk as assessed using the 0–1 loss and the risk as assessed using any nonnegative surrogate loss function is provided, and it is shown that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function.
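The relationship in its standard form (notation from the surrogate-risk literature, assumed here): for a classification-calibrated surrogate loss φ with transform ψ,

```latex
% R: 0-1 risk, R^*: Bayes risk; R_phi: surrogate risk, R_phi^*: its infimum.
\psi\big(R(f) - R^{*}\big) \;\le\; R_{\phi}(f) - R_{\phi}^{*}
```

so any bound on the excess surrogate risk yields a bound on the excess 0–1 risk.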
Rademacher and Gaussian Complexities: Risk Bounds and Structural Results
- Computer Science · J. Mach. Learn. Res.
- 2001
This work investigates the use of certain data-dependent estimates of the complexity of a function class called Rademacher and Gaussian complexities and proves general risk bounds in terms of these complexities in a decision-theoretic setting.
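The data-dependent quantity in question is, in its standard definition (for a sample x_1, …, x_n and i.i.d. signs σ_i ∈ {±1}):

```latex
\widehat{\mathcal{R}}_n(\mathcal{F})
  \;=\; \mathbb{E}_{\sigma}\Big[ \sup_{f \in \mathcal{F}}
        \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \Big]
% Risk bounds then take the familiar form
% L(f) \le \widehat{L}_n(f) + 2\,\widehat{\mathcal{R}}_n(\mathcal{F})
%          + O\big(\sqrt{\log(1/\delta)/n}\big).
```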
On the generalization ability of on-line learning algorithms
- Computer ScienceIEEE Transactions on Information Theory
- 2004
This paper proves tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic M_n associated with the on-line performance of the ensemble, and obtains risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix.
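The flavor of the bound, with constants hedged: if M_n denotes the cumulative loss the online algorithm suffers over n rounds, then with probability at least 1 − δ a hypothesis chosen from the ensemble satisfies

```latex
\mathrm{risk}(\hat{h}) \;\le\; \frac{M_n}{n}
  + O\!\Big(\sqrt{\log(1/\delta)/n}\Big)
```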