• Corpus ID: 12204193

Oracle inequalities for computationally adaptive model selection

Alekh Agarwal, Peter L. Bartlett, John C. Duchi
We analyze general model selection procedures using penalized empirical loss minimization under computational constraints. While classical model selection approaches do not consider computational aspects of performing model selection, we argue that any practical model selection procedure must not only trade off estimation and approximation error, but also the computational effort required to compute empirical minimizers for different function classes. We provide a framework for analyzing such… 

Designing Statistical Estimators That Balance Sample Size, Risk, and Computational Cost
This work uses regularized linear regression as a case study to argue for the existence of a tradeoff between computational time, sample complexity, and statistical accuracy that applies to statistical estimators based on convex optimization.
Non-stochastic Best Arm Identification and Hyperparameter Optimization
This work casts hyperparameter optimization as an instance of non-stochastic best-arm identification, identifies a known algorithm that is well-suited for this setting, and empirically evaluates its behavior.
Generalized Rank-Breaking: Computational and Statistical Tradeoffs
To compute the pseudo-maximum likelihood estimate of the set-wise comparisons, this work provides a generalization of the minorization maximization algorithm and gives guarantees on its convergence.
Multiple Optimality Guarantees in Statistical Learning
This thesis considers the fundamental questions that arise when trading between multiple such criteria (computation, communication, privacy) while maintaining statistical performance, and proves fundamental lower bounds on the statistical performance of any algorithm subject to computational, confidentiality, or communication constraints.
Computation-Risk Tradeoffs for Covariance-Thresholded Regression
This analysis shows how the risk of this family of linear regression estimators varies with the sparsity and regularization level, thus establishing a statistical estimation setting for which there is an explicit, smooth tradeoff between risk and computation.
Computational and Statistical Tradeoffs in Learning to Rank
In the application of learning to rank, this work provides a hierarchy of rank-breaking mechanisms, ordered by the complexity of the resulting sketch of the data, that allows the number of data points collected to be gracefully traded off against the computational resources available while guaranteeing the desired level of accuracy.
Computational and statistical tradeoffs via convex relaxation
This paper defines a notion of “algorithmic weakening,” in which a hierarchy of algorithms is ordered by both computational efficiency and statistical efficiency, allowing the growing strength of the data at scale to be traded off against the need for sophisticated processing.
High-dimensional change-point estimation: Combining filtering with convex optimization
The main result of this paper shows that the method performs change-point estimation reliably as long as the product of the smallest-sized change and the smallest distance between change-points is larger than a Gaussian width parameter that characterizes the low-dimensional complexity of the underlying signal sequence.
Automating model search for large scale machine learning
An architecture for automatic machine learning at scale comprised of a cost-based cluster resource allocation estimator, advanced hyper-parameter tuning techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching and optimal resource allocation is proposed.
Time–Data Tradeoffs by Aggressive Smoothing
This work provides theoretical and experimental evidence of a tradeoff between sample complexity and computation time that applies to statistical estimators based on convex optimization for a class of regularized linear inverse problems.


Risk bounds for model selection via penalization
It is shown that the quadratic risk of the minimum penalized empirical contrast estimator is bounded by an index of the accuracy of the sieve, which quantifies the trade-off among the candidate models between the approximation error and parameter dimension relative to sample size.
Complexity regularization via localized random penalties
This article proposes a new complexity-penalized model selection method based on data-dependent penalties, and considers the binary classification problem where, given a random observation X ∈ R^d, one has to predict Y ∈ {0,1}.
Empirical minimization
We investigate the behavior of the empirical minimization algorithm using various methods. We first analyze it by comparing the empirical, random, structure and the original one on the class, either
Robust Stochastic Approximation Approach to Stochastic Programming
It is intended to demonstrate that a properly modified SA approach can be competitive and even significantly outperform the SAA method for a certain class of convex stochastic problems.
Convexity, Classification, and Risk Bounds
A general quantitative relationship between the risk as assessed using the 0–1 loss and the risk as assessed using any nonnegative surrogate loss function is provided, and it is shown that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function.
Rademacher and Gaussian Complexities: Risk Bounds and Structural Results
This work investigates the use of certain data-dependent estimates of the complexity of a function class called Rademacher and Gaussian complexities and proves general risk bounds in terms of these complexities in a decision theoretic setting.
Nonparametric Maximum Likelihood Estimation by the Method of Sieves
Maximum likelihood estimation often fails when the parameter takes values in an infinite dimensional space. For example, the maximum likelihood method cannot be applied to the completely
Finite-time Analysis of the Multiarmed Bandit Problem
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
A Stochastic Approximation Method
Let M(x) denote the expected value at level x of the response to a certain experiment. M(x) is assumed to be a monotone function of x but is unknown to the experimenter, and it is desired to find the
Local Rademacher complexities
New bounds on the error of learning algorithms in terms of a data-dependent notion of complexity are proposed and some applications to classification and prediction with convex function classes, and with kernel classes in particular are presented.