Corpus ID: 12061478

On Computationally Tractable Selection of Experiments in Measurement-Constrained Regression Models

  • Yining Wang, Adams Wei Yu, Aarti Singh
  • J. Mach. Learn. Res.
We derive computationally tractable methods to select a small subset of experiment settings from a large pool of given design points. The primary focus is on linear regression models, while the technique also extends to generalized linear models and the delta method (estimating functions of linear regression models). The algorithms are based on a continuous relaxation of an otherwise intractable combinatorial optimization problem, with sampling or greedy procedures as post-processing steps…
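The relax-then-round idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes the A-optimality criterion tr((X^T diag(w) X)^{-1}), relaxes subset selection to weights w on the probability simplex, uses a simple multiplicative update as a stand-in solver, and rounds by sampling without replacement. The function names (`a_criterion`, `relaxed_a_optimal_weights`, `sample_subset`) and the synthetic data are hypothetical.

```python
import numpy as np


def a_criterion(X, w):
    """A-optimality objective: trace of the inverse weighted information matrix."""
    M = X.T @ (w[:, None] * X)
    return float(np.trace(np.linalg.inv(M)))


def relaxed_a_optimal_weights(X, n_iter=200):
    """Approximately minimize a_criterion over the simplex (assumed stand-in solver)."""
    n, _ = X.shape
    w = np.full(n, 1.0 / n)  # start from uniform weights
    best_w, best_val = w.copy(), a_criterion(X, w)
    for _ in range(n_iter):
        M_inv = np.linalg.inv(X.T @ (w[:, None] * X))
        # d_i = x_i^T M^{-2} x_i is minus the gradient of the objective in w_i.
        Z = X @ M_inv
        d = np.einsum("ij,ij->i", Z, Z)
        # Multiplicative update, then renormalize back onto the simplex.
        w = w * np.sqrt(d)
        w /= w.sum()
        val = a_criterion(X, w)
        if val < best_val:  # keep the best iterate seen so far
            best_w, best_val = w.copy(), val
    return best_w


def sample_subset(w, k, rng):
    """Post-processing: draw k distinct design points with probabilities w."""
    return np.sort(rng.choice(len(w), size=k, replace=False, p=w))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # synthetic pool of 200 design points in 5 dimensions
w = relaxed_a_optimal_weights(X)
subset = sample_subset(w, k=20, rng=rng)
```

Here `subset` holds the indices of the 20 design points whose responses would be measured. A greedy post-processing step, which the abstract also mentions, could replace `sample_subset` without changing the relaxation stage.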
Optimal Sampling for Generalized Linear Models Under Measurement Constraints
Under “measurement constraints,” responses are expensive to measure and initially unavailable for most records in the dataset, but the covariates are available for the entire dataset. Our…
LowCon: A Design-based Subsampling Approach in a Misspecified Linear Model
A novel subsampling method is developed, called "LowCon", which outperforms the competing methods when the working linear model is misspecified and approximately minimizes the so-called "worst-case" bias with respect to many possible misspecification terms.
Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression
This work motivates a new minimax-optimality criterion for experimental design which can be viewed as an extension of both A-optimal design and sampling for worst-case regression, and develops a new algorithm for a joint sampling distribution called volume sampling.
Refined bounds for randomized experimental design
A new concentration inequality for the eigenvalues of random matrices is developed using a refined version of the intrinsic dimension, which makes it possible to quantify the performance of such randomized strategies on E- and G-optimal design.
Submodular Observation Selection and Information Gathering for Quadratic Models
An efficient greedy observation selection algorithm uniquely tailored for quadratic models is developed, theoretical bounds on its achievable utility are provided, and monotone and (weak) submodular set functions are shown.
Modern Subsampling Methods for Large-Scale Least Squares Regression
  • Tao Li, Cheng Meng
  • Computer Science, Mathematics
  • 2020
This review presents some cutting-edge subsampling methods based on the large-scale least squares estimation that aim to develop a more effective data-dependent sampling probability and a deterministic subsample in accordance with certain optimality criteria.
Near-Optimal Discrete Optimization for Experimental Design
In this paper we consider computationally tractable methods for discrete optimization in experimental design, an important question in data mining and analysis when labeled data are expensive and…
Unbiased estimators for random design regression
It is shown that for any input distribution and $\epsilon>0$ there is a random design consisting of $O(d\log d + d/\epsilon)$ points from which an unbiased estimator can be constructed whose square loss over the entire distribution is with high probability bounded by $1+\epsilon$ times the loss of the optimum.
Bayesian experimental design using regularized determinantal point processes
A new fundamental connection between Bayesian experimental design and determinantal point processes is demonstrated, which is used to develop new efficient algorithms for finding optimal designs under four optimality criteria: A, C, D and V.
Near-optimal discrete optimization for experimental design: a regret minimization approach
A polynomial-time regret minimization framework is proposed to achieve a $(1+\varepsilon)$ approximation with only $O(p/\varepsilon^2)$ design points, for all the optimality criteria above.


Random Design Analysis of Ridge Regression
This work gives a simultaneous analysis of both the ordinary least squares estimator and the ridge regression estimator in the random design setting under mild assumptions on the covariate/response…
Optimal Subsampling Approaches for Large Sample Linear Regression
A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is…
Active Regression by Stratification
This is the first active learner for this setting that provably can improve over passive learning and provides finite sample convergence guarantees for general distributions in the misspecified model.
A statistical perspective on algorithmic leveraging
This work provides an effective framework to evaluate the statistical properties of algorithmic leveraging in the context of estimating parameters in a linear regression model and shows that from the statistical perspective of bias and variance, neither leverage-based sampling nor uniform sampling dominates the other.
Hard-Margin Active Linear Regression
It is shown that active learning admits significantly better sample complexity bounds than its passive learning counterpart, and efficient algorithms that attain near-optimal bounds are given.
Regression Shrinkage and Selection via the Lasso
We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a…
Simultaneous Analysis of Lasso and Dantzig Selector
We show that, under a sparsity scenario, the Lasso estimator and the Dantzig selector exhibit similar behavior. For both methods, we derive, in parallel, oracle inequalities for the prediction risk…
Fast Randomized Kernel Ridge Regression with Statistical Guarantees
A version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance is described, and a fast algorithm is presented that computes coarse approximations to these scores in time linear in the number of samples.
Design Issues for Generalized Linear Models: A Review
Generalized linear models (GLMs) have been used quite effectively in the modeling of a mean response under nonstandard conditions, where discrete as well as continuous data distributions can be…
Convergence Rates of Active Learning for Maximum Likelihood Estimation
This paper provides an upper bound on the label requirement of the algorithm, and a lower bound that matches it up to lower order terms, and shows that unlike binary classification in the realizable case, just a single extra round of interaction is sufficient to achieve near-optimal performance in maximum likelihood estimation.