Corpus ID: 210714077

Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives

A. Dedieu, Hussein Hazimeh, R. Mazumder
We consider a discrete optimization-based approach for learning sparse classifiers, where the outcome depends upon a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized problems at scales much larger than what was conventionally considered possible in the statistics and machine learning communities. Despite their usefulness, MIP-based approaches are significantly slower compared to …
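For background, a common big-$M$ mixed integer formulation of $\ell_0$-regularized classification (one standard way to write the problem; not necessarily the exact formulation used in this paper) is:

```latex
\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^p} \;
\sum_{i=1}^{n} \ell\!\left(y_i, x_i^\top \beta\right) \;+\; \lambda \sum_{j=1}^{p} z_j
\qquad \text{s.t.} \qquad -M z_j \le \beta_j \le M z_j, \quad j = 1, \dots, p,
```

where $\ell$ is a classification loss (e.g., logistic), $z_j$ indicates whether feature $j$ enters the model, and $M$ is an a priori bound on the coefficient magnitudes.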

Figures and Tables from this paper

Citations

Principal Component Hierarchy for Sparse Quadratic Programs
A novel approximation hierarchy for cardinality-constrained, convex quadratic programs that exploits the rank-dominating eigenvectors of the quadratic matrix, together with two scalable optimization algorithms that can efficiently screen the potential indices of the nonzero elements of the original program.
Branch-and-bound Algorithm for Optimal Sparse Canonical Correlation Analysis
Canonical correlation analysis (CCA) is a family of multivariate statistical methods for extracting mutual information contained in multiple datasets. To improve the interpretability of CCA, here we …
Ideal formulations for constrained convex optimization problems with indicator variables
This paper gives the convex hull description of the epigraph of the composition of a one-dimensional convex function and an affine function under arbitrary combinatorial constraints, and gives a short proof that, for a separable objective function, the perspective reformulation is ideal independent of the constraints of the problem.
Safe Screening Rules for $\ell_0$-Regression
Numerical experiments indicate that, on average, 76% of the variables can be fixed to their optimal values, hence substantially reducing the computational burden for optimization, and the proposed fast and effective screening rules extend the scope of algorithms for $\ell_0$-regression to larger data sets.
Group selection and shrinkage with application to sparse semiparametric modeling
Sparse regression and classification estimators capable of group selection have application to an assortment of statistical problems, from multitask learning to sparse additive modeling to …
Safe screening rules for L0-regression from Perspective Relaxations
Numerical experiments indicate that a significant number of the variables can be removed quickly, hence reducing the computational burden for optimization substantially, and the proposed fast and effective screening rules extend the scope of algorithms for $\ell_0$-regression to larger data sets.
2x2 convexifications for convex quadratic optimization with indicator variables
In this paper, we study the convex quadratic optimization problem with indicator variables. For the bivariate case, we describe the convex hull of the epigraph in the original space of variables, and …
Least Squares Estimation of a Monotone Quasiconvex Regression Function
We develop a new approach for the estimation of a multivariate function based on the economic axioms of monotonicity and quasiconvexity. We prove the existence of the nonparametric least squares …
Sample-efficient L0-L2 constrained structure learning of sparse Ising models
This work leverages the cardinality-constraining L0 norm, which is known to properly induce sparsity, and further combines it with an L2 norm to better model the non-zero coefficients, showing that the proposed estimators achieve an improved sample complexity.
Sparse optimization via vector k-norm and DC programming with an application to feature selection for Support Vector Machines
Manlio Gaudioso (Dipartimento di Ingegneria Informatica, Modellistica, Elettronica e Sistemistica, Università della Calabria, 87036 Rende (CS), Italy), Giovanni Giallombardo, …


References

Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms
This paper empirically demonstrates that a family of $L_0$-based estimators can outperform state-of-the-art sparse learning algorithms in terms of a combination of prediction, estimation, and variable selection metrics under various regimes (e.g., different signal strengths, feature correlations, numbers of samples and features).
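The core coordinate descent update for an $L_0$-penalized least squares objective has a simple closed form. A minimal sketch, assuming unit-norm columns of X (illustrative only; the paper's full method also uses local combinatorial search):

```python
import numpy as np

def cd_l0(X, y, lam, sweeps=50):
    """Cyclic coordinate descent for min 0.5*||y - X b||^2 + lam*||b||_0,
    assuming each column of X has unit L2 norm."""
    p = X.shape[1]
    b = np.zeros(p)
    r = y.astype(float).copy()        # residual y - X b
    for _ in range(sweeps):
        for j in range(p):
            rho = X[:, j] @ r + b[j]  # partial correlation with b_j removed
            # closed-form L0 update: keep the coordinate only if the loss
            # reduction 0.5*rho^2 exceeds the penalty lam
            new = rho if rho * rho > 2.0 * lam else 0.0
            r += X[:, j] * (b[j] - new)
            b[j] = new
    return b
```

With an orthogonal design the update reduces to hard-thresholding each coefficient at $\sqrt{2\lambda}$, which is the sense in which the $\ell_0$ penalty "properly induces sparsity."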
Sparse Classification and Phase Transitions: A Discrete Optimization Perspective
In this paper, we formulate the sparse classification problem of $n$ samples with $p$ features as a binary convex optimization problem and propose a cutting-plane algorithm to solve it exactly. For …
Best Subset Selection via a Modern Optimization Lens
In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200 billion factor speedup in solving Mixed …
Proximal Algorithms
The many different interpretations of proximal operators and algorithms are discussed, their connections to many other topics in optimization and applied mathematics are described, some popular algorithms are surveyed, and a large number of examples of proximal operators that commonly arise in practice are provided.
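A concrete example of such a proximal operator: the prox of the scaled $\ell_1$ norm has the well-known closed form of soft-thresholding. A minimal sketch in NumPy:

```python
import numpy as np

def prox_l1(v, lam):
    """Proximal operator of f(x) = lam * ||x||_1, i.e. the minimizer of
    0.5*||x - v||^2 + lam*||x||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Each entry is shrunk toward zero by lam; entries within lam of zero vanish.
shrunk = prox_l1(np.array([3.0, -1.5, 0.2]), 1.0)  # [2.0, -0.5, 0.0]
```

Operators like this are the building blocks of proximal gradient methods for composite objectives such as the lasso.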
Fast Newton Method for Sparse Logistic Regression
Sparse logistic regression has developed tremendously over the past two decades, from its origin in the $\ell_1$-regularized version of Tibshirani (1996) to the sparsity-constrained models of …
Iterative hard thresholding methods for $l_0$ regularized convex cone programming
  • Zhaosong Lu
  • Mathematics, Computer Science
  • Math. Program.
  • 2014
An iterative hard thresholding (IHT) method and its variant for solving $l_0$-regularized box-constrained convex programming are proposed, and it is shown that the sequence generated by these methods converges to a local minimizer.
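The basic IHT iteration, a gradient step followed by keeping only the $k$ largest-magnitude coordinates, can be sketched for cardinality-constrained least squares (a simpler setting than the convex cone programs treated there; names and details here are illustrative):

```python
import numpy as np

def iht(X, y, k, step=None, iters=200):
    """Iterative hard thresholding for min ||y - X b||^2 s.t. ||b||_0 <= k."""
    n, p = X.shape
    if step is None:
        # 1/L step size, with L the largest eigenvalue of X^T X
        step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)
    for _ in range(iters):
        g = X.T @ (X @ b - y)             # gradient of the squared loss
        b = b - step * g                  # gradient step
        idx = np.argsort(np.abs(b))[:-k]  # all but the k largest magnitudes
        b[idx] = 0.0                      # hard-threshold to sparsity k
    return b
```

The convergence-to-a-local-minimizer guarantee referenced above concerns this kind of projected-gradient scheme; the projection onto the sparsity constraint is exactly the hard-thresholding step.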
Feature subset selection for logistic regression via mixed integer optimization
The computational results demonstrate that when the number of candidate features was less than 40, the method successfully provided a feature subset that was sufficiently close to an optimal one in a reasonable amount of time.
Sparse high-dimensional regression: Exact scalable algorithms and phase transitions
We present a novel binary convex reformulation of the sparse regression problem that constitutes a new duality perspective. We devise a new cutting plane method and provide evidence that it can solve …
Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low
We study the behavior of a fundamental tool in sparse statistical modeling: the best-subset selection procedure (aka "best-subsets"). Assuming that the underlying linear model is sparse, it is well …
Statistical Learning with Sparsity: The Lasso and Generalizations
Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data and extract useful and reproducible patterns from big datasets.