# Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives

```bibtex
@article{Dedieu2020LearningSC,
  title   = {Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives},
  author  = {A. Dedieu and Hussein Hazimeh and R. Mazumder},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2001.06471}
}
```

We consider a discrete-optimization-based approach for learning sparse classifiers, where the outcome depends on a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized problems at scales much larger than what was conventionally considered possible in the statistics and machine learning communities. Despite their usefulness, MIP-based approaches are significantly slower compared to…
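As a brief sketch in standard notation (the paper's own formulation may differ in details such as the loss and the regularizer), the $\ell_0$-regularized classification problem and its big-$M$ MIP reformulation can be written as:

```latex
% \ell_0-regularized classification with a convex loss \ell (e.g., logistic)
\min_{\beta \in \mathbb{R}^p} \; \sum_{i=1}^{n} \ell\!\left(y_i,\, x_i^\top \beta\right) + \lambda \|\beta\|_0

% Big-M mixed integer reformulation: binary z_j = 1 iff feature j is selected
\min_{\beta,\, z} \; \sum_{i=1}^{n} \ell\!\left(y_i,\, x_i^\top \beta\right) + \lambda \sum_{j=1}^{p} z_j
\quad \text{s.t.} \quad -M z_j \le \beta_j \le M z_j, \quad z_j \in \{0,1\},
```

where $M$ is an assumed a priori bound on $\|\beta\|_\infty$ at an optimal solution.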

#### 13 Citations

Principal Component Hierarchy for Sparse Quadratic Programs

- Computer Science, Mathematics
- ICML
- 2021

A novel approximation hierarchy for cardinality-constrained, convex quadratic programs that exploits the rank-dominating eigenvectors of the quadratic matrix, together with two scalable optimization algorithms that can efficiently screen the potential indices of the nonzero elements of the original program.

Branch-and-bound Algorithm for Optimal Sparse Canonical Correlation Analysis

- 2021

Canonical correlation analysis (CCA) is a family of multivariate statistical methods for extracting mutual information contained in multiple datasets. To improve the interpretability of CCA, here we…

Ideal formulations for constrained convex optimization problems with indicator variables

- Mathematics, Computer Science
- ArXiv
- 2020

This paper gives the convex hull description of the epigraph of the composition of a one-dimensional convex function and an affine function under arbitrary combinatorial constraints, and gives a short proof that, for a separable objective function, the perspective reformulation is ideal independently of the constraints of the problem.
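As a minimal sketch of the underlying construction (standard notation for the separable quadratic case, not the paper's exact setting), the perspective reformulation strengthens indicator-variable relaxations as follows:

```latex
% Perspective of a convex function f, with indicator z \in [0,1]
\tilde f(\beta, z) =
\begin{cases}
z\, f(\beta / z) & z > 0,\\
0 & z = 0,\ \beta = 0,\\
+\infty & \text{otherwise.}
\end{cases}

% For the quadratic f(\beta_j) = \beta_j^2, the epigraph constraint t_j \ge \tilde f(\beta_j, z_j)
% becomes the rotated second-order cone constraint
\beta_j^2 \le t_j\, z_j,
```

which gives a tighter continuous relaxation than the big-$M$ constraints when $z_j$ is relaxed to $[0,1]$.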

Safe Screening Rules for $\ell_0$-Regression

- Computer Science, Mathematics
- 2020

Numerical experiments indicate that, on average, 76% of the variables can be fixed to their optimal values, substantially reducing the computational burden of optimization; the proposed fast and effective screening rules extend the scope of algorithms for $\ell_0$-regression to larger data sets.

Group selection and shrinkage with application to sparse semiparametric modeling

- Mathematics
- 2021

Sparse regression and classification estimators capable of group selection have application to an assortment of statistical problems, from multitask learning to sparse additive modeling to…

Safe screening rules for L0-regression from Perspective Relaxations

- Computer Science
- ICML
- 2020

Numerical experiments indicate that a significant number of the variables can be removed quickly, substantially reducing the computational burden of optimization; the proposed fast and effective screening rules extend the scope of algorithms for $\ell_0$-regression to larger data sets.

2x2 convexifications for convex quadratic optimization with indicator variables

- Mathematics
- 2020

In this paper, we study the convex quadratic optimization problem with indicator variables. For the bivariate case, we describe the convex hull of the epigraph in the original space of variables, and…

Least Squares Estimation of a Monotone Quasiconvex Regression Function

- Mathematics
- 2020

We develop a new approach for the estimation of a multivariate function based on the economic axioms of monotonicity and quasiconvexity. We prove the existence of the nonparametric least squares…

Sample-efficient L0-L2 constrained structure learning of sparse Ising models

- Computer Science, Mathematics
- AAAI
- 2021

This work leverages the cardinality-constraining $\ell_0$ norm, which is known to properly induce sparsity, and combines it with an $\ell_2$ norm to better model the nonzero coefficients; the proposed estimators are shown to achieve an improved sample complexity.

Sparse optimization via vector k-norm and DC programming with an application to feature selection for Support Vector Machines

- 2021

Manlio Gaudioso (Dipartimento di Ingegneria Informatica, Modellistica, Elettronica e Sistemistica, Università della Calabria, 87036 Rende (CS), Italy, manlio.gaudioso@unical.it), Giovanni Giallombardo…

#### References

Showing 1–10 of 67 references

Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

- Computer Science, Mathematics
- Oper. Res.
- 2020

This paper empirically demonstrates that a family of $\ell_0$-based estimators can outperform state-of-the-art sparse learning algorithms in terms of a combination of prediction, estimation, and variable selection metrics under various regimes (e.g., different signal strengths, feature correlations, numbers of samples and features).
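The coordinate descent ingredient of such $\ell_0$-based estimators can be sketched as follows. This is a minimal illustration of cyclic coordinate descent with hard thresholding for $\ell_0$-penalized least squares, assuming unit-norm columns; it is not the cited paper's exact algorithm (which also uses local combinatorial search).

```python
import numpy as np

def l0_coordinate_descent(X, y, lam, n_iters=50):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_0.

    Assumes the columns of X have unit l2 norm. Each coordinate update
    solves the one-dimensional subproblem exactly: keep the least-squares
    coefficient if it decreases the objective by more than lam, else set
    the coordinate to zero (hard thresholding at sqrt(2*lam)).
    """
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta                     # residual, maintained incrementally
    for _ in range(n_iters):
        for j in range(p):
            # least-squares fit for coordinate j on the partial residual
            b_tilde = beta[j] + X[:, j] @ r
            b_new = b_tilde if b_tilde ** 2 > 2.0 * lam else 0.0
            r += X[:, j] * (beta[j] - b_new)   # keep residual consistent
            beta[j] = b_new
    return beta
```

Each update never increases the objective, so the iterates converge to a coordinate-wise minimum; the support found this way is typically refined by combinatorial local search in practice.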

Sparse Classification and Phase Transitions: A Discrete Optimization Perspective

- Mathematics
- 2017

In this paper, we formulate the sparse classification problem of $n$ samples with $p$ features as a binary convex optimization problem and propose a cutting-plane algorithm to solve it exactly. For…

Best Subset Selection via a Modern Optimization Lens

- Mathematics
- 2015

In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200 billion factor speedup in solving Mixed…

Proximal Algorithms

- Computer Science
- Found. Trends Optim.
- 2014

The many different interpretations of proximal operators and algorithms are discussed, their connections to many other topics in optimization and applied mathematics are described, some popular algorithms are surveyed, and a large number of examples of proximal operators that commonly arise in practice are provided.
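The prototypical example of such a proximal operator, and the proximal gradient method built on it, can be sketched as follows (a minimal illustration for the lasso, with an assumed fixed step size of $1/L$, not code from the cited survey):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (elementwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iters=200):
    """Proximal gradient (ISTA) for min_b 0.5*||y - X b||^2 + lam*||b||_1.

    Alternates a gradient step on the smooth least-squares term with the
    l1 proximal operator, using step size 1/L where L = ||X||_2^2 is the
    Lipschitz constant of the gradient.
    """
    L = np.linalg.norm(X, 2) ** 2          # spectral norm squared
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ beta - y)        # gradient of the smooth part
        beta = soft_threshold(beta - grad / L, lam / L)  # prox step
    return beta
```

The same two-step template (gradient step, then prox) covers projected gradient methods as the special case where the prox is a Euclidean projection.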

Fast Newton Method for Sparse Logistic Regression

- Mathematics
- 2019

Sparse logistic regression has developed tremendously over the past two decades, from its origin in the $\ell_1$-regularized version of Tibshirani (1996) to the sparsity-constrained models by…

Iterative hard thresholding methods for $\ell_0$-regularized convex cone programming

- Mathematics, Computer Science
- Math. Program.
- 2014

An iterative hard thresholding (IHT) method and its variant are proposed for solving $\ell_0$-regularized box-constrained convex programming, and it is shown that the sequence generated by these methods converges to a local minimizer.
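The basic IHT iteration can be sketched as a projected gradient step, here for the cardinality-constrained least-squares special case (a minimal sketch with an assumed step size of $1/\|X\|_2^2$, not the cited paper's more general cone setting):

```python
import numpy as np

def hard_threshold(v, k):
    """Euclidean projection onto the set of k-sparse vectors:
    keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def iht(X, y, k, n_iters=100):
    """Iterative hard thresholding for min ||y - X b||^2 s.t. ||b||_0 <= k.

    Each iteration takes a gradient step on the least-squares loss and
    projects back onto the k-sparse set via hard thresholding.
    """
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1/L with L the Lipschitz constant
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        beta = hard_threshold(beta - step * X.T @ (X @ beta - y), k)
    return beta
```

With step size $1/L$, each iteration is non-increasing in the loss by the usual majorize-then-project argument, even though the $k$-sparse set is nonconvex.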

Feature subset selection for logistic regression via mixed integer optimization

- Mathematics, Computer Science
- Comput. Optim. Appl.
- 2016

The computational results demonstrate that when the number of candidate features was less than 40, the method successfully provided a feature subset that was sufficiently close to an optimal one in a reasonable amount of time.

Sparse high-dimensional regression: Exact scalable algorithms and phase transitions

- Mathematics
- 2017

We present a novel binary convex reformulation of the sparse regression problem that constitutes a new duality perspective. We devise a new cutting plane method and provide evidence that it can solve…

Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low

- Mathematics
- 2017

We study the behavior of a fundamental tool in sparse statistical modeling --the best-subset selection procedure (aka "best-subsets"). Assuming that the underlying linear model is sparse, it is well…

Statistical Learning with Sparsity: The Lasso and Generalizations

- Computer Science
- 2015

Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data and extract useful and reproducible patterns from big datasets.