Logistic Regression: From Art to Science

Dimitris Bertsimas and Angela King, Statistical Science.
A high-quality logistic regression model has several desirable properties: predictive power, interpretability, significance, robustness to error in the data, and sparsity, among others. The resulting MINLO (mixed-integer nonlinear optimization) formulation is flexible and can be adjusted based on the needs of the modeler. Using both real and synthetic data, we demonstrate that the overall approach is generally applicable and provides high-quality solutions in realistic timelines, as well as a guarantee of suboptimality. When the MINLO is…
Robust Variable Selection with Optimality Guarantees for High-Dimensional Logistic Regression
This study proposes a robust and sparse estimator for logistic regression models that simultaneously tackles the presence of outliers and/or irrelevant features, and compares its effectiveness with existing heuristic methods and non-robust procedures.
Sparse Poisson regression via mixed-integer optimization
This paper derives a mixed-integer quadratic optimization (MIQO) formulation for sparse Poisson regression that maximizes the weighted sum of the log-likelihood function and the L2-regularization term and proposes two methods for selecting a limited number of tangent lines effective for piecewise-linear approximations.
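The tangent-line idea behind this formulation can be illustrated numerically: because exp is convex, the pointwise maximum of its tangent lines at a few anchor points underestimates it everywhere, which is what makes a piecewise-linear surrogate of the Poisson log-likelihood tractable inside an MIQO. A minimal numpy sketch (anchor points chosen arbitrarily for illustration, not the paper's selection method):

```python
import numpy as np

def tangent_lines_max(x, anchors):
    """Lower-bound exp(x) by the max of its tangent lines at the anchors.

    The tangent to exp at anchor a is: exp(a) + exp(a) * (x - a).
    """
    a = np.asarray(anchors, dtype=float)[:, None]   # shape (k, 1)
    lines = np.exp(a) * (1.0 + x[None, :] - a)      # each row: one tangent line
    return lines.max(axis=0)

x = np.linspace(-2.0, 2.0, 201)
approx = tangent_lines_max(x, anchors=[-2.0, -1.0, 0.0, 1.0, 2.0])

# Convexity guarantees the piecewise-linear surrogate never exceeds exp(x),
# and it is exact at each anchor point.
assert np.all(approx <= np.exp(x) + 1e-12)
```

Adding more tangent lines tightens the lower bound; the paper's contribution is choosing a small effective set of them.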
Sparse Regression: Scalable Algorithms and Empirical Performance
Accuracy, false detection rate, and computational time provide a comprehensive assessment of each feature selection method and shed light on alternatives to Lasso regularization that are not yet as popular in practice.
Efficient and Effective $L_0$ Feature Selection
Computational viability and improved performance in subtler scenarios are achieved with a multi-pronged blueprint that leverages characteristics of the mixed-integer programming framework and uses whitening, a data pre-processing step.
Best Subset, Forward Stepwise or Lasso? Analysis and Recommendations Based on Extensive Comparisons
An expanded set of simulations is presented to shed more light on empirical comparisons of best subset selection with other popular variable selection procedures, in particular the lasso and forward stepwise selection, suggesting that best subset consistently outperformed both methods in terms of prediction accuracy.
K. Kimura, Journal of the Operations Research Society of Japan, 2019
This paper proposes a mixed-integer nonlinear programming approach to AIC minimization for linear regression, together with a piecewise-linear approximation approach, and shows that it outperforms existing approaches in terms of computational time.
Mixed-integer quadratic programming reformulations of multi-task learning models
This manuscript considers well-known multi-task learning models from the literature for linear regression problems, such as clustered MTL or weakly constrained MTL, and proposes novel reformulations of the training problem, based on mixed-integer quadratic programming (MIQP) techniques.
MIP-BOOST: Efficient and Effective L0 Feature Selection for Linear Regression
MIP-BOOST is proposed, a revision of standard mixed integer programming feature selection that reduces the computational burden of tuning the critical sparsity bound parameter and improves performance in the presence of feature collinearity and of signals that vary in nature and strength.
An effective procedure for feature subset selection in logistic regression based on information criteria
This paper proposes a new approach, which combines mixed-integer programming and decomposition techniques in order to overcome the aforementioned scalability issues, and provides a theoretical characterization of the proposed algorithm properties.
Learning Structure in Nested Logit Models
This paper forms the problem of learning an optimal nesting structure from the data as a mixed integer nonlinear programming (MINLP) optimization problem and solves it using a variant of the linear outer approximation algorithm.


Best Subset Selection via a Modern Optimization Lens
It is established via numerical experiments that the MIO approach performs better than Lasso and other popularly used sparse learning procedures in terms of achieving sparse solutions with good predictive power.
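For small p, the best-subset problem that the MIO approach solves to optimality can be reproduced by brute force, which makes the objective concrete; the MIO machinery matters precisely because this enumeration is exponential in p. A toy numpy sketch on an assumed synthetic design:

```python
import itertools
import numpy as np

def best_subset(X, y, k):
    """Return the size-k column subset minimizing the residual sum of squares."""
    best_rss, best_support = np.inf, None
    for support in itertools.combinations(range(X.shape[1]), k):
        Xs = X[:, support]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(np.sum((y - Xs @ beta) ** 2))
        if rss < best_rss:
            best_rss, best_support = rss, support
    return best_support

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4]      # true support {1, 4}, noiseless
support = best_subset(X, y, k=2)       # recovers (1, 4) on this toy data
```

With 8 candidate features there are only 28 subsets of size 2; at p = 50 the same search would already visit over a thousand subsets per sparsity level, and the count grows combinatorially.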
Feature subset selection for logistic regression via mixed integer optimization
The computational results demonstrate that when the number of candidate features was less than 40, the method successfully provided a feature subset that was sufficiently close to an optimal one in a reasonable amount of time.
The composite absolute penalties family for grouped and hierarchical variable selection
CAP is shown to improve on the predictive performance of the LASSO in a series of simulated experiments, including cases with $p\gg n$ and possibly mis-specified groupings, and iCAP is seen to be parsimonious in the experiments.
An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression
This paper describes an efficient interior-point method for solving large-scale l1-regularized logistic regression problems, and shows how a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
The group lasso for logistic regression
An efficient algorithm is presented that is especially suitable for high-dimensional problems and can also be applied to generalized linear models to solve the corresponding convex optimization problem.
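The block-wise sparsity the group lasso induces comes from the proximal operator of the grouped ℓ2 penalty, which either shrinks a whole coefficient group toward zero or zeroes it out entirely. A hedged numpy sketch of that group soft-thresholding step (one building block, not the paper's full block-coordinate algorithm):

```python
import numpy as np

def group_soft_threshold(v, lam):
    """Prox of lam * ||.||_2: shrink the group, or set it to zero as a unit."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)      # the entire group is zeroed at once
    return (1.0 - lam / norm) * v

weak = group_soft_threshold(np.array([0.3, -0.4]), lam=1.0)   # norm 0.5 <= 1
strong = group_soft_threshold(np.array([3.0, 4.0]), lam=1.0)  # norm 5 > 1
```

Unlike coordinate-wise soft-thresholding, the decision is made on the group norm, so correlated features in the same group enter or leave the model together.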
Sparse multinomial logistic regression: fast algorithms and generalization bounds
This paper introduces a true multiclass formulation based on multinomial logistic regression and derives fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces.
Distributionally Robust Logistic Regression
This paper uses the Wasserstein distance to construct a ball in the space of probability distributions centered at the uniform distribution on the training samples, and proposes a distributionally robust logistic regression model that minimizes a worst-case expected logloss function.
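Under assumptions along the lines of this line of work (transport cost on the features only, with a 2-norm ground metric), the worst-case expected log-loss admits a reformulation as the empirical log-loss plus a norm penalty on the weights scaled by the Wasserstein radius, so the robust objective can be evaluated directly. A hedged numpy sketch of that regularized form (an illustration of the reformulation, not the paper's solver):

```python
import numpy as np

def robust_logloss(w, X, y, eps):
    """Empirical log-loss plus the Wasserstein penalty eps * ||w||_2.

    y in {-1, +1}; assumes the features-only transport cost with a
    2-norm ground metric, under which the worst case reduces to this
    norm-regularized objective.
    """
    margins = y * (X @ w)
    loss = np.mean(np.log1p(np.exp(-margins)))
    return loss + eps * np.linalg.norm(w)

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
val = robust_logloss(np.zeros(2), X, y, eps=0.1)   # log(2) at w = 0
```

The radius eps interpolates between plain empirical risk minimization (eps = 0) and increasingly conservative, heavily regularized fits.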
Best subsets logistic regression
The purpose of this note is to illustrate that for one of the more frequently used nonnormal regression models, logistic regression, one may perform the Lawless-Singhal analysis with any best subsets linear regression program that allows for case weights.
Efficient L1 Regularized Logistic Regression
Theoretical results show that the proposed efficient algorithm for L1-regularized logistic regression is guaranteed to converge to the global optimum, and experiments show that it significantly outperforms standard algorithms for solving convex optimization problems.
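As a much simpler stand-in for such solvers, ℓ1-regularized logistic regression can be handled by proximal gradient descent: a gradient step on the mean log-loss followed by coordinate-wise soft-thresholding. A minimal numpy sketch on assumed synthetic data (plain ISTA, not the algorithm this entry describes):

```python
import numpy as np

def fit_l1_logistic(X, y, lam=0.1, lr=0.1, iters=2000):
    """Proximal gradient (ISTA) for mean log-loss + lam * ||w||_1; y in {0, 1}."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(iters):
        z = 1.0 / (1.0 + np.exp(-X @ w))     # predicted probabilities
        grad = X.T @ (z - y) / n             # gradient of the mean log-loss
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
logits = 2.0 * X[:, 0] - 2.0 * X[:, 3]       # only features 0 and 3 matter
y = (logits + 0.1 * rng.standard_normal(200) > 0).astype(float)
w = fit_l1_logistic(X, y, lam=0.1)           # sparse: irrelevant weights shrink away
```

The soft-thresholding step is what produces exact zeros; specialized methods like the one above exist because this basic scheme converges slowly on large, ill-conditioned problems.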
A Sparse-Group Lasso
A regularized model for linear regression with ℓ1 and ℓ2 penalties is introduced, and it is shown to have the desired effect of group-wise and within-group sparsity.
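The combined penalty is easy to state concretely: an ℓ1 term driving within-group sparsity plus a sum of group ℓ2 norms driving group-wise sparsity. A small numpy sketch of the penalty evaluation (the weights lam1, lam2 are illustrative, not values from the paper):

```python
import numpy as np

def sparse_group_penalty(beta, groups, lam1=1.0, lam2=1.0):
    """lam1 * ||beta||_1  +  lam2 * sum over groups g of ||beta_g||_2."""
    l1 = lam1 * np.sum(np.abs(beta))
    l2 = lam2 * sum(np.linalg.norm(beta[g]) for g in groups)
    return l1 + l2

beta = np.array([3.0, 4.0, 0.0, 0.0])
groups = [slice(0, 2), slice(2, 4)]            # two groups of two coefficients
penalty = sparse_group_penalty(beta, groups)   # 7 + 5 = 12
```

Setting lam1 = 0 recovers the plain group lasso, while lam2 = 0 recovers the ordinary lasso; mixing the two is what yields sparsity both across and within groups.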