Optimal randomized classification trees

@article{Blanquero2021OptimalRC,
  title={Optimal randomized classification trees},
  author={Rafael Blanquero and Emilio Carrizosa and Cristina Molero-R{\'i}o and Dolores Romero Morales},
  journal={Comput. Oper. Res.},
  year={2021},
  volume={132},
  pages={105281}
}
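The core idea of the paper above — replacing deterministic splits with probabilistic routing — can be sketched with a toy example. The following is a minimal sketch, assuming a logistic CDF and a single oblique split of depth 1; the function names and the `scale` parameter are illustrative, not taken from the paper's formulation:

```python
import math

def logistic_cdf(z):
    """Smooth CDF used to soften the split decision."""
    return 1.0 / (1.0 + math.exp(-z))

def soft_split(x, a, b, scale=1.0):
    """Probability of taking the left branch at one oblique split.

    A deterministic tree would test `a . x <= b`; a randomized tree
    instead sends the instance left with probability F(-(a.x - b)/scale),
    where F is a smooth CDF, so the routing is differentiable in (a, b)
    and the tree can be trained by continuous optimization.
    """
    score = sum(ai * xi for ai, xi in zip(a, x)) - b
    return logistic_cdf(-score / scale)

def leaf_probabilities(x, a, b):
    """Path probabilities for a depth-1 tree with two leaves."""
    p_left = soft_split(x, a, b)
    return {"left": p_left, "right": 1.0 - p_left}
```

For `x = [1.0, 0.0]`, `a = [1.0, 1.0]`, `b = 5.0` the split score is -4, so the instance is routed left with high probability; as `scale` shrinks toward 0, the routing approaches the hard split of a classical tree.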
Strong Optimal Classification Trees
TLDR
This paper proposes an intuitive flow-based MIO formulation that can accommodate side constraints to enable the design of interpretable and fair decision trees and shows that this formulation has a stronger linear optimization relaxation than existing methods.
Generalized Optimal Sparse Decision Trees
TLDR
The contribution in this work is to provide a general framework for decision tree optimization that addresses the two significant open problems in the area: treatment of imbalanced data and fully optimizing over continuous variables.
On multivariate randomized classification trees: l0-based sparsity, VC dimension and decomposition methods
TLDR
This work investigates the nonlinear continuous optimization formulation proposed in Blanquero et al. (2020) for (sparse) optimal randomized classification trees and proposes a general decomposition scheme and an efficient version of it.
Optimal Sparse Decision Trees
TLDR
This work introduces the first practical algorithm for optimal decision trees for binary variables, a co-design of analytical bounds that reduce the search space and modern systems techniques, including data structures and a custom bit-vector library.
Mathematical optimization in classification and regression trees
TLDR
It is illustrated how these powerful formulations enhance the flexibility of tree models, being better suited to incorporate desirable properties such as cost-sensitivity, explainability, and fairness, and to deal with complex data, such as functional data.
Constrained Naïve Bayes with application to unbalanced data classification
TLDR
This paper addresses the issue of misclassification for the Naive Bayes classifier by adding constraints to the optimization problem underlying the estimation process, and shows that under a reasonable computational cost, the performance measures under consideration achieve the desired levels yielding a user-friendly constrained classification procedure.
Multiclass Optimal Classification Trees with SVM-splits
TLDR
A novel mathematical optimization-based methodology to construct tree-shaped classification rules for multiclass instances by means of an SVM separating hyperplane, which provides a Mixed-Integer Non-Linear Programming formulation for the problem.
Coresets for Decision Trees of Signals
TLDR
Experimental results on sklearn and LightGBM show that applying coresets to real-world datasets speeds up the computation of random forests and their parameter tuning by up to ×10, while keeping similar accuracy.
References

Showing 1–10 of 59 references
Optimal classification trees
TLDR
Optimal classification trees are presented: a novel formulation of the decision tree problem using modern MIO techniques that yields the optimal decision tree for axis-aligned splits. Synthetic tests demonstrate that these methods recover the true decision tree more closely than heuristics, refuting the notion that optimal methods overfit the training data.
Efficient Non-greedy Optimization of Decision Trees
TLDR
It is shown that the problem of finding optimal linear-combination splits for decision trees is related to structured prediction with latent variables, and a convex-concave upper bound on the tree's empirical loss is formed, and the use of stochastic gradient descent for optimization enables effective training with large datasets.
Optimal trees for prediction and prescription
TLDR
High-performance local search methods are developed to efficiently solve the problem of constructing the optimal decision tree using discrete optimization, building the entire decision tree in a single step and hence finding the single tree that best minimizes the training error.
Classification and Regression by randomForest
TLDR
Random forests are proposed, which add an additional layer of randomness to bagging and are robust against overfitting; the randomForest package provides an R interface to the Fortran programs by Breiman and Cutler.
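As a hedged illustration of the bagging-plus-randomness idea summarized above — a pure-Python sketch, not the randomForest package: decision stumps stand in for full trees, and the stump-fitting helper is invented for this example:

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Sample the training set with replacement (one bagging round)."""
    return [rng.choice(data) for _ in data]

def fit_stump(data, rng):
    """Fit a one-split 'tree' on a randomly chosen feature.

    The random feature choice mimics (very loosely) the extra layer
    of randomness that random forests add inside each tree.
    """
    feature = rng.randrange(len(data[0][0]))
    threshold = sum(x[feature] for x, _ in data) / len(data)
    left = [y for x, y in data if x[feature] <= threshold]
    right = [y for x, y in data if x[feature] > threshold]
    left_label = Counter(left).most_common(1)[0][0] if left else 0
    right_label = Counter(right).most_common(1)[0][0] if right else left_label
    return lambda x: left_label if x[feature] <= threshold else right_label

def fit_forest(data, n_trees=25, seed=0):
    """Bag n_trees stumps, each trained on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap(data, rng), rng) for _ in range(n_trees)]

def predict(forest, x):
    """Aggregate the ensemble by majority vote."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]
```

Averaging many high-variance trees trained on perturbed copies of the data is what makes the ensemble robust against overfitting, even when each individual tree is weak.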
Multivariate classification trees based on minimum features discrete support vector machines
TLDR
An algorithm for generating decision trees in which multivariate splitting rules are based on the new concept of discrete support vector machines, which consistently outperforms other classification approaches in terms of accuracy, and is therefore capable of good generalization on future unseen data.
Optimal decision trees for categorical data via integer programming
TLDR
A mixed integer programming formulation to construct optimal decision trees of a prespecified size that takes the special structure of categorical features into account and allows combinatorial decisions (based on subsets of feature values) at each node.
Classification and regression trees (W. Loh, WIREs Data Mining Knowl. Discov., 2011)
TLDR
This article gives an introduction to the subject of classification and regression trees by reviewing some widely available algorithms and comparing their capabilities, strengths, and weakness in two examples.
Do we need hundreds of classifiers to solve real world classification problems?
TLDR
The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RFs), followed by SVM (4 classifiers in the top 10), neural networks, and boosting ensembles (5 and 3 members in the top 20, respectively).
Optimal Decision Trees
TLDR
An Extreme Point Tabu Search (EPTS) algorithm that constructs globally optimal decision trees for classification problems is proposed, and it is shown that it is sufficient to restrict the search to the extreme points of the polyhedral region.
Fast growing and interpretable oblique trees via logistic regression models
TLDR
The focus of this thesis is to grow oblique trees in a fast and deterministic manner and to propose ways of making them more interpretable; since the proposed approach to finding oblique splits uses logistic regression, well-founded variable selection techniques are introduced to classification trees.