Corpus ID: 3087908

Optimization of Tree Ensembles

@inproceedings{Mivsic2017OptimizationOT,
  title={Optimization of Tree Ensembles},
  author={Velibor V. Mi{\v{s}}i{\'c}},
  year={2017}
}
Tree ensemble models such as random forests and boosted trees are among the most widely used and practically successful predictive models in applied machine learning and business analytics. Although such models have been used to make predictions based on exogenous, uncontrollable independent variables, they are increasingly being used to make predictions where the independent variables are controllable and are also decision variables. In this paper, we study the problem of tree ensemble…
ENTMOOT: A Framework for Optimization over Ensemble Tree Models
TLDR
This work shows how the ENTMOOT approach allows a simple integration of tree models into decision-making and black-box optimization, where it proves to be a strong competitor to commonly used frameworks.
Mathematical optimization in classification and regression trees
Classification and regression trees, as well as their variants, are off-the-shelf methods in Machine Learning. In this paper, we review recent contributions within the Continuous Optimization and the…
Learning Optimal Classification Trees: Strong Max-Flow Formulations
TLDR
This work proposes a flow-based MIP formulation for optimal binary classification trees that has a stronger linear programming relaxation and exploits the structure and max-flow/min-cut duality to derive a Benders' decomposition method, which scales to larger instances.
The Use of Binary Choice Forests to Model and Estimate Discrete Choices
We show the equivalence of discrete choice models and the class of binary choice forests, which are random forests based on binary choice trees. This suggests that standard machine learning…
JANOS: An Integrated Predictive and Prescriptive Modeling Framework
TLDR
A modeling framework JANOS is described that seamlessly integrates the two streams of analytics, for the first time allowing researchers and practitioners to embed machine learning models in an optimization framework.
Rare-Event Simulation for Neural Network and Random Forest Predictors
TLDR
An importance sampling scheme is investigated that integrates the dominating point machinery in large deviations and sequential mixed integer programming to locate the underlying dominating points in rare-event simulation.
Decision Forest: A Nonparametric Approach to Modeling Irrational Choice
TLDR
A new nonparametric choice model that relaxes this assumption and can model a wider range of customer behavior, such as decoy effects between products, is proposed and outperforms both rational and non-rational benchmark models in out-of-sample predictive ability.
Exact Logit-Based Product Design
The share-of-choice product design (SOCPD) problem is to find the product, as defined by its attributes, that maximizes market share arising from a collection of customer types or segments. When…
BooST: Boosting Smooth Trees for Partial Effect Estimation in Nonlinear Regressions
TLDR
A new machine learning model for nonlinear regression, Boosted Smooth Transition Regression Trees (BooST), is proposed; it combines boosting algorithms with smooth transition regression trees and can provide more interpretation of the mapping between the covariates and the dependent variable than other tree-based models.
