Large Random Forests: Optimisation for Rapid Evaluation
@article{Gossen2019LargeRF,
  title   = {Large Random Forests: Optimisation for Rapid Evaluation},
  author  = {Frederik Gossen and Bernhard Steffen},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1912.10934}
}
Random Forests are one of the most popular classifiers in machine learning. The larger they are, the more precise their predictions become. However, this comes at a cost: their running time for classification grows linearly with the number of trees, i.e. with the size of the forest. In this paper, we propose a method to aggregate large Random Forests into a single, semantically equivalent decision diagram. Our experiments on various popular datasets show speed-ups of several orders of…
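The aggregation idea from the abstract can be illustrated with the classic decision-diagram "apply" operation: trees over a shared, ordered set of boolean predicates are merged pairwise, summing class-vote vectors at the leaves, so the whole forest is then evaluated in a single root-to-leaf pass. The following is a minimal sketch under that assumption; all names and the vote-vector leaf representation are illustrative, not the paper's actual implementation:

```python
# Hedged sketch: merge ordered decision diagrams (trees over boolean
# predicates 0 < 1 < ...) with an ADD-style 'apply' that adds vote leaves.

class Leaf:
    def __init__(self, votes): self.votes = tuple(votes)

class Node:
    def __init__(self, var, lo, hi): self.var, self.lo, self.hi = var, lo, hi

_unique = {}  # hash-consing table so structurally equal nodes are shared

def mk(var, lo, hi):
    if lo is hi:
        return lo  # redundant test: both branches identical, drop the node
    key = (var, id(lo), id(hi))
    if key not in _unique:
        _unique[key] = Node(var, lo, hi)
    return _unique[key]

def apply_add(a, b, memo=None):
    """Combine two diagrams; at matching leaves, add the vote vectors."""
    if memo is None:
        memo = {}
    key = (id(a), id(b))
    if key in memo:
        return memo[key]
    if isinstance(a, Leaf) and isinstance(b, Leaf):
        r = Leaf(x + y for x, y in zip(a.votes, b.votes))
    else:
        av = a.var if isinstance(a, Node) else float('inf')
        bv = b.var if isinstance(b, Node) else float('inf')
        v = min(av, bv)  # expand on the smaller predicate in the order
        a_lo, a_hi = (a.lo, a.hi) if av == v else (a, a)
        b_lo, b_hi = (b.lo, b.hi) if bv == v else (b, b)
        r = mk(v, apply_add(a_lo, b_lo, memo), apply_add(a_hi, b_hi, memo))
    memo[key] = r
    return r

def classify(d, assignment):
    """One root-to-leaf pass over the aggregated diagram."""
    while isinstance(d, Node):
        d = d.hi if assignment[d.var] else d.lo
    return max(range(len(d.votes)), key=d.votes.__getitem__)

# Two one-node "trees" voting for different classes on different predicates.
t1 = mk(0, Leaf([1, 0]), Leaf([0, 1]))  # tests predicate 0
t2 = mk(1, Leaf([1, 0]), Leaf([0, 1]))  # tests predicate 1
forest = apply_add(t1, t2)              # one diagram, votes summed
print(classify(forest, {0: True, 1: True}))  # -> 1 (leaf votes (0, 2))
```

Note that classification now touches each predicate at most once, regardless of how many trees contributed, which is where the claimed speed-up comes from; the trade-off the paper addresses is keeping the merged diagram from blowing up in size.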
6 Citations
Algebraic aggregation of random forests: towards explainability and rapid evaluation
- Computer Science, International Journal on Software Tools for Technology Transfer
- 2021
This paper proposes a method to aggregate large Random Forests into a single, semantically equivalent decision diagram which has the following two effects: (1) minimal, sufficient explanations for Random Forest-based classifications can be obtained by means of a simple three step reduction, and (2) the running time is radically improved.
Optimal Decision Diagrams for Classification
- Computer Science, ArXiv
- 2022
This work introduces a novel mixed-integer linear programming model for training and demonstrates its applicability for many datasets of practical importance and shows how this model can be easily extended for fairness, parsimony, and stability notions.
ADD-Lib: Decision Diagrams in Practice
- Computer Science, ArXiv
- 2019
The ADD-Lib is presented, an efficient and easy-to-use framework for Algebraic Decision Diagrams (ADDs); as an artifact for cooperative experimentation, it lets users combine a given Random Forest with their own ADDs, regarded as expert knowledge, and observe the corresponding effect.
Aggressive Aggregation: a New Paradigm for Program Optimization
- Computer Science, ArXiv
- 2019
The technique supports loop unrolling as a first-class optimization and is tailored to optimally aggregate large program fragments, especially those resulting from multiple loop unrollings, which results in a performance improvement beyond an order of magnitude.
Towards Automatic Data Cleansing and Classification of Valid Historical Data: An Incremental Approach Based on MDD
- Computer Science, 2020 IEEE International Conference on Big Data (Big Data)
- 2020
This paper outlines the stepwise design of the finer-granular digital format, aimed at storage and digital archiving, and the design and testing of two generations of the techniques used in the first two data-ingestion and cleaning phases.
A Generative Approach for User-Centered, Collaborative, Domain-Specific Modeling Environments
- Computer Science, ArXiv
- 2021
The metatool Pyro demonstrated and analyzed here focuses on graph-based graphical languages to fully generate a complete, directly executable tool starting from a metamodel in order to meet all cross-cutting requirements.
26 References
Random Forests
- Computer Science, Machine Learning
- 2004
Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the forest, and they are also applicable to regression.
Decision Jungles: Compact and Rich Models for Classification
- Computer Science, NIPS
- 2013
This paper proposes decision jungles, revisiting the idea of ensembles of rooted decision directed acyclic graphs (DAGs), and shows these to be compact and powerful discriminative models for classification.
A Method to Merge Ensembles of Bagged or Boosted Forced-Split Decision Trees
- Computer Science, Environmental Science
A novel method to efficiently merge an ensemble of forced-split decision trees into an "enlightened" decision tree is presented, and this paper introduces and tests a "super-uniform sampling" technique which outperforms conventional uniform sampling in training individual trees.
Reducing Decision Tree Ensemble Size Using Parallel Decision DAGs
- Computer Science, Int. J. Artif. Intell. Tools
- 2009
This research presents a new learning model, the Parallel Decision DAG (PDDAG), and shows how to use it to represent an ensemble of decision trees while using significantly less storage. Ensembles…
Forest Packing: Fast, Parallel Decision Forests
- Computer Science, SDM
- 2019
Memory packing techniques that reorganize learned forests to minimize cache misses during classification increase the performance of classification in ensembles by a factor of four over an optimized C++ implementation and a factor of 50 over a popular R language implementation.
An Algebra to Merge Heterogeneous Classifiers
- Computer Science, ArXiv
- 2015
This work formally studies the merging operation as an algebra, proves that it satisfies a desirable set of properties, and presents an approach for stationary distributions, such as homogeneous databases partitioned over different learners, which ensures that all models have the same impact.
A Random Forest Using a Multi-valued Decision Diagram on an FPGA
- Computer Science, 2017 IEEE 47th International Symposium on Multiple-Valued Logic (ISMVL)
- 2017
To accelerate RF classification using the AOCL, a fully pipelined architecture is proposed to increase memory bandwidth using on-chip memories on the FPGA, and an optimal-precision fixed-point representation is applied instead of a 32-bit floating-point one.
Embedding Decision Trees and Random Forests in Constraint Programming
- Computer Science, CPAIOR
- 2015
This paper proposes three approaches based on converting a DT into a Multi-valued Decision Diagram, which is then fed to an mdd constraint, and shows how to embed a Random Forest, a powerful type of ensemble classifier based on DTs, in CP.
Data Mining, Fourth Edition: Practical Machine Learning Tools and Techniques
- Computer Science
- 2016
This highly anticipated fourth edition of the most acclaimed work on data mining and machine learning teaches readers everything they need to know to get going, from preparing inputs, interpreting outputs, evaluating results, to the algorithmic methods at the heart of successful data mining approaches.
Induction of Decision Trees
- Computer Science, Machine Learning
- 2004
This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail.