• Corpus ID: 209444537

Large Random Forests: Optimisation for Rapid Evaluation

Frederik Gossen, Bernhard Steffen
Random Forests are one of the most popular classifiers in machine learning. The larger they are, the more precise is the outcome of their predictions. However, this comes at a cost: their running time for classification grows linearly with the number of trees, i.e. the size of the forest. In this paper, we propose a method to aggregate large Random Forests into a single, semantically equivalent decision diagram. Our experiments on various popular datasets show speed-ups of several orders of… 
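The trade-off described in the abstract can be illustrated with a small sketch. This is not the authors' decision-diagram construction. It is a naive illustration under stated assumptions: trees are nested tuples, every test has the form `x[feature] < threshold`, and ties in the majority vote are broken by label order. All names (`majority`, `eval_tree`, `eval_forest`, `aggregate`) are hypothetical. The sketch shows that forest evaluation walks one path per tree, whereas a single aggregated structure that returns the same majority label walks only one path in total.

```python
from collections import Counter
from itertools import product

# A tree is either a leaf (a class label) or a tuple
# (feature, threshold, low, high): take `low` if x[feature] < threshold.

def majority(labels):
    """Most frequent label; ties are broken by label order (an assumption)."""
    counts = Counter(labels)
    return min(counts, key=lambda label: (-counts[label], label))

def eval_tree(tree, x):
    while isinstance(tree, tuple):
        feature, threshold, low, high = tree
        tree = low if x[feature] < threshold else high
    return tree

def eval_forest(forest, x):
    # One root-to-leaf walk per tree: cost grows linearly with len(forest).
    return majority([eval_tree(tree, x) for tree in forest])

def aggregate(forest):
    """Fold a forest into one tree with the same majority-vote semantics.

    Naive sketch: worst-case size is exponential in the number of trees.
    Two simple reductions keep this example small:
      * a test already decided on the current path is not repeated;
      * a node whose two branches are equal is replaced by that branch.
    """
    def descend(node, rest, votes, known):
        if not isinstance(node, tuple):          # leaf of the current tree
            votes = votes + (node,)
            if not rest:                         # every tree has voted
                return majority(votes)
            return descend(rest[0], rest[1:], votes, known)
        feature, threshold, low, high = node
        test = (feature, threshold)
        if test in known:                        # outcome fixed earlier on path
            return descend(low if known[test] else high, rest, votes, known)
        lo = descend(low, rest, votes, {**known, test: True})
        hi = descend(high, rest, votes, {**known, test: False})
        return lo if lo == hi else (feature, threshold, lo, hi)

    return descend(forest[0], forest[1:], (), {})

# Hypothetical three-tree forest over two numeric features.
forest = [
    (0, 0.5, "a", "b"),
    (1, 0.5, "a", (0, 0.5, "a", "b")),
    (0, 0.5, "a", (1, 0.5, "b", "a")),
]
single = aggregate(forest)

# The aggregated tree agrees with the forest on every sampled input.
for x in product([0.0, 1.0], repeat=2):
    assert eval_tree(single, x) == eval_forest(forest, x)
```

For this toy forest the aggregate collapses to a single test on feature 0, because the votes of the second and third trees never change the majority. Real forests do not collapse this easily, which is why a sharing representation such as a decision diagram matters in practice.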

Algebraic aggregation of random forests: towards explainability and rapid evaluation

  • F. Gossen, B. Steffen
  • Computer Science
    International Journal on Software Tools for Technology Transfer
  • 2021
This paper proposes a method to aggregate large Random Forests into a single, semantically equivalent decision diagram, which has two effects: (1) minimal, sufficient explanations for Random Forest-based classifications can be obtained by means of a simple three-step reduction, and (2) the running time is radically improved.

Optimal Decision Diagrams for Classification

This work introduces a novel mixed-integer linear programming model for training and demonstrates its applicability for many datasets of practical importance and shows how this model can be easily extended for fairness, parsimony, and stability notions.

ADD-Lib: Decision Diagrams in Practice

The ADD-Lib is presented, an efficient and easy to use framework for Algebraic Decision Diagrams (ADDs) that provides as an artifact for cooperative experimentation the ability to combine a given Random Forest with their own ADDs regarded as expert knowledge and to experience the corresponding effect.

Aggressive Aggregation: a New Paradigm for Program Optimization

The technique supports loop unrolling as a first class optimization technique and is tailored to optimally aggregate large program fragments, especially those resulting from multiple loop unrollings, which results in a performance improvement beyond an order of magnitude.

Towards Automatic Data Cleansing and Classification of Valid Historical Data: An Incremental Approach Based on MDD

This paper outlines the step-wise design of the finer granular digital format, aimed for storage and digital archiving, and the design and test of two generations of the techniques, used in the first two data ingestion and cleaning phases.

A Generative Approach for User-Centered, Collaborative, Domain-Specific Modeling Environments

The metatool Pyro demonstrated and analyzed here focuses on graph-based graphical languages to fully generate a complete, directly executable tool starting from a metamodel in order to meet all cross-cutting requirements.

Random Forests

Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.

Decision Jungles: Compact and Rich Models for Classification

This paper proposes decision jungles, revisiting the idea of ensembles of rooted decision directed acyclic graphs (DAGs), and shows these to be compact and powerful discriminative models for classification.

A Method to Merge Ensembles of Bagged or Boosted Forced-Split Decision Trees

A novel method to efficiently merge an ensemble of forced-split decision trees into an “enlightened” decision tree is presented, and this paper introduces and tests a “super-uniform sampling” technique which outperforms conventional uniform sampling in training individual trees.

Reducing Decision Tree Ensemble Size Using Parallel Decision DAGs

This research presents a new learning model, the Parallel Decision DAG (PDDAG), and shows how to use it to represent an ensemble of decision trees while using significantly less storage.

Forest Packing: Fast, Parallel Decision Forests

Memory packing techniques that reorganize learned forests to minimize cache misses during classification increase the performance of classification in ensembles by a factor of four over an optimized C++ implementation and a factor of 50 over a popular R language implementation.

An Algebra to Merge Heterogeneous Classifiers

This work formally studies the merging operation as an algebra, proves that it satisfies a desirable set of properties, and presents an approach for stationary distributions, such as homogeneous databases partitioned over different learners, which ensures that all models have the same impact.

A Random Forest Using a Multi-valued Decision Diagram on an FPGA

To accelerate RF classification using the AOCL, a fully pipelined architecture is proposed to increase the memory bandwidth using on-chip memories on the FPGA, and an optimal-precision fixed-point representation is applied instead of a 32-bit floating-point one.

Embedding Decision Trees and Random Forests in Constraint Programming

This paper proposes three approaches based on converting a DT into a Multi-valued Decision Diagram, which is then fed to an mdd constraint, and shows how to embed in CP a Random Forest, a powerful type of ensemble classifier based on DTs.

Data Mining, Fourth Edition: Practical Machine Learning Tools and Techniques

This highly anticipated fourth edition of the most acclaimed work on data mining and machine learning teaches readers everything they need to know to get going, from preparing inputs, interpreting outputs, evaluating results, to the algorithmic methods at the heart of successful data mining approaches.

Induction of Decision Trees

This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail.