Corpus ID: 226300136

PACSET (Packed Serialized Trees): Reducing Inference Latency for Tree Ensemble Deployment

Meghana Madhyastha, Kunal Lillaney, James Browne, Joshua T. Vogelstein, Randal C. Burns
We present methods to serialize and deserialize tree ensembles that optimize inference latency when models are not already loaded into memory. This arises whenever models are larger than memory, but also systematically when models are deployed on low-resource devices, such as in the Internet of Things, or run as Web micro-services where resources are allocated on demand. Our packed serialized trees (PACSET) encode reference locality in the layout of a tree ensemble using principles from… 
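The abstract describes encoding reference locality in the on-disk layout of a tree ensemble. As a minimal sketch of that idea (not the authors' actual PACSET format — the record layout, field order, and BFS packing here are illustrative assumptions), the tree can be serialized level by level so the top levels, which every inference path touches, occupy the first blocks read from storage, and prediction can traverse the packed buffer directly:

```python
import struct

class Node:
    """Toy decision-tree node; feature == -1 marks a leaf."""
    def __init__(self, feature=-1, threshold=0.0, value=0.0, left=None, right=None):
        self.feature = feature
        self.threshold = threshold
        self.value = value
        self.left = left
        self.right = right

def serialize_bfs(root):
    """Pack nodes in breadth-first order into fixed 16-byte records.

    Children of the i-th internal node receive consecutive indices, so the
    left-child index stored in each record is enough to navigate the buffer.
    """
    order, queue = [], [root]
    while queue:
        node = queue.pop(0)
        order.append(node)
        if node.feature != -1:
            queue.extend([node.left, node.right])
    blob = bytearray()
    next_child = 1
    for node in order:
        left_idx = next_child if node.feature != -1 else 0
        if node.feature != -1:
            next_child += 2
        blob += struct.pack("<iffi", node.feature, node.threshold,
                            node.value, left_idx)
    return bytes(blob)

def predict(blob, x):
    """Traverse the packed buffer directly, one 16-byte record per node."""
    i = 0
    while True:
        feature, threshold, value, left = struct.unpack_from("<iffi", blob, i * 16)
        if feature == -1:
            return value
        i = left if x[feature] <= threshold else left + 1
```

Because inference reads the buffer sequentially from the front for the shared upper levels, a layout like this turns the first few disk reads into useful work for every query; the paper's contribution is a principled version of this layout for whole ensembles.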



Occupy the cloud: distributed computing for the 99%

It is suggested that stateless functions are a natural fit for data processing in future computing environments, based on recent trends in network bandwidth and the advent of disaggregated storage.

knor: A NUMA-Optimized In-Memory, Distributed and Semi-External-Memory k-means Library

k-means is one of the most influential and widely used machine learning algorithms. Its computation limits the performance and scalability of many statistical analysis and machine learning tasks.

Treelite: Toolbox for Decision Tree Deployment

This paper introduces treelite, a toolbox that facilitates easy deployment of decision tree models and enables optimizations that improve prediction performance without changing any detail of the model.

RFAcc: a 3D ReRAM associative array based random forest accelerator

This paper proposes RFAcc, a ReRAM-based accelerator that speeds up the random forest training process, along with three optimizations — unary encoding, a pipeline design, and parallel tree node training — to fully utilize the accelerator resources for maximum throughput improvement.

Clipper: A Low-Latency Online Prediction Serving System

This paper introduces Clipper, a general-purpose low-latency prediction serving system with a modular architecture that simplifies model deployment across frameworks and applications and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks.

Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things

Bonsai can make predictions in milliseconds even on slow microcontrollers, fits in a few kilobytes of memory, has lower battery consumption than all other algorithms, and achieves prediction accuracies up to 30% higher than state-of-the-art methods for resource-efficient machine learning.

Lossless (and Lossy) Compression of Random Forests

This work introduces a novel method for lossless compression of tree-based ensemble methods, focusing on random forests, based on probabilistic modeling of the ensemble's trees, followed by model clustering via Bregman divergence.

Breadth-first, Depth-next Training of Random Forests

A novel, dynamic, hybrid BFS-DFS algorithm is designed and shown to perform better than both pure BFS and pure DFS, and to be more robust across workloads with different characteristics.
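The hybrid idea summarized above can be sketched as follows (a hedged illustration, not the paper's implementation — the frontier-budget switch rule and the `children`/`visit` callbacks are assumptions for this toy): process tree nodes breadth-first while the frontier stays small enough to keep all working state resident, then finish each remaining subtree depth-first to bound memory.

```python
from collections import deque

def hybrid_process(root, children, visit, frontier_budget=64):
    """Visit every node: BFS until the frontier exceeds the budget, then DFS.

    `children(node)` returns a list of child nodes; `visit(node)` does the
    per-node work (in tree training, this would be finding a split).
    """
    frontier = deque([root])
    # Breadth-first phase: wide, cache-friendly sweeps over each level.
    while frontier and len(frontier) <= frontier_budget:
        node = frontier.popleft()
        visit(node)
        frontier.extend(children(node))
    # Depth-next phase: exhaust each remaining subtree with an explicit
    # stack, keeping only one root-to-leaf path's worth of state live.
    while frontier:
        stack = [frontier.pop()]
        while stack:
            node = stack.pop()
            visit(node)
            stack.extend(children(node))
```

Switching on frontier size is one plausible trigger; the paper's dynamic algorithm chooses the switch point based on workload characteristics.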

Serverless Computing: One Step Forward, Two Steps Back

This paper addresses critical gaps in first-generation serverless computing, which place its autoscaling potential at odds with dominant trends in modern computing: notably data-centric and distributed computing, but also open source and custom hardware.

Boosted Race Trees for Low Energy Classification

It is demonstrated that race logic, in which temporally coded signals are processed in a dataflow fashion, provides interesting new capabilities for in-sensor processing applications, and that tree-based classifiers can be naturally encoded in this class of logic.