Corpus ID: 233218121

CODE: Compiler-based Neuron-aware Ensemble training

@inproceedings{Trainiti2021CODECN,
  title={CODE: Compiler-based Neuron-aware Ensemble training},
  author={Ettore M. G. Trainiti and Thanapon Noraset and David Demeter and Doug Downey and Simone Campanoni},
  booktitle={Conference on Machine Learning and Systems},
  year={2021}
}
Deep Neural Networks (DNNs) are redefining the state-of-the-art performance in a variety of tasks like speech recognition and image classification. These impressive results are often enabled by ensembling many DNNs together. Surprisingly, ensembling is often done by training several DNN instances from scratch and combining them. This paper shows that there is significant redundancy in today’s way of ensembling. The novelty we propose is CODE, a compiler approach designed to automatically… 
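
The paper's starting point is the conventional baseline it calls redundant: every ensemble member is trained from scratch and the members' predictions are combined at inference time. Below is a minimal sketch of that baseline (not CODE itself; the architecture, optimizer, and the data `loader` are illustrative assumptions), written with PyTorch.

import torch
import torch.nn as nn

def make_model() -> nn.Module:
    # Illustrative architecture; the paper does not prescribe one.
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def train_from_scratch(model: nn.Module, loader, epochs: int = 1) -> nn.Module:
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

def naive_ensemble(loader, n_members: int = 5) -> list:
    # Each member starts from a fresh initialization and is trained independently,
    # so any computation that could be shared across members is repeated n_members times.
    return [train_from_scratch(make_model(), loader) for _ in range(n_members)]

def ensemble_predict(members: list, x: torch.Tensor) -> torch.Tensor:
    # Combine members by averaging their softmax outputs.
    probs = torch.stack([member(x).softmax(dim=-1) for member in members])
    return probs.mean(dim=0)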

ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations

ALT provides a generic transformation module to manipulate layouts and loops with easy-to-use primitive functions and integrates an auto-tuning module that jointly optimizes graph-level data layouts and operator-level loops while guaranteeing efficiency.

ALT: Breaking the Wall between Data Layout and Loop Optimizations for Deep Learning Compilation

ALT is proposed, a deep compiler that performs joint graph-level layout optimization and operator-level loop optimization and provides a generic transformation module to manipulate layouts and loops with easy-to-use primitive functions.

ALT: Breaking the Wall between Graph and Operator Level Optimizations for Deep Learning Compilation

ALT is proposed, a compiler that performs joint graph-and operator-level optimizations for deep models and provides a generic transformation module to manipulate layouts and loops with easy-to-use primitive functions.
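
All three ALT summaries above refer to manipulating data layouts jointly with loops. As a minimal illustration of what a graph-level layout choice means (this is not ALT's primitive API, only a hedged NumPy sketch), converting activations between NCHW and NHWC layouts looks as follows; the operator-level loop nest must then be rewritten to index the new layout efficiently, which is the coupling ALT optimizes jointly.

import numpy as np

def to_nhwc(tensor_nchw: np.ndarray) -> np.ndarray:
    # Permute a 4-D tensor from NCHW (batch, channel, height, width) to NHWC layout.
    return np.transpose(tensor_nchw, (0, 2, 3, 1))

def to_nchw(tensor_nhwc: np.ndarray) -> np.ndarray:
    # Permute a 4-D tensor from NHWC back to NCHW layout.
    return np.transpose(tensor_nhwc, (0, 3, 1, 2))

x = np.random.rand(8, 3, 32, 32)            # a batch of 8 CHW feature maps
assert to_nchw(to_nhwc(x)).shape == x.shape  # the two primitives are inverses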

Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks

It is demonstrated that TreeNets can improve ensemble performance and that diverse ensembles can be trained end-to-end under a unified loss, achieving significantly higher "oracle" accuracies than classical ensembles.
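
The "oracle" accuracy mentioned above is commonly defined as counting an example correct if at least one ensemble member predicts it correctly. A small NumPy sketch of that metric next to ordinary averaged-ensemble accuracy (the array shapes are assumptions for illustration):

import numpy as np

def oracle_accuracy(member_preds: np.ndarray, labels: np.ndarray) -> float:
    # member_preds: (n_members, n_examples) predicted class ids.
    any_correct = (member_preds == labels[None, :]).any(axis=0)
    return float(any_correct.mean())

def mean_ensemble_accuracy(member_probs: np.ndarray, labels: np.ndarray) -> float:
    # member_probs: (n_members, n_examples, n_classes) class probabilities.
    avg = member_probs.mean(axis=0)
    return float((avg.argmax(axis=-1) == labels).mean())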

Wootz: a compiler-based framework for fast CNN pruning via composability

A compiler-based framework named Wootz is developed, which, for an arbitrary CNN, automatically generates code that builds a Teacher-Student scheme to materialize composability-based pruning, and a compression-based algorithm is designed to efficiently identify the set of CNN layers to pre-train for maximizing their reuse benefits in CNN pruning.
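
The Teacher-Student scheme referenced above can be pictured as fitting a pruned (student) block to reproduce the activations of the original dense (teacher) block. The sketch below is a generic PyTorch illustration of that idea, not the code Wootz generates; the block and activation names are assumptions.

import torch
import torch.nn as nn

def pretrain_block(student_block: nn.Module, teacher_block: nn.Module,
                   activations: torch.Tensor, steps: int = 100) -> nn.Module:
    # Fit a pruned block to mimic the dense block's outputs on cached input activations.
    teacher_block.eval()
    with torch.no_grad():
        target = teacher_block(activations)
    opt = torch.optim.Adam(student_block.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = nn.functional.mse_loss(student_block(activations), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student_block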

Latte: a language, compiler, and runtime for elegant and efficient deep neural networks

Latte is presented, a domain-specific language for DNNs that provides a natural abstraction for specifying new layers without sacrificing performance, delivering 3-6x speedups over Caffe (C++/MKL) on three state-of-the-art ImageNet models executing on an Intel Xeon E5-2699 v3 x86 CPU.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

This work introduces a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks, and applies the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora.
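
Below is a minimal sketch of a sparsely-gated top-k MoE layer, assuming PyTorch. For clarity it evaluates every expert and zeroes the non-selected gate weights, whereas the layer described in the paper only executes the k selected experts per example and adds load-balancing terms not shown here.

import torch
import torch.nn as nn

class SparselyGatedMoE(nn.Module):
    # Illustrative top-k mixture-of-experts layer; not the paper's implementation.

    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model)
        logits = self.gate(x)                                  # (batch, n_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        weights = torch.zeros_like(logits)
        weights.scatter_(-1, topk_idx, topk_vals.softmax(dim=-1))
        # A real sparse implementation would only run the selected experts.
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, n_experts, d_model)
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)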

MotherNets: Rapid Deep Ensemble Learning

Compared to state-of-the-art approaches such as Snapshot Ensembles, Knowledge Distillation, and TreeNets, MotherNets provide a new Pareto frontier for the accuracy-training cost tradeoff.

Project Adam: Building an Efficient and Scalable Deep Learning Training System

The design and implementation of a distributed system called Adam, composed of commodity server machines to train large deep neural network models, is presented; Adam exhibits world-class performance, scaling, and task accuracy on visual recognition tasks and shows that task accuracy improves with larger models.

Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators

The continued success of Deep Neural Networks (DNNs) in classification tasks has sparked a trend of accelerating their execution with specialized hardware.

Snapshot Ensembles: Train 1, get M for free

This paper proposes a method to achieve the seemingly contradictory goal of ensembling multiple neural networks at no additional training cost: train a single neural network, let it converge to several local minima along its optimization path, and save the model parameters at each.
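
The mechanism rests on a cyclic learning-rate schedule: the rate restarts high at the beginning of each cycle and anneals toward zero, and the weights are snapshotted at the end of each cycle, where the network sits near a local minimum. A hedged Python sketch of such a cosine-cycling schedule (the model and optimizer names in the commented loop are placeholders):

import math

def snapshot_lr(step: int, total_steps: int, n_cycles: int, lr_max: float = 0.1) -> float:
    # Cyclic cosine annealing: restart at lr_max each cycle and decay toward 0.
    cycle_len = math.ceil(total_steps / n_cycles)
    pos_in_cycle = step % cycle_len
    return lr_max / 2 * (math.cos(math.pi * pos_in_cycle / cycle_len) + 1)

# Illustrative training loop (placeholder names, not a specific library API):
# for step in range(total_steps):
#     for g in optimizer.param_groups:
#         g["lr"] = snapshot_lr(step, total_steps, n_cycles=5)
#     ...one SGD step...
#     if (step + 1) % math.ceil(total_steps / 5) == 0:
#         snapshots.append(copy.deepcopy(model.state_dict()))  # one ensemble member per cycle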

DaDianNao: A Machine-Learning Supercomputer

Yunji Chen, Tao Luo, O. Temam. 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014.
This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.

Going deeper with convolutions

We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
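
The Inception architecture is built from modules that run several convolutions of different kernel sizes in parallel and concatenate their outputs along the channel dimension. A simplified PyTorch sketch of such a block follows (channel counts are illustrative, not GoogLeNet's exact configuration):

import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    # Parallel 1x1, 3x3, 5x5 convolution branches plus a pooling branch,
    # concatenated channel-wise; a simplified sketch, not the exact module.

    def __init__(self, in_ch: int, out_ch_per_branch: int = 16):
        super().__init__()
        c = out_ch_per_branch
        self.b1 = nn.Conv2d(in_ch, c, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c, 1),        # 1x1 channel reduction
                                nn.Conv2d(c, c, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, c, 1),
                                nn.Conv2d(c, c, 5, padding=2))
        self.pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                  nn.Conv2d(in_ch, c, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)

y = InceptionBlock(in_ch=3)(torch.randn(1, 3, 32, 32))   # y has shape (1, 64, 32, 32)
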
...