CODE: Compiler-based Neuron-aware Ensemble training
@inproceedings{Trainiti2021CODECN,
  title     = {CODE: Compiler-based Neuron-aware Ensemble training},
  author    = {Ettore M. G. Trainiti and Thanapon Noraset and David Demeter and Doug Downey and Simone Campanoni},
  booktitle = {Conference on Machine Learning and Systems},
  year      = {2021}
}
Deep Neural Networks (DNNs) are redefining state-of-the-art performance in a variety of tasks such as speech recognition and image classification. These impressive results are often enabled by ensembling many DNNs together. Surprisingly, ensembling is often done by training several DNN instances from scratch and combining them. This paper shows that there is significant redundancy in today’s way of ensembling. The novelty we propose is CODE, a compiler approach designed to automatically…
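For a concrete picture of the baseline the abstract describes, here is a minimal PyTorch sketch of "train several DNN instances from scratch and combine their predictions." The model factory, data loader, and hyperparameters are illustrative placeholders; this is the naive baseline whose redundancy the paper targets, not CODE's generated pipeline.

```python
# Minimal sketch of naive ensembling: each member repeats a full training
# run from fresh weights, and predictions are averaged at inference time.
import torch
import torch.nn as nn


def train_from_scratch(make_model, loader, epochs=10):
    model = make_model()                      # fresh weights every time
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model


def naive_ensemble(make_model, loader, n_members=5):
    # Every member redoes the full training run: this repetition is the
    # redundancy the abstract argues a compiler can exploit.
    return [train_from_scratch(make_model, loader) for _ in range(n_members)]


def ensemble_predict(members, x):
    # Combine members by averaging their softmax outputs.
    probs = torch.stack([m(x).softmax(dim=-1) for m in members])
    return probs.mean(dim=0)
```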
3 Citations
ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations
- 2022
Computer Science
ArXiv
ALT provides a generic transformation module to manipulate layouts and loops with easy-to-use primitive functions and integrates an auto-tuning module that jointly optimizes graph-level data layouts and operator-level loops while guaranteeing efficiency.
ALT: Breaking the Wall between Data Layout and Loop Optimizations for Deep Learning Compilation
- 2023
Computer Science
EuroSys
ALT is proposed, a deep compiler that performs joint graph-level layout optimization and operator-level loop optimization and provides a generic transformation module to manipulate layouts and loops with easy-to-use primitive functions.
ALT: Breaking the Wall between Graph and Operator Level Optimizations for Deep Learning Compilation
- 2022
Computer Science
ALT is proposed, a compiler that performs joint graph- and operator-level optimizations for deep models and provides a generic transformation module to manipulate layouts and loops with easy-to-use primitive functions.
63 References
Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks
- 2015
Computer Science
ArXiv
It is demonstrated that TreeNets can improve ensemble performance and that diverse ensembles can be trained end-to-end under a unified loss, achieving significantly higher "oracle" accuracies than classical ensembles.
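As a rough illustration of the shared-trunk, multi-head idea summarized above, the sketch below trains M classifier heads jointly under an averaged loss and reports the "oracle" metric (a sample counts as correct if any head gets it right). Layer sizes and the particular unified loss are assumptions for illustration, not the paper's exact TreeNet construction.

```python
# Shared trunk feeding M independent classifier heads, trained end-to-end.
import torch
import torch.nn as nn


class MultiHeadNet(nn.Module):
    def __init__(self, n_heads=4, n_classes=10):
        super().__init__()
        self.trunk = nn.Sequential(           # shared lower layers
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleList(
            nn.Linear(32, n_classes) for _ in range(n_heads)
        )

    def forward(self, x):
        h = self.trunk(x)
        return [head(h) for head in self.heads]   # one logit tensor per head


def unified_loss(head_logits, y):
    # Simplest unified objective: average the per-head cross-entropy losses.
    ce = nn.CrossEntropyLoss()
    return torch.stack([ce(logits, y) for logits in head_logits]).mean()


def oracle_accuracy(head_logits, y):
    # "Oracle" accuracy: a sample is correct if ANY head predicts it correctly.
    preds = torch.stack([l.argmax(dim=-1) for l in head_logits])   # (M, batch)
    return (preds == y).any(dim=0).float().mean()
```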
Wootz: a compiler-based framework for fast CNN pruning via composability
- 2019
Computer Science
PLDI
A compiler-based framework named Wootz is developed, which, for an arbitrary CNN, automatically generates code that builds a Teacher-Student scheme to materialize composability-based pruning, and a compression-based algorithm is designed to efficiently identify the set of CNN layers to pre-train for maximizing their reuse benefits in CNN pruning.
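The Teacher-Student scheme mentioned above reduces, in generic form, to a knowledge-distillation loss. The sketch below shows only that generic loss (temperature and mixing weight are illustrative); it does not reproduce Wootz's automatically generated code or its composability-based pruning.

```python
# Generic teacher-student (knowledge distillation) loss: the student matches
# the teacher's softened output distribution while also fitting hard labels.
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    # Soft targets from the (frozen) teacher, softened by the temperature.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        soft_targets,
        reduction="batchmean",
    ) * (temperature ** 2)
    # Ordinary supervised loss on the hard labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```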
Latte: a language, compiler, and runtime for elegant and efficient deep neural networks
- 2016
Computer Science
PLDI
Latte is presented, a domain-specific language for DNNs that provides a natural abstraction for specifying new layers without sacrificing performance, achieving 3-6x speedups over Caffe (C++/MKL) on three state-of-the-art ImageNet models executing on an Intel Xeon E5-2699 v3 x86 CPU.
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- 2017
Computer Science
ICLR
This work introduces a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks, and applies the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora.
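A minimal sketch of the sparsely-gated idea summarized above: a small gating network scores the experts and only the top-k are evaluated and mixed per example. The dense routing loop, layer sizes, and the omission of noisy gating and load-balancing losses are simplifications for clarity, not the paper's implementation.

```python
# Top-k gated mixture-of-experts layer: each input is routed to k experts
# whose outputs are mixed by the (renormalized) gate scores.
import torch
import torch.nn as nn


class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_hidden=1024, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (batch, d_model)
        scores = self.gate(x)                  # (batch, n_experts)
        topk_vals, topk_idx = scores.topk(self.k, dim=-1)
        weights = topk_vals.softmax(dim=-1)    # mixture weights over top-k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[:, slot]
            w = weights[:, slot].unsqueeze(-1)
            # Route each example to its selected expert (dense loop for clarity).
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out
```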
MotherNets: Rapid Deep Ensemble Learning
- 2020
Computer Science
MLSys
Compared to state-of-the-art approaches such as Snapshot Ensembles, Knowledge Distillation, and TreeNets, MotherNets provide a new Pareto frontier for the accuracy-training cost tradeoff.
Project Adam: Building an Efficient and Scalable Deep Learning Training System
- 2014
Computer Science
OSDI
The design and implementation of Adam, a distributed system composed of commodity server machines for training large deep neural network models, is described; it exhibits world-class performance, scaling, and task accuracy on visual recognition tasks, and shows that task accuracy improves with larger models.
Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators
- 2016
Computer Science
2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
The continued success of Deep Neural Networks (DNNs) in classification tasks has sparked a trend of accelerating their execution with specialized hardware. While published designs easily give an…
Snapshot Ensembles: Train 1, get M for free
- 2017
Computer Science
ICLR
This paper proposes a method to obtain the seemingly contradictory goal of ensembling multiple neural networks at no additional training cost by training a single neural network, converging to several local minima along its optimization path and saving the model parameters.
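The recipe summarized above can be sketched as a single training run with a cyclic learning rate and one saved snapshot per cycle, averaged at test time. The cosine-restart schedule, cycle lengths, and averaging rule below are illustrative assumptions rather than the paper's exact settings.

```python
# One training run, M snapshots: restart the learning rate each cycle and
# save the weights at the end of every cycle as one ensemble member.
import copy
import torch
import torch.nn as nn


def train_snapshot_ensemble(model, loader, n_cycles=5, epochs_per_cycle=40):
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    # Cosine annealing with warm restarts gives the repeated
    # "converge, then restart" behaviour that yields one snapshot per cycle.
    steps_per_cycle = epochs_per_cycle * len(loader)
    sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
        opt, T_0=steps_per_cycle)
    loss_fn = nn.CrossEntropyLoss()
    snapshots = []
    for _ in range(n_cycles):
        for _ in range(epochs_per_cycle):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
                sched.step()
        snapshots.append(copy.deepcopy(model.state_dict()))  # one member per cycle
    return snapshots


def snapshot_predict(model, snapshots, x):
    probs = []
    for state in snapshots:
        model.load_state_dict(state)
        with torch.no_grad():
            probs.append(model(x).softmax(dim=-1))
    return torch.stack(probs).mean(dim=0)     # average the M "free" members
```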
DaDianNao: A Machine-Learning Supercomputer
- 2014
Computer Science
2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.
Going deeper with convolutions
- 2015
Computer Science
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition…
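For reference, Inception is built from blocks of parallel 1x1, 3x3, and 5x5 convolutions plus a pooled branch, concatenated along the channel axis. The sketch below uses illustrative channel counts rather than GoogLeNet's published configuration.

```python
# A single Inception-style block: parallel branches concatenated on channels.
import torch
import torch.nn as nn


class InceptionBlock(nn.Module):
    def __init__(self, in_ch, c1=64, c3=128, c5=32, cp=32):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.b3 = nn.Sequential(                 # 1x1 reduction, then 3x3
            nn.Conv2d(in_ch, c3 // 2, kernel_size=1), nn.ReLU(),
            nn.Conv2d(c3 // 2, c3, kernel_size=3, padding=1),
        )
        self.b5 = nn.Sequential(                 # 1x1 reduction, then 5x5
            nn.Conv2d(in_ch, c5 // 2, kernel_size=1), nn.ReLU(),
            nn.Conv2d(c5 // 2, c5, kernel_size=5, padding=2),
        )
        self.bp = nn.Sequential(                 # pooling branch
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, cp, kernel_size=1),
        )

    def forward(self, x):
        branches = [self.b1(x), self.b3(x), self.b5(x), self.bp(x)]
        return torch.cat([torch.relu(b) for b in branches], dim=1)
```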