Asynchrony begets momentum, with an application to deep learning
- Ioannis Mitliagkas, Ce Zhang, Stefan Hadjis, C. Ré
- Computer Science, 54th Annual Allerton Conference on Communication…
- 31 May 2016
It is shown that running stochastic gradient descent (SGD) asynchronously can be viewed as adding a momentum-like term to the SGD iteration; a key implication is that the momentum parameter must be retuned when the level of asynchrony changes.
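The momentum-like iteration the summary refers to can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: it implements the standard heavy-ball update x_{t+1} = x_t - lr·g(x_t) + mu·(x_t - x_{t-1}), which (per the summary) asynchronous SGD implicitly resembles, with the effective momentum growing with the degree of asynchrony.

```python
import numpy as np

# Illustrative sketch (not the paper's code): SGD with an explicit
# heavy-ball momentum term. The paper's claim is that asynchronous
# SGD behaves as if such a term were added implicitly, so mu must
# be tuned jointly with the level of asynchrony.
def momentum_sgd(grad, x0, lr=0.1, mu=0.5, steps=100):
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(steps):
        # x_{t+1} = x_t - lr * g(x_t) + mu * (x_t - x_{t-1})
        x_next = x - lr * grad(x) + mu * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Minimize f(x) = 0.5 * ||x||^2 (gradient is x); the minimizer is 0.
x_star = momentum_sgd(lambda x: x, np.array([5.0, -3.0]))
print(np.allclose(x_star, 0.0, atol=1e-3))
```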
Spatial: a language and compiler for application accelerators
This work describes a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators, and summarizes the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning.
Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs
A novel understanding of the interaction between system and optimization dynamics is used to build an efficient hyperparameter optimizer, demonstrating that the most popular distributed deep learning systems fall within the tradeoff space but do not optimize within it.
Plasticine: A reconfigurable architecture for parallel patterns
- R. Prabhakar, Yaqi Zhang, K. Olukotun
- Computer Science, ACM/IEEE 44th Annual International Symposium on…
- 24 June 2017
This work designs Plasticine, a new spatially reconfigurable architecture built to efficiently execute applications composed of parallel patterns; across a wide range of dense and sparse applications, it provides an improvement of up to 76.9× in performance-per-watt over a conventional FPGA.
Impact of FPGA architecture on resource sharing in high-level synthesis
It is shown that certain multi-operator patterns occur multiple times in programs, creating additional opportunities for sharing larger composite functional units comprised of patterns of interconnected operators.
Caffe con Troll: Shallow Ideas to Speed Up Deep Learning
This work builds CcT, a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals, and finds that, by employing standard batching optimizations for CPU training, it achieves a 6.3× throughput improvement over Caffe on popular networks like CaffeNet.
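The "batching optimization" named above is the familiar trick of replacing many per-sample matrix-vector products with a single matrix-matrix product, which BLAS executes far more efficiently on CPUs. A minimal sketch of the idea (not CcT's actual code), using a fully connected layer as the example:

```python
import numpy as np

# Hypothetical sketch of CPU batching (not CcT's code): applying a
# fully connected layer to a batch as one GEMM instead of a loop of
# GEMVs. Both compute the same values; the single GEMM is what
# BLAS-backed CPU training exploits for throughput.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))   # layer weights (out x in)
X = rng.standard_normal((64, 128))    # batch of 64 input vectors

per_sample = np.stack([W @ x for x in X])  # one matrix-vector product per sample
batched = X @ W.T                          # a single matrix-matrix product

print(np.allclose(per_sample, batched))
```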
Automating the Design of Processor/Accelerator Embedded Systems with LegUp High-Level Synthesis
- B. Fort, Andrew Canis, J. Anderson
- Computer Science, 12th IEEE International Conference on Embedded…
- 26 August 2014
The LegUp framework is overviewed, and three extensions are described: support for an embedded ARM processor, as available on Altera's recently released SoC FPGA; HLS support for software parallelization schemes (pthreads and OpenMP); and a preliminary debugging and verification framework providing C source-level debugging of HLS hardware.
TensorFlow to Cloud FPGAs: Tradeoffs for Accelerating Deep Neural Networks
This work presents the first open-source TensorFlow to FPGA tool capable of running state-of-the-art DNNs, providing competitive performance and higher accuracy compared to a proprietary tool, thus providing a public framework for research exploration in the DNN inference space.
Profiling-driven multi-cycling in FPGA high-level synthesis
- Stefan Hadjis, Andrew Canis, Ryoya Sobue, Yuko Hara-Azumi, H. Tomiyama, J. Anderson
- Computer Science, Design, Automation & Test in Europe Conference…
- 9 March 2015
This paper considers multi-cycling in the high-level synthesis (HLS) context and uses software profiling to guide multi-cycling optimizations, showing that profiling-driven multi-cycling leads to an average speedup of over 10% across 13 benchmark circuits, with some circuit speedups in excess of 30%.
Importance Sampling over Sets: A New Probabilistic Inference Scheme
This work proposes a generalized importance sampling scheme based on randomly selecting (exponentially large) subsets of states rather than individual ones, and incorporates this idea into a novel maximum likelihood learning algorithm based on cutting planes.
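For context, the baseline the paper generalizes is ordinary importance sampling over individual states: estimate Z = Σ_x w(x) as the average of w(x)/q(x) under a proposal q. The paper's contribution is to sample exponentially large subsets of states instead of single states; the sketch below (hypothetical, with a made-up weight function) shows only the single-state baseline.

```python
import random

# Baseline single-state importance sampling (hypothetical example,
# not the paper's set-based scheme). Estimate Z = sum_x w(x) over
# states {0,...,9} with w(x) = x + 1, so the true value is Z = 55.
# Proposal: uniform, q(x) = 1/10.
random.seed(0)
weights = [float(x + 1) for x in range(10)]
n, q = 20000, 1.0 / 10
estimate = sum(weights[random.randrange(10)] / q for _ in range(n)) / n
print(round(estimate, 1))  # close to 55.0
```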