Publications
Asynchrony begets momentum, with an application to deep learning
TLDR: Asynchronous methods are widely used in deep learning, but have limited theoretical justification when applied to non-convex problems.
Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs
TLDR: We study the factors affecting training time in multi-device deep learning systems.
Spatial: a language and compiler for application accelerators
TLDR: We describe Spatial's hardware-centric abstractions for both programmer productivity and design performance, and summarize the compiler passes required to support them.
Plasticine: A reconfigurable architecture for parallel patterns
TLDR: Reconfigurable architectures have gained popularity in recent years as they allow the design of energy-efficient accelerators.
Impact of FPGA architecture on resource sharing in high-level synthesis
TLDR: Resource sharing is a key area-reduction approach in high-level synthesis (HLS) in which a single hardware functional unit is used to implement multiple operations in the high-level circuit specification.
Caffe con Troll: Shallow Ideas to Speed Up Deep Learning
TLDR: We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals.
Automating the Design of Processor/Accelerator Embedded Systems with LegUp High-Level Synthesis
TLDR: In this paper, we overview the LegUp framework and describe several recent developments: 1) support for an embedded ARM processor, as is available on Altera's recently released SoC FPGA; 2) HLS support for software parallelization schemes (pthreads and OpenMP); 3) enhancements to LegUp's core HLS algorithms that raise the quality of the auto-generated hardware; and 4) a preliminary debugging and verification framework.
TensorFlow to Cloud FPGAs: Tradeoffs for Accelerating Deep Neural Networks
TLDR: We present the first open-source TensorFlow to FPGA tool capable of running state-of-the-art DNNs.
Profiling-driven multi-cycling in FPGA high-level synthesis
TLDR: Multi-cycling is a well-known strategy to improve performance in digital design, wherein the required time for selected combinational paths is lengthened to multiple clock cycles (rather than just one).
Coagulation of human prostate volumes with MRI-controlled transurethral ultrasound therapy: results in gel phantoms.
PURPOSE: The feasibility and safety of magnetic resonance imaging (MRI)-controlled transurethral ultrasound therapy were demonstrated recently in a preliminary human study in which a small subvolume …