TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
TVM is a compiler that exposes graph-level and operator-level optimizations to provide performance portability for deep learning workloads across diverse hardware back-ends. It automates the optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost-modeling method for rapid exploration of code optimizations.
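As an illustration of what one graph-level optimization does, the sketch below fuses elementwise operators into their producers so intermediate results need not round-trip through memory. The operator names and the two-category classification are invented for the example, not TVM's actual IR or API:

```python
# Minimal sketch of graph-level operator fusion: elementwise ops are
# merged into the preceding operator's group so their intermediates
# stay in registers instead of being written back to memory.
ELEMENTWISE = {"relu", "add_bias", "sigmoid"}  # illustrative op names

def fuse_elementwise(ops):
    """Fold each elementwise op into its producer, yielding fused groups."""
    fused = []
    for op in ops:
        if fused and op in ELEMENTWISE:
            fused[-1] = fused[-1] + [op]   # extend the current fused group
        else:
            fused.append([op])             # start a new group
    return fused

pipeline = ["conv2d", "add_bias", "relu", "pool", "conv2d", "relu"]
print(fuse_elementwise(pipeline))
# [['conv2d', 'add_bias', 'relu'], ['pool'], ['conv2d', 'relu']]
```

Each resulting group would then be lowered and scheduled as a single kernel.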
EnerJ: approximate data types for safe and general low-power computation
- Adrian Sampson, Werner Dietl, Emily Fortuna, Dan Gnanapragasam, L. Ceze, D. Grossman
- Computer Science, PLDI '11
- 4 June 2011
EnerJ, an extension to Java that adds approximate data types, is developed together with a hardware architecture that offers explicit approximate storage and computation; the system lets a programmer control explicitly how information flows from approximate data to precise data.
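EnerJ itself is a Java extension with checked type qualifiers; the Python sketch below only mirrors its core typing rule, that approximate values may not flow into precise contexts without an explicit endorsement. All names here (Approx, endorse, precise_store) are invented for the illustration:

```python
# Python analogue of EnerJ's information-flow rule: values produced
# approximately are tainted, and precise code must explicitly
# endorse() them before use.
class Approx:
    """Wrapper marking a value as approximate (tainted)."""
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        o = other.value if isinstance(other, Approx) else other
        return Approx(self.value + o)  # approximation is contagious

def precise_store(x):
    """A 'precise' sink: rejects approximate data that was not endorsed."""
    if isinstance(x, Approx):
        raise TypeError("approximate value flows into precise context")
    return x

def endorse(x):
    """Explicit, programmer-audited cast from approximate to precise."""
    return x.value if isinstance(x, Approx) else x

a = Approx(3) + 4              # result stays approximate
try:
    precise_store(a)           # rejected: implicit approx -> precise flow
except TypeError:
    pass
print(precise_store(endorse(a)))   # 7: the flow is now explicit
```

In EnerJ this check happens statically at compile time, not at run time as here.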
Architecture support for disciplined approximate programming
An ISA extension that provides approximate operations and storage is described, giving the hardware freedom to save energy at the cost of accuracy; Truffle, a microarchitecture design that efficiently supports the ISA extension, is also proposed.
Bulk Disambiguation of Speculative Threads in Multiprocessors
- L. Ceze, James Tuck, J. Torrellas, C. Cascaval
- Computer Science, 33rd International Symposium on Computer…
- 1 May 2006
Bulk is presented, a novel approach that simplifies the mechanisms for disambiguating speculative threads by hash-encoding a thread's access information in a concise signature, and by supporting in hardware signature operations that efficiently process the sets of addresses implementing those mechanisms.
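The signature idea can be pictured as a Bloom-filter-style bit vector: each accessed address is hashed into a fixed-width word, and a bitwise AND of two signatures conservatively reports possible conflicts (false positives are possible, false negatives are not). A minimal Python sketch, with invented hash mixes and signature width:

```python
# Sketch of Bulk-style address signatures: a thread's accessed
# addresses are hash-encoded into a fixed-width bit vector, and a
# nonzero intersection of two signatures flags a possible conflict.
SIG_BITS = 256

def insert(sig, addr):
    """Hash-encode a (word-aligned) address into the signature."""
    for seed in (0x9E3779B9, 0x85EBCA6B):      # two illustrative hash mixes
        sig |= 1 << (((addr >> 2) * seed) % SIG_BITS)
    return sig

def may_conflict(sig_a, sig_b):
    """Nonzero intersection = possible conflict (conservative check)."""
    return (sig_a & sig_b) != 0

writer_sig = 0
for addr in (0x100, 0x104):                    # addresses written by thread A
    writer_sig = insert(writer_sig, addr)
reader_sig = insert(0, 0x200)                  # address read by thread B

print(may_conflict(writer_sig, reader_sig))    # these sets happen not to collide
print(may_conflict(writer_sig, insert(0, 0x104)))  # shared address: True
```

The appeal of the encoding is that conflict detection, commit, and squash all become cheap bitwise operations on fixed-size words.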
Learning to Optimize Tensor Programs
A learning-based framework is presented that optimizes tensor programs for deep learning workloads: it learns domain-specific statistical cost models to guide the search for tensor operator implementations over billions of possible program variants, and it accelerates that search through effective model transfer across workloads.
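The search loop can be caricatured in a few lines: measure a handful of configurations, fit a statistical cost model to them, and let the model rank untried candidates so real measurements go only to promising variants. The toy cost surface and the nearest-neighbour "model" below are invented stand-ins, not the paper's actual models:

```python
import random

# Toy sketch of cost-model-guided search over program variants: a
# statistical model (here a trivial 1-nearest-neighbour predictor)
# is trained on measured configurations and used to rank untried
# candidates, so only the most promising ones get real measurements.

def measure(cfg):
    """Stand-in for compiling and timing a variant; lower is better.
    The optimum at tile=16, unroll=4 is an invented cost surface."""
    tile, unroll = cfg
    return abs(tile - 16) + 0.5 * abs(unroll - 4)

def predict(cfg, history):
    """1-nearest-neighbour cost model over measured configs."""
    nearest = min(history,
                  key=lambda h: abs(h[0][0] - cfg[0]) + abs(h[0][1] - cfg[1]))
    return nearest[1]

space = [(t, u) for t in (1, 2, 4, 8, 16, 32, 64) for u in (1, 2, 4, 8)]
random.seed(0)
history = [(c, measure(c)) for c in random.sample(space, 4)]  # seed measurements
for _ in range(8):                       # budget: 8 more real measurements
    tried = {h[0] for h in history}
    untried = [c for c in space if c not in tried]
    cfg = min(untried, key=lambda c: predict(c, history))  # model picks next
    history.append((cfg, measure(cfg)))

best = min(history, key=lambda h: h[1])
print(best)
```

The paper's contribution is precisely in making the model good (and transferable across workloads) so such a budget of real measurements goes much further than random search.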
CoreDet: a compiler and runtime system for deterministic multithreaded execution
This work develops a compiler and runtime system that runs arbitrary multithreaded C/C++ POSIX Threads programs deterministically, resorting to serialization only rarely, to handle inter-thread communication and synchronization.
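The determinism idea can be illustrated by forcing all inter-thread commits into a fixed round-robin order, so every run produces the same interleaving. CoreDet instruments C/C++ code at compile time; this Python sketch with an invented DetScheduler only demonstrates the ordering principle:

```python
import threading

# Sketch of deterministic multithreaded execution in the spirit of
# CoreDet: instead of racing, threads commit their effects on shared
# state in a fixed round-robin order, so every run interleaves
# identically regardless of OS scheduling.

class DetScheduler:
    def __init__(self, nthreads):
        self.n = nthreads
        self.turn = 0
        self.cv = threading.Condition()

    def commit(self, tid, action):
        with self.cv:
            while self.turn != tid:        # wait for this thread's slot
                self.cv.wait()
            action()                       # deterministic commit point
            self.turn = (self.turn + 1) % self.n
            self.cv.notify_all()

log = []
sched = DetScheduler(3)

def worker(tid):
    for _ in range(2):
        sched.commit(tid, lambda: log.append(tid))

threads = [threading.Thread(target=worker, args=(t,)) for t in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log)   # always [0, 1, 2, 0, 1, 2], run after run
```

CoreDet's insight is that full serialization like this is only needed at communication points; race-free regions can still run in parallel.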
TVM: End-to-End Optimization Stack for Deep Learning
TVM is proposed, an end-to-end optimization stack that exposes graph-level and operator-level optimizations to provide performance portability for deep learning workloads across diverse hardware back-ends; the optimization challenges specific to deep learning that TVM solves are also discussed.
POSH: a TLS compiler that exploits program structure
POSH is a new, fully automated TLS compiler built on top of gcc that leverages the code structures created by the programmer, namely subroutines and loops, to generate the speculative tasks that are crucial to overall TLS performance.
Random access in large-scale DNA data storage
A large library of primers is designed and validated that enables individual recovery of every file stored within the DNA, and an algorithm is developed that greatly reduces the sequencing read coverage required for error-free decoding by maximizing the information extracted from all sequence reads.
Neural Acceleration for General-Purpose Approximate Programs
NPUs leverage an approximate algorithmic transformation that converts regions of code from a von Neumann model to a neural model, showing that significant performance and efficiency gains are possible when the abstraction of full accuracy is relaxed in general-purpose computing.
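The transformation amounts to: sample input/output pairs from a hot, approximable region, train a small neural model on them, and answer subsequent calls with the model instead of the code. The sketch below shrinks this to a single linear neuron trained by gradient descent on an invented region; the real work targets multi-layer perceptrons executed on a hardware NPU:

```python
# Sketch of the neural-transformation idea: sample input/output pairs
# from a code region, train a tiny neural model on them, then answer
# queries with the model instead of the original code.

def region(x):
    """The 'approximable' code region being replaced (invented example)."""
    return 2.0 * x + 1.0

samples = [(x / 10.0, region(x / 10.0)) for x in range(11)]  # observed I/O

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):                        # gradient descent on MSE
    gw = gb = 0.0
    for x, y in samples:
        err = (w * x + b) - y
        gw += 2 * err * x / len(samples)
        gb += 2 * err / len(samples)
    w, b = w - lr * gw, b - lr * gb

def neural_region(x):
    """Learned stand-in for region(): approximate but much cheaper in hardware."""
    return w * x + b

print(round(neural_region(0.55), 3))         # close to region(0.55) = 2.1
```

The speedup in the paper comes from running the learned model on dedicated neural hardware rather than executing the original instructions, at the cost of bounded output error.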