Publications
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
TLDR
TVM is a compiler that exposes graph-level and operator-level optimizations to provide performance portability for deep learning workloads across diverse hardware back-ends. It automates the optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost-modeling method for rapid exploration of code optimizations.
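The graph-level optimizations mentioned above can be illustrated with a toy fusion pass (the IR and names here are hypothetical stand-ins, not TVM's actual API): elementwise operators are merged into their producer so intermediate results never round-trip through memory.

```python
# Minimal sketch of a graph-level operator-fusion pass (illustrative IR,
# not TVM's): chains of elementwise ops collapse into one fused kernel.
from dataclasses import dataclass

@dataclass
class Op:
    name: str            # e.g. "conv2d", "relu", "bias_add"
    inputs: list         # upstream Op nodes
    elementwise: bool = False

def fuse_elementwise(node: Op) -> Op:
    """Greedily merge each single-input elementwise op into its producer."""
    if len(node.inputs) == 1 and node.elementwise:
        producer = fuse_elementwise(node.inputs[0])
        return Op(name=producer.name + "+" + node.name,
                  inputs=producer.inputs,
                  elementwise=producer.elementwise)
    return Op(node.name,
              [fuse_elementwise(i) for i in node.inputs],
              node.elementwise)

# conv2d -> bias_add -> relu collapses into a single fused kernel.
x = Op("input", [])
conv = Op("conv2d", [x])
bias = Op("bias_add", [conv], elementwise=True)
act = Op("relu", [bias], elementwise=True)
print(fuse_elementwise(act).name)  # conv2d+bias_add+relu
```

A real compiler would additionally generate code for the fused kernel; the sketch only shows the graph rewrite itself.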
EnerJ: approximate data types for safe and general low-power computation
TLDR
EnerJ is an extension to Java that adds approximate data types, paired with a hardware architecture that offers explicit approximate storage and computation; it allows a programmer to control explicitly how information flows from approximate data to precise data.
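The information-flow rule can be sketched as follows (a hypothetical Python stand-in for EnerJ's Java type qualifiers, not the paper's actual system): approximate values may not flow into precise storage unless the programmer explicitly endorses them.

```python
# Illustrative sketch of EnerJ-style approximate/precise information flow.
class Approx:
    """Wraps a value computed or stored approximately."""
    def __init__(self, value):
        self.value = value

def store_precise(x):
    """Model a precise assignment: implicit approximate flow is rejected."""
    if isinstance(x, Approx):
        raise TypeError("approximate value cannot flow to precise storage")
    return x

def endorse(x: Approx):
    """Explicit, programmer-audited cast from approximate to precise."""
    return x.value

a = Approx(41 + 1)            # result of an approximate computation
try:
    store_precise(a)          # implicit flow: rejected
except TypeError:
    pass
print(store_precise(endorse(a)))  # explicit endorsement: allowed -> 42
```

In EnerJ itself this check happens statically via the type system; the runtime check here is only to make the rule concrete.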
Architecture support for disciplined approximate programming
TLDR
An ISA extension that provides approximate operations and storage is described, giving the hardware freedom to save energy at the cost of accuracy; Truffle, a microarchitecture design that efficiently supports these ISA extensions, is also proposed.
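The energy/accuracy trade can be made concrete with a toy error model (the bit-flip model below is an illustration, not the paper's exact fault model): an approximate add may corrupt a few low-order bits, which bounds the numeric error it can introduce.

```python
# Illustrative model of an approximate ALU operation: low-order bits of
# the result may flip, so the error is bounded by 2**unreliable_bits.
import random

def approx_add(a: int, b: int, unreliable_bits: int = 3,
               flip_prob: float = 0.1, rng=None) -> int:
    """Add two ints, possibly flipping bits below `unreliable_bits`."""
    rng = rng or random.Random(0)
    result = a + b
    for bit in range(unreliable_bits):
        if rng.random() < flip_prob:
            result ^= (1 << bit)
    return result

exact = 1000 + 234
approx = approx_add(1000, 234)
# Only bits 0..2 can flip, so |error| < 2**3 = 8.
assert abs(approx - exact) < 2 ** 3
```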
Bulk Disambiguation of Speculative Threads in Multiprocessors
TLDR
Bulk is presented, a novel approach that hash-encodes a thread's access information in a concise signature and adds hardware support for signature operations that efficiently process sets of addresses, simplifying the mechanisms needed to disambiguate speculative threads.
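The signature idea can be sketched as a Bloom-filter-style bit vector (sizes and hash functions below are illustrative): each thread's accessed addresses are hashed into a fixed-size signature, and conflict detection becomes a cheap intersection, at the cost of occasional false positives.

```python
# Illustrative Bulk-style address signatures: hash-encode address sets
# into fixed-size bit vectors and detect conflicts by intersection.
SIG_BITS = 256

def sig_insert(sig: int, addr: int) -> int:
    """Set two hashed bit positions for the address (Bloom-filter style)."""
    h1 = hash(("a", addr)) % SIG_BITS
    h2 = hash(("b", addr)) % SIG_BITS
    return sig | (1 << h1) | (1 << h2)

def may_conflict(sig1: int, sig2: int) -> bool:
    """Nonzero intersection means a *possible* address overlap
    (true sharing is always caught; aliasing may cause false positives)."""
    return (sig1 & sig2) != 0

writes_t0 = 0
for addr in (0x1000, 0x1040):          # thread 0's write set
    writes_t0 = sig_insert(writes_t0, addr)

reads_t1 = sig_insert(0, 0x1000)       # thread 1 reads a shared address
assert may_conflict(writes_t0, reads_t1)   # true sharing is detected
```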
Learning to Optimize Tensor Programs
TLDR
A learning-based framework to optimize tensor programs for deep learning workloads that learns domain-specific statistical cost models to guide the search for tensor operator implementations over billions of possible program variants and accelerates the search by effective model transfer across workloads.
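The search loop can be sketched as follows (the candidate space, the "hardware measurement", and the toy nearest-neighbor cost model are stand-ins, not the paper's learned model): instead of timing every variant on hardware, a cheap statistical model ranks candidates and only the most promising ones are actually measured.

```python
# Illustrative cost-model-guided search over tensor program variants.
import random

def measure_on_hardware(tile: int) -> float:
    """Stand-in for a real benchmark run; pretend tile=16 is optimal."""
    return abs(tile - 16) + 1.0           # pseudo-latency, lower is better

def train_cost_model(history):
    """Toy 'model': predict a tile's cost from its nearest measured tile."""
    def predict(tile):
        nearest = min(history, key=lambda rec: abs(rec[0] - tile))
        return nearest[1] + abs(nearest[0] - tile) * 0.5
    return predict

candidates = [2 ** i for i in range(9)]   # tile sizes 1..256
history = [(t, measure_on_hardware(t))
           for t in random.Random(0).sample(candidates, 3)]

for _ in range(4):                        # search iterations
    model = train_cost_model(history)
    measured = {t for t, _ in history}
    best_guess = min((c for c in candidates if c not in measured),
                     key=model, default=None)
    if best_guess is None:
        break
    history.append((best_guess, measure_on_hardware(best_guess)))

best_tile = min(history, key=lambda rec: rec[1])[0]
print(best_tile)
```

Note that only 7 of the 9 variants are ever measured; the model prunes the rest, which is the point of the technique when the space has billions of variants.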
CoreDet: a compiler and runtime system for deterministic multithreaded execution
TLDR
This work develops a compiler and runtime system that runs arbitrary multithreaded C/C++ POSIX Threads programs deterministically, resorting to serialization only rarely to handle interthread communication and synchronization.
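The core idea of deterministic execution can be sketched with cooperative generators standing in for compiler-instrumented threads (an illustration, not CoreDet's actual mechanism): each thread runs a fixed quantum of operations and then deterministically hands off, so every run produces the same interleaving of shared-state accesses.

```python
# Illustrative deterministic round-robin scheduling of "threads".
shared = {"log": []}

def worker(name, n):
    for i in range(n):
        shared["log"].append((name, i))   # an access to shared state
        yield                             # quantum boundary (instrumented)

def run_deterministic(threads, quantum=2):
    """Round-robin: each thread executes `quantum` steps per turn."""
    while threads:
        t = threads.pop(0)
        try:
            for _ in range(quantum):
                next(t)
        except StopIteration:
            continue                      # thread finished; drop it
        threads.append(t)

run_deterministic([worker("A", 3), worker("B", 3)])
print(shared["log"])
# [('A', 0), ('A', 1), ('B', 0), ('B', 1), ('A', 2), ('B', 2)]
```

With real preemptive threads the interleaving would vary between runs; the fixed quantum and deterministic hand-off order are what make it reproducible here.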
TVM: End-to-End Optimization Stack for Deep Learning
TLDR
TVM is proposed, an end-to-end optimization stack that exposes graph-level and operator-level optimizations to provide performance portability for deep learning workloads across diverse hardware back-ends; the optimization challenges specific to deep learning that TVM solves are also discussed.
POSH: a TLS compiler that exploits program structure
TLDR
POSH is a new, fully automated TLS compiler built on top of gcc that leverages the code structures created by the programmer, namely subroutines and loops, to generate speculative tasks that are crucial to overall TLS performance.
Random access in large-scale DNA data storage
TLDR
A large library of primers is designed and validated that enables individual recovery of all files stored within the DNA, and an algorithm is developed that greatly reduces the sequencing read coverage required for error-free decoding by maximizing information from all sequence reads.
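The random-access scheme can be sketched as follows (the sequences and the selection step are illustrative, not the paper's actual primer designs): each file's strands carry a unique primer, and PCR amplification with that primer retrieves only that file from the pooled library.

```python
# Illustrative primer-based random access to a pooled DNA library.
library = [
    # (primer, payload fragment) -- strands from several files in one pool
    ("ACGTACGT", "file1-part0"),
    ("ACGTACGT", "file1-part1"),
    ("TTGGCCAA", "file2-part0"),
    ("GATCGATC", "file3-part0"),
]

def pcr_select(pool, primer):
    """Model PCR: amplify (select) only strands carrying this primer."""
    return [payload for p, payload in pool if p == primer]

print(pcr_select(library, "ACGTACGT"))   # ['file1-part0', 'file1-part1']
```

Without such primers, recovering one file would require sequencing the entire pool; primer selection is what makes the access "random" rather than sequential.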
Neural Acceleration for General-Purpose Approximate Programs
TLDR
NPUs leverage an approximate algorithmic transformation that converts regions of code from a von Neumann model to a neural model, showing that significant performance and efficiency gains are possible when the abstraction of full accuracy is relaxed in general-purpose computing.
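The transformation can be sketched end to end (the target region and the least-squares "training" below are toy stand-ins for the paper's neural networks): an approximable region is profiled on sample inputs, a model is fit to its input/output behavior, and calls to the region are redirected to the model.

```python
# Illustrative NPU-style transformation: profile a code region, fit a
# model to its input/output pairs, and substitute the model for the code.
def approximable_region(x: float) -> float:
    """The original (notionally expensive) exact computation."""
    return 2.0 * x + 1.0

# Profile the region to collect training pairs.
samples = [(x, approximable_region(x)) for x in [0.0, 1.0, 2.0, 3.0]]

# Fit y = w*x + b by ordinary least squares (closed form, pure Python).
n = len(samples)
sx = sum(x for x, _ in samples)
sy = sum(y for _, y in samples)
sxx = sum(x * x for x, _ in samples)
sxy = sum(x * y for x, y in samples)
w = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - w * sx) / n

def neural_substitute(x: float) -> float:
    """Stands in for the trained model invoked instead of the region."""
    return w * x + b

print(neural_substitute(10.0))   # 21.0, matching the exact region here
```

On linear data the fit is exact; the paper's point is that for genuinely approximable regions, a small error from the learned substitute buys large performance and energy gains.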
...