Publications
Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds
TLDR
We introduce a new, general geo-distributed ML system, Gaia, that decouples the communication within a data center from the communication between data centers, enabling different communication and consistency models for each.
Focus: Querying Large Video Datasets with Low Latency and Low Cost
TLDR
We build Focus, a system for low-latency, low-cost querying of large video datasets.
Fast Bulk Bitwise AND and OR in DRAM
TLDR
We propose a new, simple mechanism for performing bulk bitwise AND and OR operations in DRAM that is faster and more efficient than existing mechanisms.
Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems
TLDR
We develop two new mechanisms to address the key challenge of enabling near-data processing in GPU systems without programmer intervention.
Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation
TLDR
This paper identifies the key challenges in designing an in-memory pointer-chasing accelerator, IMPICA, describes the new mechanisms IMPICA uses to address them, and evaluates the accelerator's performance and energy benefits.
Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization
TLDR
We propose Flexible-LatencY DRAM (FLY-DRAM), a mechanism that exploits latency variation across DRAM cells within a DRAM chip to improve system performance.
LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory
TLDR
We propose LazyPIM, a new hardware cache coherence mechanism that efficiently batches coherence messages sent by the PIM cores.
Zorua: A Holistic Approach to Resource Virtualization in GPUs
TLDR
This paper introduces Zorua, a new resource virtualization framework that decouples the programmer-specified resource usage of a GPU application from the actual allocation of on-chip hardware resources.
The Non-IID Data Quagmire of Decentralized Machine Learning
TLDR
We present a detailed experimental study of decentralized DNN training under a common type of data skew: a skewed distribution of data labels across devices or locations.