Publications
GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks
TLDR
This work presents GraphPIM, a full-stack solution for graph computing that achieves higher performance using PIM functionality, along with an extension to PIM operations that can bring further performance benefits to more graph applications.
Characterizing the Deployment of Deep Neural Networks on Commercial Edge Devices
TLDR
This paper characterizes several commercial edge devices on popular frameworks using well-known convolutional neural networks (CNNs), a type of DNN, and analyzes how the frameworks, their software stacks, and their implemented optimizations affect final performance.
Musical Chair: Efficient Real-Time Recognition Using Collaborative IoT Devices
TLDR
This work proposes Musical Chair to enable efficient, localized, and dynamic real-time recognition by harvesting the aggregated computational power from the resource-constrained devices in the same IoT network as input sensors.
Demystifying the characteristics of 3D-stacked memories: A case study for Hybrid Memory Cube
TLDR
The thermal behavior of HMC is characterized in a real environment using the AC-510 accelerator, and temperature is identified as a new limitation for this state-of-the-art design space. The work also deconstructs the factors that contribute to latency and reveals their sources for high- and low-load accesses.
Batch-Aware Unified Memory Management in GPUs for Irregular Workloads
TLDR
This work provides the first comprehensive analysis of the major inefficiencies in the page fault handling mechanisms employed by modern GPUs and proposes a GPU runtime software and hardware solution that increases the batch size and reduces the number of batches, thereby amortizing the fault-handling overhead.
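Why fewer, larger batches help can be seen with a simple cost model, sketched below under loudly stated assumptions: each batch pays a fixed overhead and each fault pays a fixed service cost. The function name `fault_handling_time` and all numbers are hypothetical and illustrative only; they are not taken from the paper's measurements or its runtime design.

```python
import math


def fault_handling_time(num_faults: int, batch_size: int,
                        per_batch_overhead_us: float,
                        per_fault_cost_us: float) -> float:
    """Hypothetical cost model: every batch pays a fixed overhead,
    and every fault pays its own service cost on top of that."""
    num_batches = math.ceil(num_faults / batch_size)
    return num_batches * per_batch_overhead_us + num_faults * per_fault_cost_us


# Illustrative (made-up) parameters: 1024 faults, 50 us per batch, 2 us per fault.
small_batches = fault_handling_time(1024, 32, 50.0, 2.0)   # 32 batches
large_batches = fault_handling_time(1024, 256, 50.0, 2.0)  # 4 batches
print(small_batches, large_batches)
```

The per-fault term is the same in both cases; only the fixed per-batch term shrinks as the batch count drops, which is the amortization effect the summary describes.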
CAIRO
TLDR
This article analyzes the advantages of instruction-level PIM offloading in the context of HMC-atomic instructions for graph-computing applications and proposes CAIRO, a compiler-assisted technique and decision model for enabling instruction-level PIM offloading without any burden on programmers.
ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator
TLDR
This work proposes a lightweight reconfigurable sparse-computation accelerator (Alrescha) that achieves average speedups of 15.6x for scientific sparse problems and 8x for graph algorithms compared to a GPU, while consuming 14x less energy.
FAFNIR: Accelerating Sparse Gathering by Using Efficient Near-Memory Intelligent Reduction
TLDR
This work proposes an effective solution for sparse gathering: an efficient near-memory intelligent reduction (Fafnir) tree whose leaves are all the ranks in a memory system and whose nodes gradually apply reduction operations while data is gathered from any rank. This design minimizes data movement by performing entire operations at NDP and fully benefits from parallel memory accesses in parallel processing at NDP.
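The reduction-tree idea (leaves hold per-rank data, internal nodes combine partial results on the way up) can be illustrated in software. The sketch below is a generic pairwise tree reduction over sparse partial sums, assuming each rank contributes a dictionary from output index to partial value; the names `merge`, `tree_reduce`, and the sample data are hypothetical, and this is not Fafnir's hardware design.

```python
from typing import Dict, List

# A sparse partial result: maps an output index to a partial value.
Sparse = Dict[int, float]


def merge(a: Sparse, b: Sparse) -> Sparse:
    """Reduce two sparse partial results by summing values at matching indices."""
    out = dict(a)
    for idx, val in b.items():
        out[idx] = out.get(idx, 0.0) + val
    return out


def tree_reduce(leaves: List[Sparse]) -> Sparse:
    """Pairwise reduction: each level roughly halves the number of nodes,
    so N leaves are combined in about log2(N) levels."""
    if not leaves:
        return {}
    level = list(leaves)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(merge(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # an odd node passes through unchanged
        level = nxt
    return level[0]


# Hypothetical per-rank partial results (leaves of the tree).
ranks = [{0: 1.0, 3: 2.0}, {3: 1.5}, {1: 4.0}, {0: -1.0, 1: 1.0}]
result = tree_reduce(ranks)
print(sorted(result.items()))
```

Because each node forwards only its already-reduced result, the data moving toward the root shrinks at every level, which is the data-movement saving the summary attributes to reducing near memory.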
Collaborative Execution of Deep Neural Networks on Internet of Things Devices
TLDR
This paper proposes an approach that utilizes the aggregated computing power of the Internet of Things (IoT) devices already present in an environment by creating a collaborative network with a balanced and distributed processing pipeline.
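One common way to make a distributed inference pipeline "balanced" is to split a network's layers into contiguous stages, one per device, so that the slowest stage (which bounds pipeline throughput) is as fast as possible. The sketch below uses a standard linear-partition dynamic program to do this; it is an illustrative stand-in, not the paper's partitioning method. The function `balanced_partition` and the per-layer costs are hypothetical, and it assumes identical devices.

```python
from itertools import accumulate
from typing import List, Tuple


def balanced_partition(costs: List[int], k: int) -> Tuple[int, List[List[int]]]:
    """Split per-layer costs into exactly k contiguous, non-empty stages so that
    the most expensive stage (the pipeline bottleneck) is as cheap as possible."""
    n = len(costs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of layers")
    prefix = [0] + list(accumulate(costs))  # prefix[i] = cost of layers [0, i)
    inf = float("inf")
    # best[j][i]: smallest bottleneck when the first i layers use j stages
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]  # index where stage j starts
    best[0][0] = 0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for p in range(j - 1, i):  # stage j covers layers [p, i)
                cand = max(best[j - 1][p], prefix[i] - prefix[p])
                if cand < best[j][i]:
                    best[j][i], cut[j][i] = cand, p
    # Walk the cut points backwards to recover the stages.
    stages, i = [], n
    for j in range(k, 0, -1):
        p = cut[j][i]
        stages.append(costs[p:i])
        i = p
    stages.reverse()
    return best[k][n], stages


# Hypothetical per-layer costs for six layers, split across three devices.
bottleneck, stages = balanced_partition([4, 2, 7, 1, 3, 5], 3)
print(bottleneck, stages)
```

Keeping stages contiguous means each device only sends one intermediate activation to the next device, which keeps inter-device traffic low; heterogeneous devices would need per-device speeds in the cost term.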
Distributed Perception by Collaborative Robots
TLDR
This work proposes a framework that harvests the aggregated computational power of several low-power robots to enable efficient, dynamic, and real-time recognition, allowing a group of low-power robots to obtain performance similar to that of a high-end embedded platform, the Nvidia Tegra TX2.