The explosion of digital data and the ever-growing need for fast data analysis have made in-memory big-data processing in computer systems increasingly important. In particular, large-scale graph processing is gaining attention due to its broad applicability from social science to machine learning. However, scalable hardware design that can efficiently [...]
Although the latest high-end smartphones have powerful CPUs and GPUs, running deeper convolutional neural networks (CNNs) for complex tasks such as ImageNet classification on mobile devices is challenging. To deploy deep CNNs on mobile devices, we present a simple and effective scheme to compress the entire CNN, which we call one-shot whole network compression. [...]
Processing-in-memory (PIM) is rapidly rising as a viable solution for the memory wall crisis, rebounding from its unsuccessful attempts in the 1990s due to practicality concerns, which are alleviated with recent advances in 3D stacking technologies. However, it is still challenging to integrate PIM architectures with existing systems in a seamless manner [...]
This paper presents a method for hardware-software cosynthesis with run-time incrementally reconfigurable FPGAs. To reduce the run-time overhead of reconfiguring FPGAs, we present a concept called early partial reconfiguration (EPR) which minimizes the overhead by performing reconfiguration for an operation (or a task in our terms) mapped to an FPGA as [...]
The increasing number of integrated components on a single chip has increased the importance of on-chip networks. A significant part of on-chip network routers is the buffer, as it occupies a large area and consumes a significant amount of power. In this work, we propose FlexiBuffer, a microarchitecture in which we minimize buffer leakage power by using [...]
This paper presents a high-level component-based methodology and design environment for application-specific multicore SoC architectures. Component-based design provides primitives to build complex architectures from basic components. This bottom-up approach allows design architects to explore efficient custom solutions with the best performance. This paper [...]
To enable fast and accurate evaluation of HW/SW implementation choices of on-chip communication, we present a method to automatically generate timed OS simulation models. The method generates the OS simulation models with the simulation environment as a virtual processor. Since the generated OS simulation models use final OS code, the presented method can [...]
Hybrid main memory consisting of DRAM and non-volatile memory is attractive since the non-volatile memory can give the advantage of low standby power while DRAM provides high performance and better active power. In this work, we address the power management of such a hybrid main memory consisting of DRAM and phase-change RAM (PRAM). In order to reduce DRAM [...]