Sri Hari Krishna Narayanan

With the increasing scaling of manufacturing technology, process variation is a phenomenon that has become more prevalent. As a result, in the context of Chip Multiprocessors (CMPs) for example, it is possible that identically-designed processor cores on the chip have non-identical peak frequencies and power consumptions. To cope with such a design, each...
Banking has been identified as one of the effective methods for reducing memory energy. We propose a novel approach that improves the energy effectiveness of a banked memory architecture by performing extra computations if doing so makes it unnecessary to reactivate a bank which is in the low-power operating mode. More specifically, when an...
In this paper, we present and evaluate three temperature-sensitive loop parallelization strategies for array-intensive applications executed on chip multiprocessors in order to reduce the peak temperature. Our experimental results show that the peak (average) temperature can be reduced by 20.9°C (4.3°C) when averaged over all the...
Emerging exascale architectures bring forth new challenges related to heterogeneous systems power, energy, cost, and resilience. These new challenges require a shift from conventional paradigms in understanding how to best exploit and optimize these features and limitations. Our objective is to identify the top few dominant characteristics in a set of...
One of the critical problems associated with emerging chip multiprocessors (CMPs) is the management of on-chip shared cache space. Unfortunately, single processor centric data locality optimization schemes may not work well in the CMP case as data accesses from multiple cores can create conflicts in the shared cache space. The main contribution of this...
Understanding and tuning the performance of complex applications on modern hardware are challenging tasks, requiring understanding of the algorithms, implementation, compiler optimizations, and underlying architecture. Many tools exist for measuring and analyzing the runtime performance of applications. Obtaining sufficiently detailed performance data and...
We present a new tool, ADIC2, for automatic differentiation (AD) of C and C++ code through source-to-source transformation. ADIC2 is the successor of the ADIC differentiation tool, which supports forward mode AD of C and a small subset of C++. ADIC2 was completely redesigned and reimplemented as part of the OpenAD software framework, resulting in a robust,...
Many scientific applications benefit from the accurate and efficient computation of derivatives. Automatically generating these derivative computations from an application's source code offers a competitive alternative to other approaches, such as less accurate numerical approximations or labor-intensive analytical implementations. ADIC2 is a source...
As transistor counts keep increasing and clock frequencies rise, high power consumption is becoming one of the most important obstacles preventing further scaling and performance improvements. While high power consumption brings many problems with it, high power density and thermal hotspots are perhaps two of the most important ones. Current architectures...