Learn More
Banking has been identified as one of the effective methods using which memory energy can be reduced. We propose a novel approach that improves the energy effectiveness of a banked memory architecture by performing extra computations if doing so makes it unnecessary to reactivate a bank which is in the low-power operating mode. More specifically, when an(More)
One of the critical problems associated with emerging chip multiprocessors (CMPs) is the management of on-chip shared cache space. Unfortunately, single processor centric data locality optimization schemes may not work well in the CMP case as data accesses from multiple cores can create conflicts in the shared cache space. The main contribution of this(More)
With the increasing scaling of manufacturing technology, process variation is a phenomenon that has become more prevalent. As a result, in the context of Chip Multiprocessors (CMPs) for example, it is possible that identically-designed processor cores on the chip have non-identical peak frequencies and power consumptions. To cope with such a design, each(More)
In this paper, we present and evaluate three temperature-sensitive loop parallelization strategies for array-intensive applications executed on chip multiprocessors in order to reduce the peak temperature. Our experimental results show that the peak (average) temperature can be reduced by 20.9/spl deg/C (4.3/spl deg/C) when averaged over all the(More)
We present a new tool, ADIC2, for automatic differentiation (AD) of C and C++ code through source-to-source transformation. ADIC2 is the successor of the ADIC differentiation tool, which supports forward mode AD of C and a small subset of C++. ADIC2 was completely redesigned and reimplemented as part of the OpenAD software framework, resulting in a robust ,(More)
Many scientific applications benefit from the accurate and efficient computation of derivatives. Automatically generating these derivative computations from an applications source code offers a competitive alternative to other approaches, such as less accurate numerical approximations or labor-intensive analytical implementations. ADIC2 is a source(More)
Understanding and tuning the performance of complex applications on modern hardware are challenging tasks, requiring understanding of the algorithms, implementation, compiler optimizations, and underlying architecture. Many tools exist for measuring and analyzing the runtime performance of applications. Obtaining sufficiently detailed performance data and(More)
Within the multibody systems literature, few attempts have been made to use automatic differentiation for solving forward multibody dynamics and evaluating its computational efficiency. The most relevant implementations are found in the sensitivity analysis field, but they rarely address automatic differentiation issues in depth. This paper presents a(More)
Ever scaling process technology makes embedded systems more vulnerable to soft errors than in the past. One of the generic methods used to fight soft errors is based on duplicating instructions either in the spatial or temporal domain and then comparing the results to see whether they are different. This full duplication based scheme, though effective, is(More)
Emerging exascale architectures bring forth new challenges related to heterogeneous systems power, energy, cost, and resilience. These new challenges require a shift from conventional paradigms in understanding how to best exploit and optimize these features and limitations. Our objective is to identify the top few dominant characteristics in a set of(More)