As transistors keep shrinking and on-chip data caches keep growing, static power dissipation due to cache leakage accounts for an increasing fraction of total processor power. Several techniques have been proposed to reduce leakage power by turning off unused cache lines. However, all of them pay for the savings with some performance degradation. This…
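Below is a minimal sketch of one well-known form of "turning off unused cache lines", usually called cache decay: a line is power-gated after it goes unaccessed for a fixed interval. It is not the technique of the work summarized above, whose own approach is cut off. The direct-mapped organization, the decay interval, and the class name `DecayCache` are all assumptions for illustration. The `decay_misses` counter tracks the performance price the text mentions: misses that would have been hits had the line stayed powered.

```python
class DecayCache:
    """Direct-mapped cache whose lines are turned off after `decay_interval`
    idle cycles (hypothetical model and parameters, for illustration only)."""

    def __init__(self, num_lines, decay_interval):
        self.num_lines = num_lines
        self.decay_interval = decay_interval
        self.tags = [None] * num_lines      # last tag held by each line (None = never filled)
        self.last_use = [0] * num_lines     # cycle of the most recent access
        self.powered = [False] * num_lines  # True while the line is powered (leaking)
        self.hits = 0
        self.misses = 0
        self.decay_misses = 0  # misses caused only by turning the line off

    def tick(self, cycle):
        # Gate off every powered line that has been idle for the decay interval.
        for i in range(self.num_lines):
            if self.powered[i] and cycle - self.last_use[i] >= self.decay_interval:
                self.powered[i] = False  # data is lost; the old tag is kept only as
                                         # simulator bookkeeping to detect decay misses

    def access(self, block_addr, cycle):
        self.tick(cycle)
        index = block_addr % self.num_lines
        tag = block_addr // self.num_lines
        if self.powered[index] and self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            if not self.powered[index] and self.tags[index] == tag:
                self.decay_misses += 1  # would have hit without decay
            self.tags[index] = tag
            self.powered[index] = True
        self.last_use[index] = cycle

    def powered_fraction(self):
        # Fraction of lines currently powered, a rough proxy for leakage.
        return sum(self.powered) / self.num_lines


# Usage: one block touched three times; the long idle gap triggers decay.
c = DecayCache(num_lines=4, decay_interval=10)
c.access(0, cycle=0)   # cold miss
c.access(0, cycle=5)   # hit
c.access(0, cycle=20)  # idle for 15 cycles, so the line was gated off: decay miss
```

A shorter decay interval keeps fewer lines powered, which cuts leakage but raises `decay_misses`. That is the trade-off these techniques have to balance.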
OpenCL is an industry effort to unify heterogeneous multicore programming. Its programming model, with SPMD kernels, vector types, and address space qualifiers, lets programmers exploit data parallelism on multicore processors and SIMD instructions, as well as data locality in the memory hierarchy. Recently, OpenCL has gained success…
A new framework for the Recognition, Mining and Synthesis (RMS) system has been proposed to make meaningful use of the enormous amount of available information. Based on the same concept, we propose a face RMS system, which consists of face detection, facial expression recognition, and facial expression exaggeration components, for generating exaggerated views of…
Effective address calculation for load and store instructions must compete with other instructions for the ALUs, so data cache accesses may incur extra latency. Fast address generation is one approach proposed to reduce cache access latency. This paper presents a fast address generator that can eliminate most of the effective address…
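The abstract breaks off before describing its generator, so the following is only a generic sketch of one common idea behind fast address generation, not the paper's design. The cache set index is computed speculatively from the index fields of the base and the offset alone, ignoring any carry out of the block-offset field. This lets the cache access start before the full add finishes, and the guess is checked against the full effective address. The cache geometry (32-byte lines, 128 sets) and every function name here are hypothetical.

```python
BLOCK_OFFSET_BITS = 5  # assumed 32-byte cache lines
INDEX_BITS = 7         # assumed 128 sets
INDEX_MASK = (1 << INDEX_BITS) - 1


def set_index(addr):
    """Cache set index of a full address."""
    return (addr >> BLOCK_OFFSET_BITS) & INDEX_MASK


def predict_set_index(base, offset):
    """Speculative index: a narrow add of only the index fields of base and
    offset, ignoring any carry out of the block-offset field."""
    return (set_index(base) + set_index(offset)) & INDEX_MASK


def fast_address(base, offset):
    """Return (predicted_index, prediction_correct, effective_address).
    The prediction is wrong exactly when the low block-offset bits carry."""
    ea = base + offset  # full add, assumed to complete in parallel with the access
    predicted = predict_set_index(base, offset)
    return predicted, predicted == set_index(ea), ea


print(fast_address(0x1000, 0x24))  # no carry out of the low 5 bits: prediction holds
print(fast_address(0x101C, 0x08))  # 0x1C + 0x08 carries into the index: mispredict
```

On a misprediction a real pipeline would replay the access with the correct index. How often that happens depends on the offsets a program actually uses, which is why such schemes are evaluated on real workloads.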
Markov random field models provide a robust formulation of low-level vision problems, and among these problems stereo vision remains the most investigated. The belief propagation (BP) method produces accurate results for stereo vision, but the algorithm is still too slow for practical use. This paper describes a case study on the…
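For readers new to BP-based stereo, here is a sketch of the textbook min-sum message update with a truncated linear smoothness cost. It is not the case study described above. The cost model, the parameters `lam` and `trunc`, and the function names are assumptions. The sketch runs on a single scanline, which forms a chain, and on a chain one forward and one backward pass of min-sum BP give the exact minimum-energy disparities.

```python
def compute_message(data_cost, incoming, lam, trunc):
    """Min-sum BP message from pixel p to a neighbor q.

    data_cost[d] : cost of assigning disparity d to p
    incoming     : messages into p from its neighbors other than q
    Smoothness cost (assumed): V(dp, dq) = lam * min(|dp - dq|, trunc)
    Returns m[dq] = min over dp of (data_cost[dp] + sum of incoming[dp] + V(dp, dq)).
    """
    n = len(data_cost)
    h = [data_cost[d] + sum(m[d] for m in incoming) for d in range(n)]
    msg = [min(h[dp] + lam * min(abs(dp - dq), trunc) for dp in range(n))
           for dq in range(n)]
    base = min(msg)
    return [v - base for v in msg]  # normalize so values do not grow unboundedly


def bp_chain(data_costs, lam, trunc):
    """Exact min-sum BP on one scanline (a chain of pixels)."""
    n_pix, n_lab = len(data_costs), len(data_costs[0])
    zero = [0] * n_lab
    fwd = [zero] * n_pix  # fwd[i]: message from pixel i-1 into pixel i
    bwd = [zero] * n_pix  # bwd[i]: message from pixel i+1 into pixel i
    for i in range(1, n_pix):
        fwd[i] = compute_message(data_costs[i - 1], [fwd[i - 1]], lam, trunc)
    for i in range(n_pix - 2, -1, -1):
        bwd[i] = compute_message(data_costs[i + 1], [bwd[i + 1]], lam, trunc)
    labels = []
    for i in range(n_pix):
        belief = [data_costs[i][d] + fwd[i][d] + bwd[i][d] for d in range(n_lab)]
        labels.append(min(range(n_lab), key=belief.__getitem__))
    return labels


# Three pixels, three disparity levels; the middle pixel's data favors disparity 2.
data = [[0, 5, 5], [5, 5, 0], [0, 5, 5]]
print(bp_chain(data, lam=1, trunc=2))  # weak smoothing keeps the jump
print(bp_chain(data, lam=3, trunc=2))  # strong smoothing flattens it
```

In this naive form each message costs O(L^2) for L disparity levels. On a full 2-D image grid this is repeated for every pixel, every neighbor, and every iteration, which is one reason BP is slow in practice.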
The increasing number of complex jobs scheduled to execute on embedded systems has increased the importance of fast response times in job scheduling and task switching on embedded processors. This paper addresses the issue of reducing context-switching overhead. We present a novel register file architecture, the paged register file (pRF), that comprises two…
Embedded systems have moved toward multicore designs in recent years. As the number of processors in embedded multicore systems continues to grow, providing efficient programming models and tailored compiler support becomes a critical issue in developing embedded multicore applications. Although C still dominates embedded computing, C++ is…