Many 2-D data processing applications can be simplified and expressed in terms of 1-D operations. Such an approach, however, requires applying both vertical and horizontal operations to the data blocks. Designers generally prefer a data transposing unit to implementing separate operations for the horizontal and vertical directions. Hence, ...
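As a software illustration of the idea in the abstract above (not the hardware design the paper describes), the following minimal sketch reuses a single 1-D routine for both directions by transposing the block between passes. The helper names `transpose`, `apply_rows`, `apply_2d`, and the example operation `prefix_sum` are hypothetical and chosen only for this example.

```python
def transpose(block):
    """Swap rows and columns of a rectangular list-of-lists block."""
    return [list(col) for col in zip(*block)]


def apply_rows(block, op_1d):
    """Apply a 1-D operation independently to every row."""
    return [op_1d(row) for row in block]


def apply_2d(block, op_1d):
    """Horizontal pass, transpose, reuse the same 1-D pass, transpose back."""
    horizontal = apply_rows(block, op_1d)
    vertical = apply_rows(transpose(horizontal), op_1d)
    return transpose(vertical)


def prefix_sum(row):
    """Example 1-D operation (an assumption for illustration): running sum."""
    total = 0
    out = []
    for value in row:
        total += value
        out.append(total)
    return out


result = apply_2d([[1, 2], [3, 4]], prefix_sum)
print(result)  # [[1, 3], [4, 10]]
```

Only the row-wise routine is ever written; the vertical direction is obtained by transposing, which is the role a transposing unit plays in such a design.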
This paper proposes reusing on-chip networks to test the switches in networks on chip (NoCs). The proposed algorithm broadcasts test vectors for the switches through the on-chip network and detects faults by comparing the output responses of the switches with each other. This algorithm alleviates the need for: (1) external comparison of the output response of the ...
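The comparison step mentioned in the abstract above can be pictured with a small sketch. It assumes that every switch receives the same broadcast test vector and that a fault-free majority exists, so a switch whose response differs from the most common response is flagged. The majority rule and the function name `find_suspect_switches` are assumptions made for this example; the abstract does not state how the paper resolves disagreements.

```python
from collections import Counter


def find_suspect_switches(responses):
    """Return the ids of switches whose response differs from the majority.

    `responses` maps a switch id to the output it produced for one
    broadcast test vector (hypothetical data layout).
    """
    majority_response, _ = Counter(responses.values()).most_common(1)[0]
    return sorted(
        switch for switch, response in responses.items()
        if response != majority_response
    )


observed = {"s0": 0b1010, "s1": 0b1010, "s2": 0b1000, "s3": 0b1010}
print(find_suspect_switches(observed))  # ['s2']
```

Because the responses are compared only against each other, no stored golden response is needed in this sketch.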
Modern memory controllers employ sophisticated address mapping, command scheduling, and power management optimizations to alleviate the adverse effects of DRAM timing and resource constraints on system performance. A promising way of improving the versatility and efficiency of these controllers is to make them programmable, a proven technique that has seen ...
H.264/AVC, as the most recent video coding standard, delivers significantly better performance than previous standards, supporting higher video quality over lower bit rate channels. The H.264 in-loop deblocking filter is one of the several complex techniques that have realized this superior coding quality. The deblocking filter is a computationally and ...
The Boltzmann machine is a massively parallel computational model capable of solving a broad class of combinatorial optimization problems. In recent years, it has been successfully applied to training deep machine learning models on massive datasets. High performance implementations of the Boltzmann machine using GPUs, MPI-based HPC clusters, and FPGAs have ...
Increasing cache sizes in modern microprocessors require long wires to connect cache arrays to processor cores. As a result, the last-level cache (LLC) has become a major contributor to processor energy, necessitating techniques to increase the energy efficiency of data exchange over LLC interconnects. This paper presents an energy-efficient data exchange ...
One of the main reasons behind the superior efficiency of the H.264/AVC video coding standard is the use of an in-loop deblocking filter. Since the deblocking filter is computation- and data-intensive, it significantly slows both the encoding and decoding processes. In this paper, we propose an efficient deblocking filter ...
Near-threshold computing (NTC) is an effective technique for improving the energy efficiency of a CMOS microprocessor, but suffers from a significant performance loss and an increased sensitivity to voltage noise. MOS current-mode logic (MCML), a differential logic family, maintains a low voltage swing and a constant current, making it inherently fast and ...
Variable block size motion estimation (VBSME) is adopted in H.264/AVC to improve coding efficiency. However, supporting various block sizes significantly increases the complexity of both video encoding and decoding. In this paper, a multi-level parallel architecture for H.264/AVC motion estimation is proposed. A SIMD architecture is proposed for ...