Mahdi Nazm Bojnordi

This paper proposes reusing on-chip networks to test switches in networks-on-chip (NoCs). The proposed algorithm broadcasts test vectors for the switches through the on-chip networks and detects faults by comparing the output responses of the switches with each other. This algorithm alleviates the need for: (1) external comparison of the output response of the ...
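The comparison idea in the abstract above can be illustrated with a minimal sketch. It assumes that each switch can be modelled as a function from a test vector to an output response, and that a switch is flagged when its response differs from the majority; the majority rule, the function names, and the stuck-at fault model are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def find_suspect_switches(switches, test_vectors):
    """Return indices of switches whose responses differ from the majority.

    Assumption: every switch receives the same broadcast test vector and
    fault-free switches produce identical responses.
    """
    suspects = set()
    for vector in test_vectors:
        responses = [switch(vector) for switch in switches]
        # The most common response is treated as the reference value.
        majority, _ = Counter(responses).most_common(1)[0]
        suspects.update(i for i, r in enumerate(responses) if r != majority)
    return sorted(suspects)


def healthy_switch(vector):
    # Hypothetical fault-free behaviour: forward the vector unchanged.
    return vector


def stuck_at_zero_switch(vector):
    # Hypothetical fault: output bit 0 is stuck at 0.
    return vector & ~1


switches = [healthy_switch, healthy_switch, stuck_at_zero_switch, healthy_switch]
suspects = find_suspect_switches(switches, [0b0000, 0b0101, 0b1111])
print(suspects)
```

Note that the all-zero vector does not expose the stuck-at-0 fault in this toy model; only vectors that set bit 0 make the faulty switch disagree with its peers.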
Increasing cache sizes in modern microprocessors require long wires to connect cache arrays to processor cores. As a result, the last-level cache (LLC) has become a major contributor to processor energy, necessitating techniques to increase the energy efficiency of data exchange over LLC interconnects. This paper presents an energy-efficient data exchange ...
Modern memory controllers employ sophisticated address mapping, command scheduling, and power management optimizations to alleviate the adverse effects of DRAM timing and resource constraints on system performance. A promising way of improving the versatility and efficiency of these controllers is to make them programmable, a proven technique that has seen ...
Many 2-D data processing applications can be simplified and represented using 1-D operations. Such representations, however, require applying both vertical and horizontal operations to the data blocks. Designers generally prefer to use a data transposing unit rather than applying separate operations for the horizontal and vertical directions. Hence, ...
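As a rough software analogy for the abstract above, the sketch below applies one 1-D operation along rows, transposes the block, applies the same operation again (which now acts on the original columns), and transposes back. The helper names and the choice of a running sum as the 1-D operation are assumptions made for illustration; the paper concerns a hardware transposing unit, not this code.

```python
def transpose(block):
    """Swap rows and columns of a rectangular 2-D block."""
    return [list(column) for column in zip(*block)]


def apply_rows(block, op_1d):
    """Apply a 1-D operation to every row of the block."""
    return [op_1d(row) for row in block]


def apply_2d(block, op_1d):
    """Build a 2-D operation from one 1-D operation plus two transposes."""
    horizontal = apply_rows(block, op_1d)              # horizontal pass
    vertical = apply_rows(transpose(horizontal), op_1d)  # vertical pass via transpose
    return transpose(vertical)                          # restore original orientation


def running_sum(row):
    # Example 1-D operation (an assumption): cumulative sum along the row.
    total, out = 0, []
    for value in row:
        total += value
        out.append(total)
    return out


block = [[1, 2], [3, 4]]
result = apply_2d(block, running_sum)
print(result)  # [[1, 3], [4, 10]]
```

With the running sum as the 1-D operation, the result is the 2-D prefix sum of the block, so only a single row-wise routine is needed for both directions.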
The Boltzmann machine is a massively parallel computational model capable of solving a broad class of combinatorial optimization problems. In recent years, it has been successfully applied to training deep machine learning models on massive datasets. High performance implementations of the Boltzmann machine using GPUs, MPI-based HPC clusters, and FPGAs have ...
H.264/AVC, as the most recent video coding standard, delivers significantly better performance than previous standards, supporting higher video quality over lower bit rate channels. The H.264 in-loop deblocking filter is one of several complex techniques that have realized this superior coding quality. The deblocking filter is a computationally and ...
One of the main reasons behind the superior efficiency of the H.264/AVC video coding standard is the use of an in-loop deblocking filter. Since the deblocking filter is computation- and data-intensive, it significantly slows both the encoding and decoding processes. In this paper, we propose an efficient deblocking filter ...
This paper explores the use of MOS current-mode logic (MCML) as a fast and low noise alternative to static CMOS circuits in microprocessors, thereby improving the performance, energy efficiency, and signal integrity of future computer systems. The power and ground noise generated by an MCML circuit is typically 10–100× smaller than the noise generated by a ...
Near-threshold computing (NTC) is an effective technique for improving the energy efficiency of a CMOS microprocessor, but suffers from a significant performance loss and an increased sensitivity to voltage noise. MOS current-mode logic (MCML), a differential logic family, maintains a low voltage swing and a constant current, making it inherently fast and ...
Variable block size motion estimation (VBSME) is adopted in H.264/AVC to improve the coding efficiency. However, supporting various block sizes significantly increases the complexity of both video encoding and decoding. In this paper, a multi-level parallel architecture for H.264/AVC motion estimation is proposed. A SIMD architecture is proposed for absolute ...