Mahdi Nazm Bojnordi

This paper proposes reusing on-chip networks to test switches in Networks-on-Chip (NoCs). The proposed algorithm broadcasts the switches' test vectors through the on-chip network and detects faults by comparing the switches' output responses with each other. This algorithm alleviates the need for: (1) external comparison of the output response of the(More)
Modern memory controllers employ sophisticated address mapping, command scheduling, and power management optimizations to alleviate the adverse effects of DRAM timing and resource constraints on system performance. A promising way of improving the versatility and efficiency of these controllers is to make them programmable, a proven technique that has seen(More)
Many 2-D data processing applications can be simplified and expressed using 1-D operations. Such tools, however, require applying both vertical and horizontal operations to the data blocks. Designers prefer to use a data transposing unit rather than applying individual operations for the horizontal and vertical directions. Hence,(More)
H.264/AVC, the most recent video coding standard, delivers significantly better performance than previous standards, supporting higher video quality over lower bit rate channels. The H.264 in-loop deblocking filter is one of the several complex techniques that have realized this superior coding quality. The deblocking filter is a computationally and(More)
Increasing cache sizes in modern microprocessors require long wires to connect cache arrays to processor cores. As a result, the last-level cache (LLC) has become a major contributor to processor energy, necessitating techniques to increase the energy efficiency of data exchange over LLC interconnects. This paper presents an energy-efficient data exchange(More)
One of the main reasons behind the superior efficiency of the H.264/AVC video coding standard is the use of an in-loop deblocking filter. Since the deblocking filter is computation- and data-intensive, it significantly slows both the encoding and decoding processes. In this paper, we propose an efficient deblocking filter(More)
The Boltzmann machine is a massively parallel computational model capable of solving a broad class of combinatorial optimization problems. In recent years, it has been successfully applied to training deep machine learning models on massive datasets. High performance implementations of the Boltzmann machine using GPUs, MPI-based HPC clusters, and FPGAs(More)
Variable block size motion estimation (VBSME) is adopted in H.264/AVC to improve the coding efficiency. However, supporting various block sizes significantly increases the complexity of both video encoding and decoding. In this paper, a multi-level parallel architecture for H.264/AVC motion estimation is proposed. A SIMD architecture is proposed for(More)