This work explores the data reuse properties of full-search block-matching (FSBM) for motion estimation (ME) and associated architecture designs, as well as memory bandwidth requirements. Memory bandwidth in high-quality video is a major bottleneck to designing an implementable architecture because of large frame size and search range. First, memory …
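To make the bandwidth concern concrete, here is a minimal software sketch of full-search block matching. It is not the architecture from the paper: the function names, the list-of-lists frame layout, and the use of the sum of absolute differences (SAD) as the matching cost are all assumptions made for illustration. For an n x n block and a search range of ±p, the search visits (2p+1)^2 candidates and reads n^2 reference pixels for each one, and neighbouring candidates overlap heavily, which is where data reuse comes from.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Exhaustively test every displacement in [-p, p] x [-p, p].

    Returns (dx, dy, cost) for the lowest-cost candidate, or None if no
    candidate lies fully inside the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidates that fall outside the reference frame.
            inside_x = 0 <= bx + dx and bx + dx + n <= width
            inside_y = 0 <= by + dy and by + dy + n <= height
            if not (inside_x and inside_y):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


if __name__ == "__main__":
    # Hypothetical 8 x 8 test frames: `cur` is `ref` shifted by (2, 1).
    ref = [[8 * y + x for x in range(8)] for y in range(8)]
    cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
    print(full_search(cur, ref, bx=2, by=2, n=2, p=2))
```

Every candidate re-reads most of the pixels its neighbour just used, so a hardware design that buffers the overlapping region on chip can cut external memory traffic substantially.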
This paper presents a design methodology for a high-speed Booth-encoded parallel multiplier. For partial product generation, we propose a new modified Booth encoding (MBE) scheme to improve the performance of traditional MBE schemes. For final addition, a new algorithm is developed to construct a multiple-level conditional-sum adder (MLCSMA). The proposed …
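For readers unfamiliar with modified Booth encoding, the sketch below shows the conventional radix-4 recoding that such schemes start from, not the new scheme the paper proposes. The function names and the `bits` parameter are illustrative assumptions. Each overlapping triplet of multiplier bits becomes one digit in {-2, -1, 0, 1, 2}, so a `bits`-wide multiplier yields only `bits / 2` partial products.

```python
def booth_radix4_digits(y, bits):
    """Recode a signed `bits`-bit integer into radix-4 Booth digits.

    Digits are returned least significant first, each in {-2, -1, 0, 1, 2}.
    `bits` must be even and `y` must fit in two's complement.
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even")
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("y does not fit in the given width")
    pattern = y & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x, y, bits=8):
    """Multiply x by y by summing one partial product per Booth digit."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, bits)))


if __name__ == "__main__":
    print(booth_radix4_digits(7, 4))
    print(booth_multiply(-13, 57))
```

In hardware, each digit selects 0, ±x, or ±2x (a shift and an optional negation), which is why halving the number of partial products is attractive.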
In this paper, efficient memory-based VLSI arrays and an accompanying new design approach for the discrete Fourier transform (DFT) and discrete cosine transform (DCT) are presented. The DFT and DCT are formulated as cyclic convolution forms and mapped into linear arrays characterized by a small number of I/O channels and low I/O bandwidth. Since …
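One well-known way to express a prime-length DFT as a cyclic convolution is Rader's index permutation, sketched below. Whether the paper uses this exact formulation is not stated in the excerpt, so treat it as a generic illustration; all function names are hypothetical, and the input length is assumed to be an odd prime.

```python
import cmath


def primitive_root(p):
    """Return the smallest primitive root modulo an odd prime p."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; p must be an odd prime")


def cyclic_convolution(a, b):
    """Direct length-m cyclic convolution of two sequences."""
    m = len(a)
    return [sum(a[j] * b[(i - j) % m] for j in range(m)) for i in range(m)]


def dft_direct(x):
    """Reference O(N^2) DFT used to check the result."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dft_rader(x):
    """Prime-length DFT computed through one (p-1)-point cyclic convolution."""
    p = len(x)
    g = primitive_root(p)
    g_inv = pow(g, p - 2, p)  # modular inverse of g by Fermat's little theorem
    m = p - 1
    a = [x[pow(g, q, p)] for q in range(m)]  # permuted inputs
    b = [cmath.exp(-2j * cmath.pi * pow(g_inv, q, p) / p) for q in range(m)]  # permuted kernel
    conv = cyclic_convolution(a, b)
    out = [0j] * p
    out[0] = sum(x)  # the DC term is handled separately
    for q in range(m):
        out[pow(g_inv, q, p)] = x[0] + conv[q]
    return out


if __name__ == "__main__":
    data = [1, 2, 0, -1, 3, 5, -2]
    err = max(abs(u - v) for u, v in zip(dft_rader(data), dft_direct(data)))
    print(err < 1e-9)
```

Once the transform is in cyclic-convolution form, the kernel sequence is fixed, which is what makes memory-based (table lookup) and regular linear-array realizations possible.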
This paper presents a memory-efficient approach to realize the cyclic convolution and its application to the discrete cosine transform (DCT). We adopt distributed arithmetic (DA) computation, exploit the symmetry property of DCT coefficients to merge the elements in the matrix of the DCT kernel, separate the kernel into two perfect cyclic forms, …
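Distributed arithmetic replaces the multipliers in a fixed-coefficient inner product with a lookup table addressed by one bit from each input. The sketch below shows the basic bit-serial form only; it does not reproduce the paper's kernel merging or cyclic partitioning. The function names are hypothetical, and integer coefficients and two's-complement inputs are assumed so the result is exact.

```python
def build_da_table(coeffs):
    """Precompute the sum of the selected coefficients for every bit pattern."""
    k = len(coeffs)
    return [sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
            for addr in range(1 << k)]


def da_inner_product(coeffs, xs, bits):
    """Compute sum(c * x) with table lookups, shifts, and adds only.

    Each x must fit in `bits`-bit two's complement.
    """
    table = build_da_table(coeffs)
    mask = (1 << bits) - 1
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into one table address.
        addr = 0
        for j, x in enumerate(xs):
            addr |= (((x & mask) >> b) & 1) << j
        term = table[addr] * (1 << b)
        # The sign bit carries negative weight in two's complement.
        if b == bits - 1:
            acc -= term
        else:
            acc += term
    return acc


if __name__ == "__main__":
    coeffs = [3, -5, 7, 2]
    xs = [-8, 7, -1, 4]
    print(da_inner_product(coeffs, xs, bits=4), sum(c * x for c, x in zip(coeffs, xs)))
```

The table grows as 2^k with the number of inputs k, which is why reducing or partitioning the kernel matters for memory efficiency.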
A new approach to derive a systolic algorithm for the prime-length discrete cosine transform (DCT) is proposed. It makes use of input/output (I/O) data permutations and the symmetry property of cosine kernels such that the proposed array possesses outstanding …
This paper presents a novel split-radix fast Fourier transform (SRFFT) pipeline architecture design. A mapping methodology has been developed to obtain a regular and modular pipeline for the split-radix algorithm. The pipeline is repartitioned to balance the latency between complex multiplication and butterfly operation by using carry-save addition. The number …
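As background, the textbook recursive form of the split-radix algorithm is sketched below. It illustrates only the decomposition (one half-length and two quarter-length sub-transforms), not the pipeline mapping or the carry-save repartitioning described in the abstract. The function names are hypothetical, and the input length is assumed to be a power of two.

```python
import cmath


def split_radix_fft(x):
    """Recursive decimation-in-time split-radix FFT for power-of-two lengths."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    even = split_radix_fft(x[0::2])  # length n/2
    odd1 = split_radix_fft(x[1::4])  # length n/4, samples 4m + 1
    odd3 = split_radix_fft(x[3::4])  # length n/4, samples 4m + 3
    out = [0j] * n
    quarter = n // 4
    for k in range(quarter):
        t1 = cmath.exp(-2j * cmath.pi * k / n) * odd1[k]
        t3 = cmath.exp(-2j * cmath.pi * 3 * k / n) * odd3[k]
        total = t1 + t3
        diff = 1j * (t1 - t3)
        out[k] = even[k] + total
        out[k + 2 * quarter] = even[k] - total
        out[k + quarter] = even[k + quarter] - diff
        out[k + 3 * quarter] = even[k + quarter] + diff
    return out


def dft_direct(x):
    """Reference O(N^2) DFT used to check the result."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


if __name__ == "__main__":
    data = [float((3 * i * i + i) % 7) for i in range(16)]
    err = max(abs(u - v) for u, v in zip(split_radix_fft(data), dft_direct(data)))
    print(err < 1e-9)
```

The L-shaped butterfly mixes stages of different lengths, which is what makes a regular pipeline harder to obtain than for radix-2 or radix-4.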
The ongoing advancements in VLSI technology allow system-on-a-chip (SoC) design to integrate heterogeneous control and computing functions into a single chip. On the other hand, the pressures of area and cost lead to the requirement for a single, shared off-chip DRAM memory subsystem. To satisfy different memory access requirements for latency and …