In this paper, we present area- and power-efficient architectures for the implementation of the integer discrete cosine transform (DCT) of different lengths to be used in High Efficiency Video Coding (HEVC). We show that an efficient constant matrix-multiplication scheme can be used to derive parallel architectures for 1-D integer DCT of different lengths. We ...
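As a rough illustration of what "constant matrix multiplication" means here, the sketch below applies the 4-point HEVC core transform matrix to one input vector in two ways: as a plain matrix-vector product, and as an even/odd butterfly whose constant multiplications are replaced by shifts and adds. It is not the architecture from the paper; the function names are hypothetical, and the scaling and rounding stages of a real HEVC transform are left out.

```python
# 4-point HEVC core transform matrix (integer approximation of the DCT).
HEVC_DCT4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]

def dct4_direct(x):
    """Reference: plain matrix-vector product, no scaling or rounding."""
    return [sum(c * v for c, v in zip(row, x)) for row in HEVC_DCT4]

def mul83(v):
    # 83 = 64 + 16 + 2 + 1, so three adders replace one multiplier.
    return (v << 6) + (v << 4) + (v << 1) + v

def mul36(v):
    # 36 = 32 + 4
    return (v << 5) + (v << 2)

def dct4_shift_add(x):
    """Same result using an even/odd butterfly and shift-add constants."""
    e0, e1 = x[0] + x[3], x[1] + x[2]   # even part
    o0, o1 = x[0] - x[3], x[1] - x[2]   # odd part
    return [
        (e0 + e1) << 6,                 # 64 * (e0 + e1)
        mul83(o0) + mul36(o1),
        (e0 - e1) << 6,                 # 64 * (e0 - e1)
        mul36(o0) - mul83(o1),
    ]
```

For the input `[1, 2, 3, 4]`, both functions return `[640, -285, 0, -25]`.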
In this paper, we present a modular and pipelined architecture for lifting-based multilevel 2-D DWT that uses neither a line buffer nor a frame buffer. The overall area-delay product is reduced in the proposed design by appropriate partitioning and scheduling of the computation of the individual decomposition levels. The processing for different levels is performed by a ...
In this paper, we propose a design strategy for the derivation of memory-efficient architectures for multi-level 2-D DWT. Using the proposed design scheme, we have derived a convolution-based generic architecture for the computation of 3-level 2-D DWT based on Daubechies as well as bi-orthogonal filters. The proposed structure does not involve ...
This paper presents a hardware-efficient systolic-like modular architecture for the two-dimensional (2-D) discrete wavelet transform (DWT). The overall computation is decomposed into two distinct stages, where column processing is performed in stage-1 and row processing is performed in stage-2. Using a new data-access scheme and a novel folding technique, ...
In this paper, we present a throughput-scalable parallel and pipelined architecture for high-throughput computation of the multilevel three-dimensional discrete wavelet transform (3-D DWT). The computation of 3-D DWT for each level of decomposition is split into three distinct stages, and all three stages are implemented in parallel by a processing unit ...
We have analyzed memory footprint and combinational complexity to arrive at a systematic design strategy to derive area-delay-power-efficient architectures for two-dimensional (2-D) finite impulse response (FIR) filters. We have presented novel block-based structures for separable and non-separable filters with less memory footprint by memory sharing and ...
In this paper, we present an efficient distributed-arithmetic (DA) formulation for the implementation of the block least mean square (BLMS) algorithm. The proposed DA-based design uses a novel look-up table (LUT)-sharing technique for the computation of the filter outputs and weight-increment terms of the BLMS algorithm. Besides, it offers a significant saving of adders ...
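For readers unfamiliar with distributed arithmetic, the following minimal sketch shows the basic idea for a single inner product: a LUT holds every partial sum of the weights, and the input samples are consumed one bit plane at a time. It illustrates plain DA only, not the LUT-sharing technique or the BLMS weight update of the paper; the function names, the 8-bit two's-complement input format, and the integer weights are assumptions made for the example.

```python
def build_lut(weights):
    """LUT entry `addr` holds the sum of the weights whose bit is set in addr."""
    n = len(weights)
    return [
        sum(w for k, w in enumerate(weights) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]

def da_inner_product(weights, samples, bits=8):
    """Inner product via bit-serial LUT reads (samples are `bits`-bit two's complement)."""
    lut = build_lut(weights)
    acc = 0
    for b in range(bits):
        # Form the LUT address from bit b of every sample.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        term = lut[addr] << b
        # The sign bit of a two's-complement number carries negative weight.
        acc += -term if b == bits - 1 else term
    return acc
```

With weights `[3, -1, 4, 2]` and samples `[5, -7, 100, -128]`, `da_inner_product` returns `166`, the same as the direct sum of products. The LUT grows as 2^N in the number of weights, which is why practical DA designs partition or share tables.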
We have suggested a new data-access scheme for the computation of the lifting 2-D discrete wavelet transform (DWT) without using data transposition. We have derived a linear systolic array directly from the dependence graph (DG) and a two-dimensional (2-D) systolic array from a suitably segmented DG for parallel and pipelined implementation of 1-D DWT. These ...
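The 1-D lifting computation that such arrays implement can be sketched in software as a predict step followed by an update step. The example below assumes the integer 5/3 (Le Gall) wavelet with symmetric boundary extension and an even-length input; the abstract does not name the filter, so that choice and the function names are assumptions for illustration only.

```python
def lift53_forward(x):
    """One level of 1-D integer 5/3 lifting; returns (low-pass s, high-pass d)."""
    assert len(x) >= 2 and len(x) % 2 == 0
    even, odd = x[0::2], x[1::2]
    half = len(even)
    # Predict: each odd sample minus the floor-average of its even neighbours.
    d = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]  # symmetric extension
        d.append(odd[i] - ((even[i] + right) >> 1))
    # Update: each even sample plus a rounded quarter of the neighbouring details.
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]                # symmetric extension
        s.append(even[i] + ((left + d[i] + 2) >> 2))
    return s, d

def lift53_inverse(s, d):
    """Undo the update step, then the predict step, and re-interleave."""
    half = len(s)
    even = [
        s[i] - (((d[i - 1] if i > 0 else d[0]) + d[i] + 2) >> 2)
        for i in range(half)
    ]
    odd = [
        d[i] + ((even[i] + (even[i + 1] if i + 1 < half else even[i])) >> 1)
        for i in range(half)
    ]
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out
```

Because the inverse repeats the same integer operations in reverse order, reconstruction is exact. A separable 2-D transform applies this 1-D step along one dimension and then the other.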
In this brief, the logic operations involved in the conventional carry select adder (CSLA) and the binary to excess-1 converter (BEC)-based CSLA are analyzed to study the data dependence and to identify redundant logic operations. We have eliminated all the redundant logic operations present in the conventional CSLA and proposed a new logic formulation for CSLA. ...
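As background, a bit-level model of the conventional CSLA that the brief takes as its starting point might look like the sketch below. Each block computes its sum twice, once per possible carry-in, and the incoming carry selects one result. This models only the conventional scheme, not the proposed formulation; the function names, the 16-bit width, and the 4-bit block size are assumptions for the example.

```python
def ripple_add(a_bits, b_bits, cin):
    """Ripple-carry add of two LSB-first bit lists; returns (sum_bits, carry_out)."""
    s, c = [], cin
    for a, b in zip(a_bits, b_bits):
        s.append(a ^ b ^ c)
        c = (a & b) | (c & (a ^ b))
    return s, c

def csla_add(a, b, width=16, block=4):
    """Conventional carry select adder model; returns (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    out, carry = [], 0
    for start in range(0, width, block):
        sa = a_bits[start:start + block]
        sb = b_bits[start:start + block]
        # Both candidates are computed regardless of the real carry-in;
        # in hardware this duplication is where redundant logic arises.
        s0, c0 = ripple_add(sa, sb, 0)
        s1, c1 = ripple_add(sa, sb, 1)
        # The carry from the previous block acts as the multiplexer select.
        if carry:
            out.extend(s1)
            carry = c1
        else:
            out.extend(s0)
            carry = c0
    total = sum(bit << i for i, bit in enumerate(out))
    return total, carry
```

For example, `csla_add(1234, 4321)` returns `(5555, 0)` and `csla_add(0xFFFF, 1)` returns `(0, 1)`.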