Chein-Wei Jen

Learn More
ÐThis paper presents a design methodology for high-speed Booth encoded parallel multiplier. For partial product generation, we propose a new modified Booth encoding (MBE) scheme to improve the performance of traditional MBE schemes. For final addition, a new algorithm is developed to construct multiple-level conditional-sum adder (MLCSMA). The proposed(More)
This work explores the data reuse properties of fullsearch block-matching (FSBM) for motion estimation (ME) and associated architecture designs, as well as memory bandwidth requirements. Memory bandwidth in high-quality video is a major bottleneck to designing an implementable architecture because of large frame size and search range. First, memory(More)
This paper presents a novel split-radix fast Fourier transform (SRFFT) pipeline architecture design. A mapping methodology has been developed to obtain regular and modular pipeline for split-radix algorithm. The pipeline is repartitioned to balance the latency between complex multiplication and butterfly operation by using carry-save addition. The number of(More)
A new approach to derive a systolic algorithm for primelength discrete cosine transform (DCT) is proposed. It makes use of the input/output (UO) data permutations and the symmetry property of cosine kernels such that the proposed array possesses outstanding E ( i A ) * = B,?. (4.2) performance in hardware cost of the processing elements (PE’s), average(More)
The ongoing advancements in VLSI technology allow system-on-a-chip (SoC) design to integrate heterogeneous control and computing functions into a single chip. On the other hand, the pressures of area and cost lead to the requirement for a single, shared off-chip DRAM memory subsystem. To satisfy different memory access requirements for latency and bandwidth(More)
This paper presents a memory-efficient approach to realize the cyclic convolution and its application to the discrete cosine transform (DCT). We adopt the way of distributed arithmetic (DA) computation, exploit the symmetry property of DCT coefficients to merge the elements in the matrix of DCT kernel, separate the kernel to be two perfect cyclic forms, and(More)
In this paper, we propose a new algorithm based on Montgomery’s algorithm to calculate modular multiplication that is the core arithmetic operation in an RSA cryptosystem. The modified algorithm eliminates over-large residue and has very short critical path delay that yields a very high-speed processing. The new architecture based on this modified algorithm(More)