Constantinos E. Goutis

Learn More
In this paper the three main hardware architectures for the two-dimensional discrete wavelet transform (2D-DWT) are reviewed. Also optimization techniques applicable to all three architectures are described. The main contribution of this work is the quantitative comparison among these design alternatives for the 2D-DWT. The comparison is performed in terms(More)
In this paper we study the performance improvements and trade-offs derived from an optimized mapping approach applied on a parametric coarse grained reconfigurable array architecture. The processing elements' local register files and the processing elements' interconnection network is exploited for caching memory data values with data reuse opportunities.(More)
Exploitation of data re-use in combination with the use of custom memory hierarchy that exploits the temporal locality of data accesses may introduce significant power savings, especially for data-intensive applications. The effect of the data-reuse decisions on the power dissipation but also on area and performance of multimedia applications realized on(More)
—Application studies in the areas of image and video processing indicate that between 50 and 80% of the power cost in these systems is due to data storage and transfers. This is especially true for multi-processor re-alizations, because conventional parallelization methods ignore the power cost and focus only on performance. However, also the power(More)
A high-performance data-path to implement DSP kernels is introduced in this paper. The data-path is realized by a Flexible Computational Component (FCC), which is a pure combinational circuit and it can implement any 2x2 template (cluster) of primitive resources. Thus, the data-path's performance benefits from the intra-component chaining of operations. Due(More)
In this paper, the hardware implementations of five representative stream ciphers are compared in terms of performance and consumed area in an FPGA device. The ciphers used for the comparison are the A5/1, W7, E0, RC4 and Helix. The first three ones have been used for the security part of well-known standards, especially wireless communication protocols.(More)
It is widely known that parallel operation execution in multiprocessor systems generates a respective increase in memory accesses. Since the memory and bus subsystems provide a limited access bandwidth, the applications performance cannot be that high as the multiprocessor system capabilities promise. This is the case for the 2-Dimensional coarse-grained(More)