Giorgos Dimitrakopoulos

Learn More
—Single or multibit subword permutations are useful in many multimedia and cryptographic applications. Several specialized instructions have been proposed to handle the required data rearrangements. In this paper, we examine the hardware implementation of the powerful permutation instruction group (GRP). The design of the proposed permutation unit is based(More)
— The need for efficient implementation of simple crossbar schedulers has increased in the recent years due to the advent of on-chip interconnection networks that require low latency message delivery. The core function of any crossbar scheduler is arbitration that resolves conflicting requests for the same output. Since, the delay of the arbiters directly(More)
—Large systems-on-chip (SoCs) and chip multiprocessors (CMPs), incorporating tens to hundreds of cores, create a significant integration challenge. Interconnecting a huge amount of architectural modules in an efficient manner, calls for scalable solutions that would offer both high throughput and low-latency communication. The switches are the basic(More)
—High-end embedded processors demand complex on-chip cache hierarchies satisfying several contradicting design requirements such as high-performance operation and low energy consumption. This paper introduces light-power (LP) nonuniform cache architecture (NUCA), a tiled-cache addressing both goals. LP-NUCA places a group of small and low-latency tiles(More)
On-chip interconnection networks simplify the integration of complex system-on-chips. The switches are the basic building blocks of such networks and their design critically affects the performance of the whole system. The transfer of data between the inputs and the outputs of the switch is performed by the crossbar, whose active connections are decided by(More)
—In this paper, a new leading-zero counter (or detector) is presented. New boolean relations for the bits of the leading-zero count are derived that allow their computation to be performed using standard carry-lookahead techniques. Using the proposed approach various design choices can be explored and different circuit topologies can be derived for the(More)
—A novel hardware algorithm, a VLSI architecture, and an optimization methodology for residue multipliers are introduced in this paper. The proposed design approach identifies certain properties of the bit products that participate in the residue product computation and subsequently exploits them to reduce the complexity of the implementation. A set of(More)