Giorgos Dimitrakopoulos

The need for efficient implementations of simple crossbar schedulers has increased in recent years with the advent of on-chip interconnection networks that require low-latency message delivery. The core function of any crossbar scheduler is arbitration, which resolves conflicting requests for the same output. Since the delay of the arbiters directly...
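To make the arbitration step concrete, here is a minimal behavioural sketch of one arbiter serving a single output. It assumes a round-robin priority policy, which is one common choice and is not stated in the abstract; the function name `round_robin_arbiter` and its arguments are hypothetical.

```python
def round_robin_arbiter(requests, pointer):
    """Grant one of several conflicting requests for the same output.

    requests: list of booleans, one per input port (assumed interface).
    pointer:  index of the input that currently has highest priority.
    Returns (granted_index, next_pointer); granted_index is None when
    no input is requesting.
    """
    n = len(requests)
    for offset in range(n):
        idx = (pointer + offset) % n
        if requests[idx]:
            # The winner gets lowest priority in the next cycle.
            return idx, (idx + 1) % n
    return None, pointer


# Inputs 0 and 2 compete; priority currently starts at input 1.
grant, next_pointer = round_robin_arbiter([True, False, True, False], 1)
```

A hardware arbiter computes the same result with combinational logic; this loop only models the behaviour, not the delay that the abstract is concerned with.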
Single or multibit subword permutations are useful in many multimedia and cryptographic applications. Several specialized instructions have been proposed to handle the required data rearrangements. In this paper, we examine the hardware implementation of the powerful permutation instruction group (GRP). The design of the proposed permutation unit is based...
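The abstract does not define GRP. As it is commonly described in the permutation-instruction literature, GRP splits the source bits into two groups according to a control word: bits whose control bit is 0 move to the left, bits whose control bit is 1 move to the right, and relative order within each group is preserved. The sketch below is a software reference model under that assumption; the function name `grp` and the default `width` are hypothetical and say nothing about the proposed hardware unit.

```python
def grp(data, control, width=8):
    """Reference model of a GRP-style bit permutation (assumed semantics)."""
    zeros_group = []  # data bits whose control bit is 0
    ones_group = []   # data bits whose control bit is 1
    for pos in range(width - 1, -1, -1):  # scan from MSB to LSB
        bit = (data >> pos) & 1
        if (control >> pos) & 1:
            ones_group.append(bit)
        else:
            zeros_group.append(bit)
    result = 0
    # Control-0 group goes to the left, control-1 group to the right.
    for bit in zeros_group + ones_group:
        result = (result << 1) | bit
    return result


permuted = grp(0b10110010, 0b01010101)
```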
In this work, we propose a new algorithm for designing diminished-1 modulo 2^n + 1 multipliers. The implementation of the proposed algorithm requires n + 3 partial products that are reduced by a tree architecture into two summands, which are finally added by a diminished-1 modulo 2^n + 1 adder. The proposed multipliers, compared to existing...
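For readers unfamiliar with the number system: in diminished-1 representation a nonzero residue A modulo 2^n + 1 is stored as A - 1, so every nonzero value fits in n bits, and zero is handled separately. The sketch below is only a behavioural reference for what such a multiplier computes. It does not reproduce the partial-product algorithm of the paper, and the name `dim1_multiply` and the use of `None` for a zero result are assumptions made for illustration.

```python
def dim1_multiply(a_dim1, b_dim1, n):
    """Multiply two diminished-1 operands modulo 2^n + 1.

    a_dim1, b_dim1: diminished-1 codes (A - 1 and B - 1) of nonzero
    residues A and B. Returns the diminished-1 code of A * B, or None
    when the product is congruent to zero (zero has no n-bit code).
    """
    modulus = (1 << n) + 1
    product = ((a_dim1 + 1) * (b_dim1 + 1)) % modulus
    if product == 0:
        return None
    return product - 1


# n = 4, modulus 17: 3 * 4 = 12, whose diminished-1 code is 11.
code = dim1_multiply(2, 3, 4)
```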
High-end embedded processors demand complex on-chip cache hierarchies that satisfy several conflicting design requirements, such as high-performance operation and low energy consumption. This paper introduces light-power (LP) nonuniform cache architecture (NUCA), a tiled cache that addresses both goals. LP-NUCA places a group of small and low-latency tiles...
In this paper, a new leading-zero counter (or detector) is presented. New Boolean relations for the bits of the leading-zero count are derived that allow their computation to be performed using standard carry-lookahead techniques. Using the proposed approach, various design choices can be explored and different circuit topologies can be derived for the...
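As a point of reference for what a leading-zero counter outputs, the following sketch scans a fixed-width word from the most significant bit. It is a plain sequential model and does not use the carry-lookahead formulation derived in the paper; `leading_zero_count` and its `width` parameter are hypothetical names.

```python
def leading_zero_count(value, width):
    """Count zero bits above the most significant 1 in a width-bit word."""
    count = 0
    for pos in range(width - 1, -1, -1):  # MSB first
        if (value >> pos) & 1:
            break
        count += 1
    return count  # equals width when value is zero


zeros = leading_zero_count(0b00010110, 8)
```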
Large systems-on-chip (SoCs) and chip multiprocessors (CMPs), incorporating tens to hundreds of cores, create a significant integration challenge. Interconnecting a large number of architectural modules efficiently calls for scalable solutions that offer both high-throughput and low-latency communication. The switches are the basic...
On-chip interconnection networks simplify the integration of complex systems-on-chip. The switches are the basic building blocks of such networks, and their design critically affects the performance of the whole system. The transfer of data between the inputs and the outputs of the switch is performed by the crossbar, whose active connections are decided by...