Giorgos Dimitrakopoulos

Learn More
—Parallel-prefix adders offer a highly efficient solution to the binary addition problem and are well-suited for VLSI implementations. In this paper, a novel framework is introduced, which allows the design of parallel-prefix Ling adders. The proposed approach saves one-logic level of implementation compared to the parallel-prefix structures proposed for(More)
—Large systems-on-chip (SoCs) and chip multiprocessors (CMPs), incorporating tens to hundreds of cores, create a significant integration challenge. Interconnecting a huge amount of architectural modules in an efficient manner, calls for scalable solutions that would offer both high throughput and low-latency communication. The switches are the basic(More)
—High-end embedded processors demand complex on-chip cache hierarchies satisfying several contradicting design requirements such as high-performance operation and low energy consumption. This paper introduces light-power (LP) nonuniform cache architecture (NUCA), a tiled-cache addressing both goals. LP-NUCA places a group of small and low-latency tiles(More)
—In this work, we propose a new algorithm for designing diminished-1 modulo 2 n þ 1 multipliers. The implementation of the proposed algorithm requires n þ 3 partial products that are reduced by a tree architecture into two summands, which are finally added by a diminished-1 modulo 2 n þ 1 adder. The proposed multipliers, compared to existing(More)
The need for efficient implementation of simple crossbar schedulers has increased in the recent years due to the advent of on-chip interconnection networks that require low latency message delivery. The core function of any crossbar scheduler is arbitration that resolves conflicting requests for the same output. Since, the delay of the arbiters directly(More)
Bufferless switches can be an attractive and energy-efficient design option for on-chip networks when network utilization is low and low-latency operation matters the most. However, this promising design option is limited by the complexity of the control logic required to operate a bufferless switch that imposes large delays and limits the clock frequency.(More)
Two architectures for parallel-prefix modulo 2 n − 1 adders are presented in this paper. For large wordlengths we introduce the sparse modulo 2 n − 1 adders that achieve significant reduction of the wiring complexity without imposing any delay penalty. Then, the Ling-carry formulation of modulo 2 n − 1 addition is presented. Ling modulo adders save one(More)