Learn More
—Parallel-prefix adders offer a highly efficient solution to the binary addition problem and are well-suited for VLSI implementations. In this paper, a novel framework is introduced, which allows the design of parallel-prefix Ling adders. The proposed approach saves one-logic level of implementation compared to the parallel-prefix structures proposed for(More)
—Large systems-on-chip (SoCs) and chip multiprocessors (CMPs), incorporating tens to hundreds of cores, create a significant integration challenge. Interconnecting a huge amount of architectural modules in an efficient manner, calls for scalable solutions that would offer both high throughput and low-latency communication. The switches are the basic(More)
—High-end embedded processors demand complex on-chip cache hierarchies satisfying several contradicting design requirements such as high-performance operation and low energy consumption. This paper introduces light-power (LP) nonuniform cache architecture (NUCA), a tiled-cache addressing both goals. LP-NUCA places a group of small and low-latency tiles(More)
The efficiency of modern Networks-on-Chip (NoC) is no longer judged solely by their physical scalability, but also by their ability to deliver high performance, Quality-of-Service (QoS), and flow isolation at the minimum possible cost. Although traditional architectures supporting Virtual Channels (VC) offer the resources for flow partitioning and(More)
The need for efficient implementation of simple crossbar schedulers has increased in the recent years due to the advent of on-chip interconnection networks that require low latency message delivery. The core function of any crossbar scheduler is arbitration that resolves conflicting requests for the same output. Since, the delay of the arbiters directly(More)
—In this work, we propose a new algorithm for designing diminished-1 modulo 2 n þ 1 multipliers. The implementation of the proposed algorithm requires n þ 3 partial products that are reduced by a tree architecture into two summands, which are finally added by a diminished-1 modulo 2 n þ 1 adder. The proposed multipliers, compared to existing(More)
Bufferless switches can be an attractive and energy-efficient design option for on-chip networks when network utilization is low and low-latency operation matters the most. However, this promising design option is limited by the complexity of the control logic required to operate a bufferless switch that imposes large delays and limits the clock frequency.(More)