A growing body of work has compiled a strong case for the single-ISA heterogeneous multi-core paradigm. A single-ISA heterogeneous multi-core provides multiple, differently designed superscalar core types that can streamline the execution of diverse programs and program phases. No prior research has addressed the Achilles' heel of this paradigm: design and…
When several applications are co-scheduled to run on a system with multiple shared LLCs, there is an opportunity to improve system performance. This opportunity can be exploited by the hardware, software, or a combination of both hardware and software. The software, i.e., an operating system or hypervisor, can improve system performance by co-scheduling jobs…
The centralized structures necessary for the extraction of instruction-level parallelism (ILP) are consuming progressively smaller portions of the total die area of chip multiprocessors (CMP). The reason for this is that scaling these structures does not enhance general performance as much as scaling the cache and interconnect. However, the fact that these…
Several models of deterministic routing have been proposed for wormhole-routed mesh networks while there is only one model, to the best of our knowledge, proposed for fully adaptive wormhole routing in mesh interconnection networks. The paper proposes a new analytical performance model of fully adaptive wormhole-routed mesh networks with high accuracy.…
The OTIS-hypercube is an interesting class of the optoelectronic OTIS architecture for interconnection networks. In the OTIS architecture, optical connections are used to connect distant processors while closer processors are connected electronically. In this paper, we propose an adaptive routing algorithm for the wormhole switched OTIS-hypercube. We then…
The OTIS-hypercube is an optoelectronic architecture for interconnecting the processing nodes of a multiprocessor system. In this paper, an empirical performance evaluation of the OTIS-hypercube is conducted for different traffic patterns and routing algorithms. It is shown that, depending on the traffic pattern, minimal path routing may not have the best…
This paper presents results showing that workload behavior tends to vary considerably at granularities of less than a thousand instructions. If it were possible to adjust the microarchitecture to suit the workload behavior at such rates, significant single-thread performance enhancement would be achievable. However, previous techniques are too sluggish to…
Although the best processor design for executing a specific workload does depend on the characteristics of the workload, it cannot be determined without factoring in the effect of the interdependencies between different architectural subcomponents. Consequently, workload characteristics alone do not provide an accurate indication of which workloads can…