Learn More
The demand for low-power embedded systems requires designers to tune processor parameters to avoid excessive energy wastage. Tuning on a per-application or per-application-phase basis allows a greater saving in energy consumption without a noticeable degradation in performance. On-chip caches often consume a significant fraction of the total energy budget(More)
Intelligently partitioning the last-level cache within a chip multiprocessor can bring significant performance improvements. Resources are given to the applications that can benefit most from them, restricting each core to a number of logical cache ways. However, although overall performance is increased, existing schemes fail to consider energy saving when(More)
A major problem in determining the composition of plant cell wall polysaccharides has been the lack of a suitable method for accurately determining the amounts of galacturonic and glucuronic acids in such polymers. A gas chromatographic method for aldose analysis has been extended to include uronic acids. Cell wall polysaccharides are depolymerized by acid(More)
The microarchitectural design space of a new processor is too large for an architect to evaluate in its entirety. Even with the use of statistical simulation, evaluation of a single configuration can take excessive time due to the need to run a set of benchmarks with realistic workloads. This paper proposes a novel machine learning model that can quickly(More)
Adaptive micro architectures are a promising solution for designing high-performance, power-efficient microprocessors. They offer the ability to tailor computational resources to the specific requirements of different programs or program phases. They have the potential to adapt the hardware cost-effectively at runtime to any application's needs. However,(More)
Register pressure in modern superscalar processors can be reduced by releasing registers early and by copying their contents to cheap back-up storage. This article quantifies the potential benefits of register occupancy reduction and shows that existing hardware-based schemes typically achieve only a small fraction of this potential. This is because they(More)
In recent years, high performance computing systems have obtained more processing cores and share a last level cache (LLC). However, as their number grows, the core-to-way ratio in the LLC increases, presenting problems to existing cache partitioning techniques which require more ways than cores. Furthermore, effective energy management of the LLC becomes(More)
We describe and evaluate HELIX, a new technique for automatic loop parallelization that assigns successive iterations of a loop to separate threads. We show that the inter-thread communication costs forced by loop-carried data dependences can be mitigated by code optimization, by using an effective heuristic for selecting loops to parallelize, and by using(More)
Building an optimising compiler is a difficult and time consuming task which must be repeated for each generation of a microprocessor. As the underlying microarchitecture changes from one generation to the next, the compiler must be retuned to optimise specifically for that new system. It may take several releases of the compiler to effectively exploit a(More)