Stuart Biles

Learn More
Instruction set customization is an effective way to improve processor performance. Critical portions of applicationdata-flow graphs are collapsed for accelerated execution on specialized hardware. Collapsing dataflow subgraphs will compress the latency along critical paths and reduces the number of intermediate results stored in the register file. While(More)
The design trend of caches in modern processors continues to increase their capacity with higher associativity to cope with large data footprint and take advantage of feature size shrink, which, unfortunately, also leads to higher energy consumption. This paper presents a technique using segmented counting Bloom filters called "Way Guard" to reduce the(More)
Virtual caches are employed as L1 caches of both high performance and embedded processors to meet their short latency requirements. However, they also introduce the synonym problem where the same physical cache line can be present at multiple locations in the cache due to their distinct virtual addresses, leading to potential data consistency issues. To(More)
As applications tend to grow more complex and use more memory, the demand for cache space increases. Thus embedded processors are inclined to use larger caches. Predicting a miss in a long-latency cache becomes crucial in an embedded system-on-chip(SOC) platform to perform microarchitecture-level energy management. Counting Bloom filters are simple and fast(More)
This paper focuses on the instruction fetch resources in a real-time SMT processor to provide an energy-efficient configuration for a soft real-time application running as a high priority thread as fast as possible while still offering decent progress in low priority or non-realtime thread(s). We propose a fetch mechanism, Fetch-around, where a high(More)
In this paper, we propose two low-cost and novel branch history buffer handling schemes aiming at skewing the branch prediction accuracy in favor of a real-time thread for a soft real-time embedded multithreaded processor. The processor core accommodates two running threads, one with the highest priority and the other thread is a background thread, and both(More)
  • 1