Effect of Thread Level Parallelism on the Performance of Optimum Architecture for Embedded Applications
As multi-core designs become dominant, cache structures are growing more complex and larger shared level-2 caches are in demand. Multi-core design is also being applied to mobile processors. To achieve higher cache performance, lower power consumption, and smaller chip area in multi-core mobile processors, the cache configuration should be reorganized and reanalyzed. Mobile Internet Devices (MIDs), which embed mobile processors, are becoming a major platform and are expected to run more general-purpose workloads on new platforms (e.g., netbooks). In this paper, we propose a novel cache mechanism that improves performance without increasing cache size. Most applications (workloads) exhibit spatial locality in their cache behavior, meaning that a small range of cache locations tends to be used within a given period of time. Viewing this notion of locality in reverse, the logically farthest sets should have relatively low correlation in terms of locality, so the probability that two such sets are used in the same basic block is very low. Based on this observation, we investigate the feasibility of sharing the cache blocks of two sets for data fill and replacement within a cache. By sharing the sets, an acceptable performance improvement can be expected without increasing cache size. In our simulations with sampled SPEC CPU2000 workloads, the proposed cache mechanism shows an average reduction in cache miss rate of up to 8.5% (depending on cache size and baseline set associativity) compared to the baseline cache.
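
The set-sharing idea can be illustrated with a small simulation sketch. It is not the mechanism evaluated in the paper, whose details are not given in this abstract. The sketch rests on three assumptions: the "logically farthest" set is taken to be the one half the index space away, a missing block is filled into a free way of its home set first and of the partner set second, and LRU replacement in the home set is used when both are full. The class name `PairedSetCache` and all parameters are hypothetical.

```python
from collections import OrderedDict


class PairedSetCache:
    """Toy set-associative cache in which each set may borrow its partner's ways.

    Assumption: the partner of set i is set (i + num_sets/2) mod num_sets.
    """

    def __init__(self, num_sets, ways, block_size=64, share=True):
        assert num_sets % 2 == 0, "pairing needs an even number of sets"
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        self.share = share
        # One OrderedDict per set, keyed by block number, oldest entry first (LRU).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def partner(self, index):
        # Assumed definition of the "logically farthest" set.
        return (index + self.num_sets // 2) % self.num_sets

    def access(self, address):
        """Simulate one access; return True on a hit, False on a miss."""
        block = address // self.block_size
        home = block % self.num_sets
        candidates = [home, self.partner(home)] if self.share else [home]

        # Lookup: the block may live in its home set or in the partner set.
        for idx in candidates:
            if block in self.sets[idx]:
                self.sets[idx].move_to_end(block)  # mark as most recently used
                self.hits += 1
                return True

        self.misses += 1
        # Fill: use a free way in the home set, then in the partner set.
        for idx in candidates:
            if len(self.sets[idx]) < self.ways:
                self.sets[idx][block] = None
                return False
        # Both full: evict the least recently used block of the home set.
        self.sets[home].popitem(last=False)
        self.sets[home][block] = None
        return False


# Two blocks that conflict in set 0 of a direct-mapped cache with 4 sets.
trace = [0, 256] * 5
baseline = PairedSetCache(num_sets=4, ways=1, share=False)
shared = PairedSetCache(num_sets=4, ways=1, share=True)
for addr in trace:
    baseline.access(addr)
    shared.access(addr)
print(baseline.misses, shared.misses)
```

On this deliberately adversarial trace the baseline thrashes on every access, while the shared variant places the second block in the idle partner set. Real workloads would show far smaller differences, and a hardware design would also pay for the extra partner-set lookup.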