Characterizing and modeling the behavior of context switch misses

@inproceedings{Liu2008CharacterizingAM,
  title={Characterizing and modeling the behavior of context switch misses},
  author={Fang Liu and Fei Guo and Yan Solihin and Seongbeom Kim and Abdulaziz Eker},
  booktitle={2008 International Conference on Parallel Architectures and Compilation Techniques (PACT)},
  year={2008},
  pages={91--101}
}
  • Fang Liu, Fei Guo, Yan Solihin, Seongbeom Kim, Abdulaziz Eker
  • Published 25 October 2008
  • 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT)
One of the essential features in modern computer systems is context switching, which allows multiple threads of execution to time-share a limited number of processors. While very useful, context switching can introduce high performance overheads, with one of the primary reasons being the cache perturbation effect. Between the time a thread is switched out and when it resumes execution, parts of its working set in the cache may be perturbed by other interfering threads, leading to (context…
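The cache perturbation effect described in the abstract can be illustrated with a toy simulation. This is a sketch under simplifying assumptions (a fully associative LRU cache, made-up working-set and interference sizes), not the paper's methodology:

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully associative LRU cache that counts misses."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)        # hit: refresh recency
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[addr] = True

cache = LRUCache(capacity=64)
working_set = [("A", i) for i in range(48)]

for addr in working_set:            # thread A warms the cache
    cache.access(addr)
warm = cache.misses                 # cold misses only

for i in range(32):                 # interfering thread B runs while A is switched out
    cache.access(("B", i))

before = cache.misses
for addr in reversed(working_set):  # A resumes (reverse order avoids LRU eviction cascades)
    cache.access(addr)
print("context switch misses:", cache.misses - before)  # → context switch misses: 16
```

Only the 16 lines that B displaced are re-missed here; with LRU and a working set near capacity, resuming in the original access order would instead cascade into re-missing the entire working set.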
Citations

Analytically modeling the memory hierarchy performance of modern processor systems
An analytical model is developed and validated that reveals the mathematical relationship between cache design parameters, an application's temporal reuse pattern, and the number of context switch misses the application suffers.
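As a loose illustration of the kind of quantity such a model predicts, one can estimate expected context switch misses from a simple survival probability. This sketch assumes each interfering miss evicts a uniformly random cache line (uniform set mapping, random replacement); it is not Liu et al.'s actual formulation, and the parameter values are hypothetical:

```python
def expected_switch_misses(w, sets, assoc, k):
    """Expected context switch misses for a working set of w cached lines
    after k interfering misses, assuming each interfering miss evicts a
    uniformly random one of the sets * assoc lines in the cache
    (an illustrative model, not the paper's)."""
    p_survive = (1.0 - 1.0 / (sets * assoc)) ** k
    return w * (1.0 - p_survive)

# Hypothetical 64-set, 8-way cache; 256 resident lines; 1000 interfering misses.
print(round(expected_switch_misses(w=256, sets=64, assoc=8, k=1000), 1))
```

The estimate grows toward the full working-set size w as k increases, matching the intuition that long interfering intervals wipe out the resident working set.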
Reducing Migration-induced Misses in an over-Subscribed Multiprocessor System
The performance benefit of preserving a portion of the L2 cache (in particular, MRU cache lines) and warming the destination L1 cache by prefetching those lines is demonstrated under different migration scenarios, with an observed 1.5-27% reduction in CPI following a migration.
When Misses Differ: Investigating Impact of Cache Misses on Observed Performance
  • V. Babka, L. Marek, P. Tuma
  • 2009 15th International Conference on Parallel and Distributed Systems, 2009
Examining the connection between cache sharing and observed performance in more depth on a real computer architecture shows that cache misses do not fully account for timing penalties.
Modeling Cache Contention and Throughput of Multiprogrammed Manycore Processors
An analytical model for predicting the impact of contention on cache miss rates and a novel Markov chain throughput model are presented; validation shows that both accurately predict cache contention and throughput trends across various workloads on real hardware.
Cache restoration for highly partitioned virtualized systems
  • D. Daly, H. W. Cain
  • IEEE International Symposium on High-Performance Computer Architecture, 2012
This work introduces cache restoration, a hardware-based prefetching mechanism initiated by the underlying virtualization software when a virtual machine is scheduled on a core, prefetching its working set and warming its initial environment.
Extending data prefetching to cope with context switch misses
  • Hanyu Cui, S. Sair
  • 2009 IEEE International Conference on Computer Design
This work proposes restoring a program's locality by prefetching into the L2 cache the data the program was using before it was swapped out, to reduce the impact of frequent context switches.
Stochastic analysis of cache thrashing
An upper bound on the cache miss overhead from context switches is presented, based on a stochastic analysis of how the cache warms up (fills with useful data); the bound is as close to the actual cache miss ratio as the lower bound.
Reducing Migration-induced Cache Misses
  • Sajjid Reza, G. Byrd
  • 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
The performance benefit of saving and restoring cached data during migration is demonstrated, an implementation that moves cached data when a thread migrates is described, and the benefits in terms of reduced misses and reduced processor cycles are shown.
RECAP: A region-based cure for the common cold (cache)
A Region-Based Cache Restoration Prefetcher (RECAP) groups cache blocks into coarse-grain regions of memory and predicts which regions contain useful blocks to prefetch the next time the current virtual machine executes, providing a robust prefetcher that improves performance by up to 42% for some applications.

References

Showing 1-10 of 24 references
The effect of context switches on cache performance
Address traces of the processes running on a multi-tasking operating system were fed through a cache simulator to compute accurate cache-hit rates over short intervals, estimating the cache performance reduction caused by a context switch.
An analytical model for cache replacement policy performance
This paper is the first to propose an analytical model that predicts the performance of cache replacement policies; the model is based on probability theory and relies solely on the statistical properties of the application, without heuristics or rules of thumb.
Effects of Multithreading on Cache Performance
The Multithreaded Virtual Processor (MVP) model is presented to study these issues; studies with MVP show that performance improvements come not only from tolerating memory latency but also from lower cache miss rates due to exploitation of data locality.
Cache performance of operating system and multiprogramming workloads
A program tracing technique called ATUM (Address Tracing Using Microcode) is developed that captures realistic traces of multitasking workloads, including the operating system; the traces show that both operating system and multiprogramming activity significantly degrade cache performance, with an even greater proportional impact on large caches.
Opportunities for Cache Friendly Process Scheduling
Operating system process scheduling has been an active area of research for many years. Process scheduling decisions can have a dramatic impact on capacity and conflict misses in on-chip caches…
The context-switch overhead inflicted by hardware interrupts (and the enigma of do-nothing loops)
The bizarre effect is observed on IA32, IA64, and Power processors, where one late thread holds up all its peers, causing a slowdown dominated by the per-node latency (numerator) and the job granularity (denominator).
Footprints in the cache
An analytical model for a cache-reload transient is developed, showing that the reload transient is related to the area in the tail of a normal distribution whose mean is a function of the footprints of the programs that compete for the cache.
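The footprint idea lends itself to a back-of-envelope estimate. The sketch below uses a simple uniform-placement overlap argument with made-up footprint sizes; it is a hedged illustration, not the normal-tail model from the paper:

```python
def expected_reload_misses(f_a, f_b, cache_lines):
    """Estimate program A's reload-transient misses after program B runs:
    each of A's f_a cached lines is assumed to collide with B's footprint
    of f_b lines with probability f_b / cache_lines (uniform random
    placement; an illustrative simplification, not the paper's model)."""
    return f_a * min(f_b / cache_lines, 1.0)

print(expected_reload_misses(f_a=200, f_b=300, cache_lines=1024))  # → 58.59375
```

When B's footprint reaches the cache size, the estimate saturates at A's entire footprint, i.e. a fully cold restart.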
Revisiting the Cache Interference Costs of Context Switching
The high cost of context switching is one reason that operating system performance is not keeping pace with hardware improvements. Besides the cost of saving and restoring registers, another…
Analytical cache models with applications to cache partitioning
An accurate, tractable, analytic cache model is presented that estimates the overall cache miss rate of a multiprocessing system with any cache size and time quanta; it is useful both for understanding the effect of context switching on caches and for optimizing cache performance for time-shared systems.
An analytical cache model
An analytical cache model is developed that gives miss rates for a given trace as a function of cache size, degree of associativity, block size, subblock size, multiprogramming level, task switch interval, and observation interval.