Coz: finding code that counts with causal profiling

@inproceedings{Curtsinger2015CozFC,
  title={{Coz}: finding code that counts with causal profiling},
  author={Charlie Curtsinger and Emery D. Berger},
  booktitle={Proceedings of the 25th Symposium on Operating Systems Principles},
  year={2015}
}
Improving performance is a central concern for software developers. To locate optimization opportunities, developers rely on software profilers. However, these profilers only report where programs spent their time: optimizing that code may have no impact on performance. Past profilers thus both waste developer time and make it difficult for them to uncover significant optimization opportunities. This paper introduces causal profiling. Unlike past profiling approaches, causal profiling indicates… 

DProf: distributed profiler with strong guarantees
TLDR
A new timestamp synchronization algorithm, FreeZer, is developed that tightly bounds the inaccuracy of a converted timestamp to a time interval. It is used to implement dCSP and dCOZ, accuracy-bounded distributed versions of the Context Sensitive Profiles and Causal Profiles developed for shared-memory systems.
Iceberg: a tool for static analysis of Java critical sections
TLDR
Iceberg is a static analysis tool that helps programmers find potential performance bugs in concurrent Java programs and can find critical sections whose behavior is unusual compared to the other critical sections.
APT-GET: profile-guided timely software prefetching
TLDR
This work designs APT-GET, a novel profile-guided technique that ensures prefetch timeliness by leveraging dynamic execution time information and introduces a novel analytical model to find the optimal prefetch-distance and prefetch injection site based on the collected profile to enable timely prefetches.
Distributed Latency Profiling through Critical Path Tracing
TLDR
Scalable and accurate fine-grain tracing has made Critical Path Tracing the standard approach for distributed latency analysis for many Google applications, including Google Search.
PerFlow
Unicorn
SCOZ: A system‐wide causal profiler for multicore systems
TLDR
SCOZ is a system‐wide causal profiler that addresses limitations of COZ and changes the target of virtual speedup from threads to CPU cores, thereby expanding profiling coverage to diverse applications as well as OS kernel code.
Swarmbug: debugging configuration bugs in swarm robotics
TLDR
Swarmbug is a swarm debugging system that automatically diagnoses buggy behaviors caused by misconfiguration and automatically generates, validates, and ranks fixes for configuration bugs. It is evaluated on four diverse swarm algorithms.
TIP: Time-Proportional Instruction Profiling
TLDR
Time-Proportional Instruction Profiling (TIP) is proposed, which combines Oracle’s time attribution policies with statistical sampling to enable practical implementation. Implemented within the Berkeley Out-of-Order Machine, TIP is found to be highly accurate.
adPerf: Characterizing the Performance of Third-party Ads
TLDR
This work presents an in-depth, first-of-a-kind performance evaluation of web ads and develops an infrastructure, adPerf, for the Chrome browser that classifies page loading workloads into ad-related and main-content at the granularity of browser activities (such as JavaScript and Layout).

References

SHOWING 1-10 OF 42 REFERENCES
Continuously measuring critical section pressure with the free-lunch profiler
TLDR
Free Lunch is a new lock profiler for Java application servers, specifically designed to identify, in vivo, phases where the progress of the threads is impeded by a lock.
ParaShares: Finding the Important Basic Blocks in Multithreaded Programs
TLDR
This work ignores any underlying pathologies, and focuses instead on pinpointing the exact locations in source code that consume the largest share of execution, resulting in a new metric, ParaShares, that scores and ranks all basic blocks in a program based on their share of parallel execution.
Statistical debugging for real-world performance problems
TLDR
This work conducts an empirical study to understand how performance problems are observed and reported by real-world users. It shows that statistical debugging is a natural fit for diagnosing performance problems, which are often observed through comparison-based approaches and reported together with both good and bad inputs.
Bottle graphs: visualizing scalability bottlenecks in multi-threaded applications
Understanding and analyzing multi-threaded program performance and scalability is far from trivial, which severely complicates parallel software development and optimization. In this paper, we
Instant profiling: Instrumentation sampling for profiling datacenter applications
TLDR
Instant profiling is an instrumentation sampling technique using dynamic binary translation that periodically interleaves native execution and instrumented execution according to configurable profiling duration and frequency parameters, and is well suited for deployment to datacenters.
STABILIZER: statistically sound performance evaluation
TLDR
Stabilizer is a system that enables the use of the powerful statistical techniques required for sound performance evaluation on modern architectures. Its efficiency and effectiveness are demonstrated by evaluating the impact of LLVM's optimizations on the SPEC CPU2006 benchmark suite.
Bottleneck identification and scheduling in multithreaded applications
TLDR
Bottleneck Identification and Scheduling in Multithreaded Applications (BIS), a cooperative software-hardware mechanism to identify and accelerate the most critical bottlenecks regardless of their type, is proposed.
Harmony: Collection and analysis of parallel block vectors
TLDR
This paper applies parallel block vectors to uncover several novel insights about parallel applications with direct consequences for architectural design, including that the serial and parallel phases of execution used in Amdahl's Law are often composed of many of the same basic blocks.
Evaluation and optimization of multicore performance bottlenecks in supercomputing applications
TLDR
This work examines traditional unicore metrics, demonstrates how they can be misleading in a multicore system, and examines performance bottlenecks specific to multicore-based systems.