Coz: finding code that counts with causal profiling
@article{Curtsinger2015CozFC, title={Coz: finding code that counts with causal profiling}, author={Charlie Curtsinger and E. Berger}, journal={Proceedings of the 25th Symposium on Operating Systems Principles}, year={2015} }
Improving performance is a central concern for software developers. To locate optimization opportunities, developers rely on software profilers. However, these profilers only report where programs spent their time: optimizing that code may have no impact on performance. Past profilers thus both waste developer time and make it difficult for them to uncover significant optimization opportunities. This paper introduces causal profiling. Unlike past profiling approaches, causal profiling indicates…
63 Citations
DProf: distributed profiler with strong guarantees
- Computer Science
- Proc. ACM Program. Lang.
- 2019
A new timestamp synchronization algorithm, FreeZer, is developed that tightly bounds the inaccuracy of a converted timestamp to a time interval. It is used to implement dCSP and dCOZ, accuracy-bounded distributed versions of the Context Sensitive Profiles and Causal Profiles developed for shared-memory systems.
Iceberg: a tool for static analysis of Java critical sections
- Computer Science
- SOAP@PLDI
- 2016
Iceberg is a static analysis tool that helps programmers find potential performance bugs in concurrent Java programs. It can find critical sections whose behavior is unusual compared to the other critical sections.
APT-GET: profile-guided timely software prefetching
- Computer Science
- EuroSys
- 2022
This work designs APT-GET, a novel profile-guided technique that ensures prefetch timeliness by leveraging dynamic execution time information and introduces a novel analytical model to find the optimal prefetch-distance and prefetch injection site based on the collected profile to enable timely prefetches.
Distributed Latency Profiling through Critical Path Tracing
- Computer Science
- ACM Queue
- 2022
Scalable and accurate fine-grain tracing has made Critical Path Tracing the standard approach for distributed latency analysis for many Google applications, including Google Search.
PerFlow
- Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
- 2022
SCOZ: A system‐wide causal profiler for multicore systems
- Computer Science
- Softw. Pract. Exp.
- 2021
SCOZ is introduced, a system‐wide causal profiler that addresses limitations of COZ by changing the target of virtual speedup from threads to CPU cores, thereby expanding profiling coverage to diverse applications as well as OS kernel code.
Swarmbug: debugging configuration bugs in swarm robotics
- Computer Science
- ESEC/SIGSOFT FSE
- 2021
Swarmbug is a swarm debugging system that automatically diagnoses buggy behaviors caused by misconfiguration and automatically generates, validates, and ranks fixes for configuration bugs. It is evaluated on four diverse swarm algorithms.
TIP: Time-Proportional Instruction Profiling
- Computer Science
- MICRO
- 2021
Time-Proportional Instruction Profiling (TIP) is proposed, which combines Oracle’s time attribution policies with statistical sampling to enable practical implementation. TIP is implemented within the Berkeley Out-of-Order Machine and is found to be highly accurate.
adPerf: Characterizing the Performance of Third-party Ads
- Computer Science
- Proc. ACM Meas. Anal. Comput. Syst.
- 2021
This work presents an in-depth, first-of-a-kind performance evaluation of web ads and develops an infrastructure, adPerf, for the Chrome browser that classifies page loading workloads into ad-related and main-content at the granularity of browser activities (such as JavaScript and Layout).
References
Continuously measuring critical section pressure with the free-lunch profiler
- Computer Science
- OOPSLA 2014
- 2014
Free Lunch is proposed, a new lock profiler for Java application servers, specifically designed to identify, in-vivo, phases where the progress of the threads is impeded by a lock.
ParaShares: Finding the Important Basic Blocks in Multithreaded Programs
- Computer Science
- Euro-Par
- 2014
This work ignores any underlying pathologies, and focuses instead on pinpointing the exact locations in source code that consume the largest share of execution, resulting in a new metric, ParaShares, that scores and ranks all basic blocks in a program based on their share of parallel execution.
Statistical debugging for real-world performance problems
- Computer Science
- OOPSLA 2014
- 2014
This work conducts an empirical study to understand how performance problems are observed and reported by real-world users. It shows that statistical debugging is a natural fit for diagnosing performance problems, which are often observed through comparison-based approaches and reported together with both good and bad inputs.
Bottle graphs: visualizing scalability bottlenecks in multi-threaded applications
- Computer Science
- OOPSLA 2013
- 2013
Understanding and analyzing multi-threaded program performance and scalability is far from trivial, which severely complicates parallel software development and optimization. In this paper, we…
Instant profiling: Instrumentation sampling for profiling datacenter applications
- Computer Science
- Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
- 2013
Instant profiling is presented, an instrumentation sampling technique using dynamic binary translation that periodically interleaves native execution and instrumented execution according to configurable profiling duration and frequency parameters and is well-suited for deployment to datacenters.
STABILIZER: statistically sound performance evaluation
- Computer Science
- ASPLOS '13
- 2013
Stabilizer is presented, a system that enables the use of the powerful statistical techniques required for sound performance evaluation on modern architectures and its efficiency and effectiveness are demonstrated by evaluating the impact of LLVM's optimizations on the SPEC CPU2006 benchmark suite.
Bottleneck identification and scheduling in multithreaded applications
- Computer Science, Business
- ASPLOS XVII
- 2012
Bottleneck Identification and Scheduling in Multithreaded Applications (BIS), a cooperative software-hardware mechanism to identify and accelerate the most critical bottlenecks regardless of their type, is proposed.
Harmony: Collection and analysis of parallel block vectors
- Computer Science
- 2012 39th Annual International Symposium on Computer Architecture (ISCA)
- 2012
This paper applies parallel block vectors to uncover several novel insights about parallel applications with direct consequences for architectural design, including that the serial and parallel phases of execution used in Amdahl's Law are often composed of many of the same basic blocks.
Evaluation and optimization of multicore performance bottlenecks in supercomputing applications
- Computer Science
- IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
- 2011
Traditional unicore metrics are examined and shown to be potentially misleading in a multicore system, and performance bottlenecks specific to multicore-based systems are examined.