Redundant Loads: A Software Inefficiency Indicator

@article{Su2019RedundantLA,
  title={Redundant Loads: A Software Inefficiency Indicator},
  author={Pengfei Su and Shasha Wen and Hailong Yang and Milind Chabbi and X. Liu},
  journal={2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)},
  year={2019},
  pages={982-993}
}
  • Pengfei Su, Shasha Wen, Hailong Yang, Milind Chabbi, X. Liu
  • Published 14 February 2019
  • Computer Science
  • 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)
Modern software packages have become increasingly complex with millions of lines of code and references to many external libraries. Redundant operations are a common performance limiter in these code bases. Missed compiler optimization opportunities, inappropriate data structure and algorithm choices, and developers' inattention to performance are some common reasons for the existence of redundant operations. Developers mainly depend on compilers to eliminate redundant operations. However… 

What every scientific programmer should know about compiler optimizations?
TLDR
This paper investigates an important compiler optimization, dead and redundant operation elimination, and shows that modern compilers miss several optimization opportunities and in fact even introduce some inefficiencies, which require programmers to refactor the source code.
Pinpointing performance inefficiencies in Java
TLDR
JXPerf, a lightweight performance analysis tool for pinpointing wasteful memory operations in Java programs, is presented; it guides the optimization of several Java applications by improving code generation and choosing superior data structures and algorithms, which yields significant speedups.
ZeroSpy: Exploring Software Inefficiency with Redundant Zeros
TLDR
This paper proposes ZeroSpy, a fine-grained profiler that identifies redundant zeros caused by both inappropriate use of data structures and useless computation, and provides intuitive optimization guidance by revealing the locations where the redundant zeros occur in source lines and calling contexts.
BinGo: Pinpointing Concurrency Bugs in Go via Binary Analysis
TLDR
BINGO is the first tool to identify concurrency bugs in Go applications via dynamic binary analysis; it is an end-to-end tool that is ready for deployment in production environments with no modification to source code, compilers, or runtimes in the Go ecosystem.
Analyzing memory accesses with modern processors
TLDR
This work leverages a mechanism available in modern processors to collect memory traces via hardware-based sampling and illustrates how memory traces uncover new insights into the memory access characteristics of database systems.
DRCCTPROF: A Fine-Grained Call Path Profiler for ARM-Based Clusters
  • Qidong Zhao, Xu Liu, Milind Chabbi
  • Computer Science
    SC20: International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2020
TLDR
The unique ability of DRCCTPROF is to obtain full calling context at any and every machine instruction that executes, which provides more detailed diagnostic feedback for performance optimization and correctness tools.
Toward efficient interactions between Python and native libraries
TLDR
PieProf, a lightweight profiler, is developed to pinpoint interaction inefficiencies in Python applications and associate inefficiencies with high-level Python code to provide a holistic view; it guided the optimization of 17 real-world applications.
CP-Detector: Using Configuration-related Performance Properties to Expose Performance Bugs
TLDR
This paper argues that the performance expectation of configuration can serve as a strong oracle candidate for performance bug detection, and designs and evaluates an automated performance testing framework, CP-DETECTOR, for detecting real-world configuration-related performance bugs.
GRAPHSPY: Fused Program Semantic-Level Embedding via Graph Neural Networks for Dead Store Detection
TLDR
This work presents a novel hybrid program embedding approach for identifying unnecessary memory operations through the embedding; it achieves 90% accuracy and incurs only around half the time overhead of the state-of-the-art tool.
Can we trust profiling results?: understanding and fixing the inaccuracy in modern profilers
TLDR
This paper studies performance monitoring units (PMU) based statistical sampling, one of the profiling techniques widely adopted by many state-of-the-art profilers, and proposes a novel 3-step approach to understand and fix the instruction profiling inaccuracy.

References

SHOWING 1-10 OF 85 REFERENCES
Runtime Value Numbering: A Profiling Technique to Pinpoint Redundant Computations
Redundant computations can severely degrade performance in HPC applications. Redundant computations arise due to various causes such as developers' inattention to performance, inappropriate choice of
REDSPY: Exploring Value Locality in Software
TLDR
REDSPY pinpointed a dramatically high volume of redundancies in programs that were optimization targets for decades, such as the SPEC CPU2006 suite, the Rodinia benchmark, and NWChem, a production computational chemistry code, and was able to eliminate redundancies, which resulted in significant speedups.
DeadSpy: a tool to pinpoint program inefficiencies
TLDR
DeadSpy is described: a tool that dynamically detects every dead write to memory in a given execution and provides actionable feedback to the programmer, offering a methodical way to identify dead writes, a common symptom of performance inefficiencies.
Barrier elision for production parallel programs
TLDR
Context-sensitive dynamic optimizations are presented that elide barriers that are redundant during program execution; they demonstrate the value of holistic context-sensitive analyses that consider the domain science in conjunction with the associated runtime software stack.
Pinpointing and Exploiting Opportunities for Enhancing Data Reuse
  • G. Marin, J. Mellor-Crummey
  • Computer Science
    ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software
  • 2008
TLDR
An approach that uses memory reuse distance to identify an application's most significant memory access patterns causing cache misses and provide insight into ways of improving data reuse is described.
Performance problems you can fix: a dynamic analysis of memoization opportunities
TLDR
This paper presents MemoizeIt, a dynamic analysis that identifies methods that repeatedly perform the same computation and can benefit from caching their results, a technique called memoization; applying it leads to statistically significant speedups by factors between 1.04x and 12.93x.
Pin: building customized program analysis tools with dynamic instrumentation
TLDR
The goals are to provide easy-to-use, portable, transparent, and efficient instrumentation; to illustrate Pin's versatility, two Pintools in daily use to analyze production software are described.
Performance Diagnosis for Inefficient Loops
  • Linhai Song, Shan Lu
  • Computer Science
    2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE)
  • 2017
TLDR
A static-dynamic hybrid analysis tool, LDoctor, that can provide better coverage and accuracy than existing techniques, with low overhead, and uses sampling techniques to lower the run-time overhead without degrading the accuracy or latency of LDoctor diagnosis.
Continuous profiling: where have all the cycles gone?
TLDR
The Digital Continuous Profiling Infrastructure is a sampling-based profiling system designed to run continuously on production systems; it supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel.