• Corpus ID: 52985428

REPT: Reverse Debugging of Failures in Deployed Software

@inproceedings{Cui2018REPTRD,
  title={REPT: Reverse Debugging of Failures in Deployed Software},
  author={Weidong Cui and Xinyang Ge and Baris Kasikci and Ben Niu and Upamanyu Sharma and Ruoyu Wang and Insu Yun},
  booktitle={OSDI},
  year={2018}
}
Debugging software failures in deployed systems is important because they impact real users and customers. […] REPT tackles these challenges by constructing a partial execution order based on timestamps logged by hardware and iteratively performing forward and backward execution with error correction. We design and implement REPT, deploy it on Microsoft Windows, and integrate it into WinDbg. We evaluate REPT on 16 real-world bugs and show that it can recover data values accurately (92% on…

Reverse Debugging of Kernel Failures in Deployed Systems
TLDR
Kernel REPT is the first practical reverse debugging solution for kernel failures; it is highly efficient, imposes a small memory footprint, requires no extra software layer, and can proactively identify kernel bugs by checking the reconstructed execution history against a set of predetermined invariants.
Postmortem accurate IR-level state recovery for deployed concurrent programs
TLDR
STRAB (State Recovery at Abstract-level), a collection of proposed methods to solve debugging failures of deployed concurrent software, has significantly higher accuracy compared to REPT at IR-level with only minor slowdowns, while also achieving architecture-independence.
STRAB: state recovery using reverse execution at IR level for concurrent programs
TLDR
Experimental results on a variety of real-world concurrent programs show that STRAB has significantly higher accuracy compared to REPT at IR-level (+40% on average) with only minor slowdowns (x2.7 on average), while also achieving architecture-independence.
WATCHER: in-situ failure diagnosis
TLDR
A novel diagnosis system is presented that can pinpoint root causes of program failures within the failing process ("in-situ"), eliminating the privacy concern, and two optimizations are proposed to reduce the diagnosis time and to diagnose failures with control flow hijacks.
Automated Bug Hunting With Data-Driven Symbolic Root Cause Analysis
TLDR
This work proposes bug hunting using symbolically reconstructed states based on execution traces to achieve better detection and root cause analysis of overflow, use-after-free, double free, and format string bugs across user programs and their imported libraries.
Execution reconstruction: harnessing failure reoccurrences for failure reproduction
TLDR
Execution Reconstruction is proposed, a technique that strikes a better balance between efficiency, effectiveness, and accuracy for reproducing production failures and that reproduces fully replayable executions that can power a variety of debugging and reliability use cases.
Ad hoc Test Generation Through Binary Rewriting
TLDR
This work builds on record-replay and binary rewriting to automatically generate and run targeted tests for candidate patches significantly faster and more efficiently than traditional test suite generation techniques like symbolic execution.
POMP++: Facilitating Postmortem Program Diagnosis with Value-Set Analysis
TLDR
POMP++ can accurately and efficiently pinpoint program statements that truly contribute to the crashes, making failure diagnosis significantly more convenient and reducing the execution time by 60% compared with existing reverse execution.
RoBin: Facilitating the Reproduction of Configuration-Related Vulnerability
TLDR
RoBin, a binary similarity-based building configuration inference tool, is implemented to infer the specific building configurations from the binary in a crash report, which can help developers reproduce and diagnose the vulnerability and, finally, patch the programs.
Testing Configuration Changes in Context to Prevent Production Failures
TLDR
The idea behind ctests is simple: connecting production system configurations to software tests so that configuration changes can be tested in the context of code affected by the changes. It effectively detects real-world failure-inducing configuration changes, diverse injected misconfigurations, and misconfigurations in the deployed files.

References

Cooperative Bug Isolation
TLDR
A suite of new algorithms for statistical debugging: finding and fixing software errors based on statistical analysis of sparse feedback data is presented, from simple process of elimination strategies to regression techniques that build models of suspect program behaviors as failure predictors.
BugNet: continuously recording program execution for deterministic replay debugging
TLDR
The proposed BugNet architecture provides the ability to replay an application's execution across context switches and interrupts, which obviates the need for tracking program I/O, interrupts and DMA transfers, which would have otherwise required more complex hardware support.
Leveraging the short-term memory of hardware to diagnose production-run software failures
TLDR
This paper designs a low overhead, low latency, privacy preserving production-run failure diagnosis system based on two observations: first, short-term memory of program execution is often sufficient for failure diagnosis, as many bugs have short propagation distances; and second, maintaining a short- term memory of execution is much cheaper than maintaining a record of the whole execution.
RETracer: Triaging Crashes by Reverse Execution from Partial Memory Dumps
TLDR
RETracer is presented, the first system to triage software crashes based on program semantics reconstructed from memory dumps, and it is found that RETracer eliminates two thirds of triage errors based on a manual analysis of 140 bugs fixed in Microsoft Windows and Office.
Production-run software failure diagnosis via hardware performance counters
TLDR
PBI can effectively diagnose failures caused by sequential and concurrency bugs with a small overhead that is never higher than 10%.
Execution synthesis: a technique for automated software debugging
TLDR
ESD--a debugger based on execution synthesis--is evaluated on popular software and reproduces on its own several real concurrency and memory safety bugs in less than three minutes, thus incurring no runtime overhead and being practical for use in production systems.
Postmortem Program Analysis with Hardware-Enhanced Post-Crash Artifacts
TLDR
It is shown that POMP can accurately and efficiently pinpoint program statements that truly pertain to the crashes, making failure diagnosis significantly more convenient.
PSE: explaining program failures via postmortem static analysis
TLDR
PSE (Postmortem Symbolic Evaluation), a static analysis algorithm that can be used by programmers to diagnose software failures, is described, which combines a novel dataflow analysis and memory alias analysis in a manner that allows for precise exploration of the program's behavior in polynomial time.
Instrumentation and sampling strategies for cooperative concurrency bug isolation
TLDR
This work presents Cooperative Concurrency Bug Isolation (CCI), a low-overhead instrumentation framework to diagnose production-run failures caused by concurrency bugs, and offers a varied suite of predicates that represent different trade-offs between complexity and fault isolation capability.
Failure sketching: a technique for automated root cause diagnosis of in-production failures
TLDR
Gist, a prototype for failure sketching that relies on hardware watchpoints and a new hardware feature for extracting control flow traces (Intel Processor Trace), is built and it is shown that Gist can build failure sketches with low overhead for failures in systems like Apache, SQLite, and Memcached.