Fault Analysis and Debugging of Microservice Systems: Industrial Survey, Benchmark System, and Empirical Study

@article{Zhou2021FaultAA,
  title={Fault Analysis and Debugging of Microservice Systems: Industrial Survey, Benchmark System, and Empirical Study},
  author={Xiang Zhou and Xin Peng and Tao Xie and Jun Sun and Chao Ji and Wenhai Li and Dan Ding},
  journal={IEEE Transactions on Software Engineering},
  year={2021},
  volume={47},
  pages={243--260}
}
The complexity and dynamism of microservice systems pose unique challenges to a variety of software engineering tasks such as fault analysis and debugging. [...] Key Method: We then develop a medium-size benchmark microservice system (to our knowledge, the largest and most complex open-source microservice system) and replicate 22 industrial fault cases on it.
Delta Debugging Microservice Systems with Parallel Optimization
TLDR
The approach can effectively identify failure-inducing deltas that help diagnose the root causes of microservice failures, and it is scalable and efficient given the provided infrastructure resources and the parallel execution designed for optimization.
Service-Level Fault Injection Testing
TLDR
This paper presents an approach called service-level fault injection testing, and a prototype implementation called Filibuster, that can be used to systematically identify resilience issues early in the development of microservice applications. It also presents a corpus of 4 real-world industrial microservice applications containing bugs.
Fitness-guided Resilience Testing of Microservice-based Applications
TLDR
IntelliFT is a guided resilience testing technique for microservice-based applications that aims to expose defects in the fault-handling logic effectively within a fixed time limit; it uses a fitness-guided search technique to decide whether injected faults can lead to severe failures.
Design, Monitoring, and Testing of Microservices Systems: The Practitioners' Perspective
TLDR
The findings reveal that more research is needed to deal with microservices complexity at the design level, handle security in microservices systems, and address the monitoring and testing challenges through dedicated solutions.
On the Nature of Issues in Five Open Source Microservices Systems: An Empirical Study
TLDR
An empirical study of 1,345 issue discussions extracted from five open source microservices systems hosted on GitHub led to a first-of-its-kind taxonomy of the types of issues in open source microservices systems, indicating that problems originating from Technical debt, Build, Security, and Service execution and communication are prominent.
Detecting anomalies in microservices with execution trace comparison
TLDR
This work proposes an anomaly detection approach for microservice applications that compares execution traces and achieves 81%–97% precision and 75%–99% recall in detecting anomalies caused by injected CPU, network, memory, and service faults.
A systematic gray literature review: The technologies and concerns of microservice application programming interfaces
  • Fangwei Chen, Li Zhang, Xiaoli Lian
  • Computer Science
  • Softw. Pract. Exp.
  • 2021
TLDR
This article elicits the technologies and concerns of microservice APIs and establishes a microservice API description model, with the intention of helping researchers gain an overview of this field and find possible research directions, and helping practitioners better understand microservice APIs and be aware of the existing approaches for daily work.
Latent error prediction and fault localization for microservice applications by learning from system trace logs
TLDR
The results indicate that MEPFL can achieve high accuracy in intra-application prediction of latent errors, faulty microservices, and fault types, and outperforms a state-of-the-art approach of failure diagnosis for distributed systems.
MicroRCA: Root Cause Localization of Performance Issues in Microservices
TLDR
An experimental evaluation in which common anomalies are injected into a microservice benchmark running in a Kubernetes cluster shows that MicroRCA locates root causes well, with 89% precision and 97% mean average precision, outperforming several state-of-the-art methods.
On Microservice Analysis and Architecture Evolution: A Systematic Mapping Study
TLDR
This study aims to classify recently published approaches and techniques for analyzing microservice systems, looks at the evolutionary perspective of such systems and their analysis, and identifies five analytical approaches commonly used in the literature for problems classified into seven categories.

References

SHOWING 1-10 OF 71 REFERENCES
10 Years of research on debugging concurrent and multicore software: a systematic mapping study
TLDR
Quite a number of aspects are still not sufficiently covered in the field, most notably the correction and fixing of bugs as part of the debugging process, and the concurrent, parallel, and multicore software community needs broader studies in debugging.
A Survey on Software Fault Localization
TLDR
A comprehensive overview of a broad spectrum of fault localization techniques, each of which aims to streamline the fault localization process and make it more effective by attacking the problem in a unique way, is provided.
Failure Diagnosis for Distributed Systems Using Targeted Fault Injection
TLDR
This paper uses fault injection to populate the database of failures for a target distributed system, and shows that this approach is effective in determining the root causes, e.g., fault types and affected components, for 71-100 percent of tested failures.
Using Mutation Analysis for Assessing and Comparing Testing Coverage Criteria
TLDR
This paper investigates the relative cost and effectiveness of four common control and data flow criteria by revisiting fundamental questions regarding the relationships between fault detection, test suite size, and control/data flow coverage. It also suggests a way to tune the mutation analysis process to possible differences in fault detection probabilities in a specific environment.
GZoltar: an eclipse plug-in for testing and debugging
TLDR
A toolset for automatic testing and fault localization, dubbed GZoltar, which hosts techniques for (regression) test suite minimization and automatic fault diagnosis (namely, spectrum-based fault localization).
Falcon: fault localization in concurrent programs
TLDR
A new dynamic fault-localization technique is presented that can pinpoint faulty data-access patterns in multi-threaded concurrent programs and that localizes the faults effectively and efficiently for the evaluated subjects.
An architecture to automate performance tests on microservices
TLDR
A new approach is presented to allow the performance tests to be executed in an automated way, with each microservice providing a test specification that is used to perform tests.
A Systematic Mapping Study in Microservice Architecture
TLDR
This paper presents a systematic mapping study of microservices architectures and their implementation, focusing on identifying architectural challenges, the architectural diagrams/views, and quality attributes related to microservice systems.
Research for Practice: Tracing and Debugging Distributed Systems; Programming by Examples
This installment of Research for Practice covers two exciting topics in distributed systems and programming methodology. First, Peter Alvaro takes us on a tour of recent techniques for debugging some [...]
Probabilistic diagnosis of performance faults in large-scale parallel applications
TLDR
A novel, highly scalable tool that probabilistically infers the least progressed task in MPI programs using Markov models of execution history and dependence analysis, and that can isolate the root cause of a particularly perplexing bug encountered at scale in a molecular dynamics simulation.