Fault Analysis and Debugging of Microservice Systems: Industrial Survey, Benchmark System, and Empirical Study

Authors: Xiaoping Zhou, Xin Peng, Tao Xie, Jun Sun, Chao Ji, Wenhai Li, Dan Ding
Published in: IEEE Transactions on Software Engineering
The complexity and dynamism of microservice systems pose unique challenges to a variety of software engineering tasks such as fault analysis and debugging. [...] We then develop a medium-size benchmark microservice system (to our knowledge, the largest and most complex open source microservice system) and replicate 22 industrial fault cases on it.

Delta Debugging Microservice Systems with Parallel Optimization

The approach can effectively identify failure-inducing deltas that help diagnose the root causes of microservice failures and is scalable and efficient with the provided infrastructure resources and the designed parallel execution for optimization.

Towards a Fault Taxonomy for Microservices-Based Applications

A multivocal literature review that catalogs faults related to microservice-based applications to better support their development and testing, and defines a taxonomy of 117 faults classified into 6 non-functional requirements and related to 11 characteristics inherent to the microservices architecture.

Service-Level Fault Injection Testing

This paper presents an approach called service-level fault injection testing, and a prototype implementation called Filibuster, that can be used to systematically identify resilience issues early in the development of microservice applications. It also presents a corpus of 4 real-world industrial microservice applications containing bugs.

Fitness-guided Resilience Testing of Microservice-based Applications

IntelliFT is a guided resilience testing technique for microservice-based applications that aims to expose defects in fault-handling logic effectively within a fixed time limit, using a fitness-guided search technique to decide whether injected faults can lead to severe failures.

Microservices Integrated Performance and Reliability Testing

This work proposes MIPaRT, a novel methodology and platform to automatically test microservice operations for performance and reliability in combination, and applies the approach by operating the platform on an open source benchmark.

On the Nature of Issues in Five Open Source Microservices Systems: An Empirical Study

An empirical study of 1,345 issue discussions extracted from five open source microservices systems hosted on GitHub led to a first-of-its-kind taxonomy of the types of issues in open source microservices systems, showing that problems originating from Technical debt, Build, Security, and Service execution and communication are prominent.

Fuzzing Microservices In Industry: Experience of Applying EvoMaster at Meituan

This paper reports on the experience of integrating EvoMaster, an open-source test case generation tool for web services that exploits the latest advances in Search-Based Software Testing research, into the testing processes at Meituan.

A systematic gray literature review: The technologies and concerns of microservice application programming interfaces

This article elicits the technologies and concerns of microservice APIs and establishes a microservice API description model, with the intention of helping researchers gain an overview of this field and find possible research directions, and helping practitioners better understand microservice APIs and be aware of the existing approaches for daily work.



10 Years of research on debugging concurrent and multicore software: a systematic mapping study

Quite a number of aspects are still not sufficiently covered in the field, most notably the correction and fixing of bugs as part of the debugging process, and the concurrent, parallel and multicore software community needs broader studies in debugging.

A Survey on Software Fault Localization

Provides a comprehensive overview of a broad spectrum of fault localization techniques, each of which aims to streamline the fault localization process and make it more effective by attacking the problem in a unique way.

Failure Diagnosis for Distributed Systems Using Targeted Fault Injection

This paper uses fault injection to populate the database of failures for a target distributed system, and shows that this approach is effective in determining the root causes, e.g., fault types and affected components, for 71-100 percent of tested failures.

Using Mutation Analysis for Assessing and Comparing Testing Coverage Criteria

This paper investigates the relative cost and effectiveness of four common control and data flow criteria by revisiting fundamental questions regarding the relationships between fault detection, test suite size, and control/data flow coverage and suggests a way to tune the mutation analysis process to possible differences in fault detection probabilities in a specific environment.

Falcon: fault localization in concurrent programs

Presents a new dynamic fault-localization technique that pinpoints faulty data-access patterns in multi-threaded concurrent programs and localizes the faults in its subject programs effectively and efficiently.

An architecture to automate performance tests on microservices

A new approach is presented to allow the performance tests to be executed in an automated way, with each microservice providing a test specification that is used to perform tests.

A Systematic Mapping Study in Microservice Architecture

This paper presents a systematic mapping study of microservice architectures and their implementation, focusing on identifying architectural challenges, the architectural diagrams/views, and quality attributes related to microservice systems.

Research for Practice: Tracing and Debugging Distributed Systems; Programming by Examples

This installment of Research for Practice covers two exciting topics in distributed systems and programming methodology. First, Peter Alvaro takes us on a tour of recent techniques for debugging some

Probabilistic diagnosis of performance faults in large-scale parallel applications

A novel, highly scalable tool that probabilistically infers the least progressed task in MPI programs using Markov models of execution history and dependence analysis and can isolate the root cause of a particularly perplexing bug encountered at scale in a molecular dynamics simulation.

MicroART: A Software Architecture Recovery Tool for Maintaining Microservice-Based Systems

This paper presents the first prototype of MicroART, an architecture recovery tool for microservice-based systems. The tool generates models of the software architecture of a microservice-based system, which software architects can manage for multiple purposes.