FaaSter Troubleshooting - Evaluating Distributed Tracing Approaches for Serverless Applications

Maria C. Borges, Sebastian Werner, Ahmet Kilic. 2021 IEEE International Conference on Cloud Engineering (IC2E).

Serverless applications can be particularly difficult to troubleshoot, as these applications are often composed of various managed and partly managed services. Faults are often unpredictable and can occur at multiple points, even in simple compositions. Each additional function or service in a serverless composition introduces a new possible fault source and a new layer to obfuscate faults. Currently, serverless platforms offer only limited support for identifying runtime faults. Developers… 

Doppler: understanding serverless query execution

This paper demonstrates Doppler, a serverless toolkit designed to trace serverless data processing systems with minimal performance and cost overhead and to provide a deep understanding of their query execution.

Data Fusion of Observability Signals for Assisting Orchestration of Distributed Applications

This paper considers a modern observability approach and pilot implementation for tackling data fusion in edge and cloud computing orchestration platforms, integrating signals made available by various open-source monitoring and observability frameworks, including metrics, logs, and distributed tracing mechanisms.

Serverless Computing: A Survey of Opportunities, Challenges, and Applications

This paper surveys serverless applications introduced in the literature, categorizes them into eight domains, and discusses the objectives and viability of the serverless paradigm, along with its challenges, in each of those domains.

Application-Platform Co-Design for Serverless Data Processing

An analysis of the state of the art of function-as-a-service (FaaS) platforms reveals several configuration, deployment, execution, and measurement differences between popular platforms happening at speed, and identifies the need for engineering methods and tooling to better guide application-platform co-design.



Troubleshooting Serverless functions: a combined monitoring and debugging approach

This paper presents a semi-automated troubleshooting process to improve fault detection and resolution for serverless functions, along with a prototype, SeMoDe, that validates the approach for functions implemented in Java and deployed to AWS Lambda.

Tracking Causal Order in AWS Lambda Applications

Serverless computing is a new cloud programming and deployment paradigm that is receiving widespread uptake. Serverless offerings such as Amazon Web Services (AWS) Lambda, Google Functions, and…

Pivot tracing: dynamic causal monitoring for distributed systems

Pivot Tracing is a monitoring framework for distributed systems that combines dynamic instrumentation with a novel relational operator, the happened-before join. It is dynamic and extensible, and it enables cross-tier analysis between inter-operating applications with low execution overhead.

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

This paper introduces the design of Dapper, Google’s production distributed systems tracing infrastructure, and describes how its design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale system were met.

The Ifs and Buts of Less is More: A Serverless Computing Reality Check

This paper argues that careful attention must be paid to the promises associated with the serverless model, provides a reality check for five common assumptions, and suggests ways to mitigate unwanted effects.

An Evaluation of FaaS Platforms as a Foundation for Serverless Big Data Processing

A novel evaluation method (SIEM) is proposed to understand the impact of automatic infrastructure management on serverless big data applications, which remains unexplored, and new metrics to quantify quality in different big data application scenarios are introduced.

Benchmarking elasticity of FaaS platforms as a foundation for objective-driven design of serverless applications

This paper presents an experiment design and corresponding toolkit for quantifying elasticity and its associated trade-offs with latency, reliability, and execution costs, together with results from evaluating four popular FaaS platforms by AWS, Google, IBM, and Microsoft.

Serverless Computing: One Step Forward, Two Steps Back

This paper addresses critical gaps in first-generation serverless computing, which place its autoscaling potential at odds with dominant trends in modern computing: notably data-centric and distributed computing, but also open source and custom hardware.

Fault Analysis and Debugging of Microservice Systems: Industrial Survey, Benchmark System, and Empirical Study

The results show that the current industrial practices of microservice debugging can be improved by employing proper tracing and visualization techniques and strategies, and suggest that there is a strong need for more intelligent trace analysis and visualization for distributed systems.

Costradamus: A Cost-Tracing System for Cloud-Based Software Services

Costradamus is a cost-tracing system that implements a generic cost model and three different tracing approaches, which can derive cost and performance information per API operation. It is evaluated in a smart grid context.