• Corpus ID: 50371709

Debugging Distributed Systems

@inproceedings{Beschastnikh2016DistributedsystemFA,
  title={Debugging Distributed Systems},
  author={Ivan Beschastnikh and Patty Wang and Yuriy Brun and Michael D. Ernst},
  year={2016}
}
Distributed systems pose unique challenges for software developers. Reasoning about concurrent activities of system nodes and even understanding the system’s communication topology can be difficult. A standard approach to gaining insight into system activity is to analyze system logs. Unfortunately, this can be a tedious and complex process. This article looks at several key features and debugging challenges that differentiate distributed systems from other kinds of software. The article… 

References

D3S: Debugging Deployed Distributed Systems
TLDR
D3S is a checker that allows developers to specify predicates on distributed properties of a deployed system and that checks these predicates while the system is running. It can detect non-trivial correctness and performance bugs at runtime with low performance overhead.
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
TLDR
This paper introduces the design of Dapper, Google’s production distributed systems tracing infrastructure, and describes how its design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale system were met.
Friday: Global Comprehension for Distributed Replay
TLDR
Friday, a system for debugging distributed applications that combines deterministic replay of components with the power of symbolic, low-level debugging and a simple language for expressing higher-level distributed conditions and actions, is presented.
IronFleet: proving practical distributed systems correct
TLDR
A methodology for building practical and provably correct distributed systems based on a unique blend of TLA-style state-machine refinement and Hoare-logic verification is described, which proves that each obeys a concise safety specification, as well as desirable liveness requirements.
Pivot tracing: dynamic causal monitoring for distributed systems
TLDR
Pivot Tracing is a monitoring framework for distributed systems that combines dynamic instrumentation with a novel relational operator, the happened-before join. It is dynamic and extensible, and it enables cross-tier analysis between inter-operating applications with low execution overhead.
So, you want to trace your distributed system? Key design insights from years of practical experience
TLDR
Drawing upon experiences building and using end-to-end tracing infrastructures, this paper distills the key design axes that dictate trace utility for important use cases and identifies the remaining challenges on the path to making tracing an integral part of distributed system design.
Verdi: a framework for implementing and formally verifying distributed systems
TLDR
Verdi, a framework for implementing and formally verifying distributed systems in Coq, formalizes various network semantics with different faults, and enables the developer to first verify their system under an idealized fault model then transfer the resulting correctness guarantees to a more realistic fault model without any additional proof burden.
Theia: Visual Signatures for Problem Diagnosis in Large Hadoop Clusters
TLDR
Theia is described, a visualization tool that analyzes application-level logs in a Hadoop cluster, and generates visual signatures of each job's performance that provide compact representations of task durations, task status, and data consumption by jobs.
Spanner: Google's Globally-Distributed Database
TLDR
This article describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty, critical to supporting external consistency and a variety of powerful features.
MODIST: Transparent Model Checking of Unmodified Distributed Systems
TLDR
Most importantly, MODIST found protocol-level bugs (i.e., flaws in the core distributed protocols) in every system checked: 10 in total, including 2 in Berkeley DB, 2 in MPS, and 6 in PACIFICA.