Solving Big Data Challenges for Enterprise Application Performance Management

@article{Rabl2012SolvingBD,
  title={Solving Big Data Challenges for Enterprise Application Performance Management},
  author={Tilmann Rabl and Mohammad Sadoghi and Hans-Arno Jacobsen and Sergio G{\'o}mez-Villamor and Victor Munt{\'e}s-Mulero and Serge Mankovskii},
  journal={Proc. VLDB Endow.},
  year={2012},
  volume={5},
  pages={1724--1735}
}
As the complexity of enterprise systems increases, the need for monitoring and analyzing such systems also grows. A number of companies have built sophisticated monitoring tools that go far beyond simple resource utilization reports. For example, based on instrumentation and specialized APIs, it is now possible to monitor single method invocations and trace individual transactions across geographically distributed systems. This high level of detail enables more precise forms of analysis and…

Poster: MADES - a multi-layered, adaptive, distributed event store

TLDR
A massively distributed store for collecting, querying, and storing event data at a rate of millions of events per second, designed to address APM's highly constrained resource budget.

A comprehensive evaluation of NoSQL datastores in the context of historians and sensor data analysis

TLDR
This study of two NoSQL datastores, HBase and Cassandra, provides the required insights for business units to choose the right technology for their next generation historian systems.

Storage Mining: Where IT Management Meets Big Data Analytics

TLDR
This paper presents the ongoing research thrust of designing novel IT management solutions by leveraging big data analytics frameworks, and introduces the Storage Mining project, which exploits big data analytics techniques to facilitate storage cloud management.

Toward Scalable Systems for Big Data Analytics: A Technology Tutorial

TLDR
This paper presents a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics, and presents the prevalent Hadoop framework for addressing big data challenges.

Scaling Machine Learning Methods to Big Data Systems

TLDR
This project investigates how to integrate existing Machine Learning frameworks, such as WEKA, with an existing Big Data Management System, e.g., AsterixDB, to enable Machine Learning and Data Mining methods on a large scale, both for historical, stored data and for streaming, real-time data.

Decreasing the Management Burden in Multi-tier Systems Through Partial Correlation-Based Monitoring

TLDR
This work proposes three novel strategies based on partial correlation, a statistical tool commonly employed to summarize the relevant information of complex systems, that allow the construction of a monitoring network with fewer metrics than a state-of-the-art solution while achieving larger fault coverage.

Comparative Study for Load Management of HBase and Cassandra Distributed Databases in Big Data

TLDR
Experimental results showed that HBase can provide better performance as the number of connections increases in the presence of horizontal scalability, which is one of the important features of NoSQL systems.

An Evaluation of Cassandra for Hadoop

TLDR
This paper presents a thorough evaluation of the Cassandra NoSQL database when used in conjunction with the Hadoop MapReduce engine, characterizes the performance for a wide range of representative use cases, and then compares, contrasts, and evaluates.

A Hybrid Approach to Dynamic Enterprise Data Platform

TLDR
This solution can be integrated with new data sources very quickly and reduces the amount of time for data integration, preprocessing, deduplication and entity mapping by using open source software components.

References

SHOWING 1-10 OF 33 REFERENCES

The Ganglia distributed monitoring system: design, implementation, and experience

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

TLDR
This paper introduces the design of Dapper, Google's production distributed systems tracing infrastructure, and describes how its design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale system were met.

Dynamo: Amazon's highly available key-value store

TLDR
Dynamo is presented, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.

Scalable SQL and NoSQL data stores

TLDR
This paper examines a number of SQL and so-called "NoSQL" data stores designed to scale simple OLTP-style application loads over many servers, and contrasts the new systems on their data model, consistency mechanisms, storage mechanisms, durability guarantees, availability, query support, and other dimensions.

Cassandra: a decentralized structured storage system

Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure.

Performance Evaluation of Range Queries in Key Value Stores

TLDR
This paper compares Cassandra, HBase, and Voldemort in terms of their support for different types of query workloads, focusing mainly on range queries, and shows that there are trade-offs among the performance of the selected system and scheme and the types of queries that can be processed efficiently.

YCSB++: benchmarking and performance debugging advanced features in scalable table stores

TLDR
YCSB++ is described, a set of extensions to the Yahoo! Cloud Serving Benchmark that includes multi-tester coordination for increased load and eventual consistency measurement, multi-phase workloads to quantify the consequences of work deferment and the benefits of anticipatory configuration optimization, and abstract APIs for explicit incorporation of advanced features in benchmark tests.

H-Store: a high-performance, distributed main memory transaction processing system

TLDR
The demonstration presented here provides insight on the development of a distributed main memory OLTP database and allows for the further study of the challenges inherent in this operating environment.

Bigtable: A Distributed Storage System for Structured Data

TLDR
The simple data model provided by Bigtable is described, which gives clients dynamic control over data layout and format, and the design and implementation of Bigtable are described.

PNUTS: Yahoo!'s hosted data serving platform

TLDR
PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees and utilizes automated load-balancing and failover to reduce operational complexity.