As the architecture, systems, and data management communities pay greater attention to innovative big data systems and architectures, the pressure to benchmark and evaluate these systems rises. However, the complexity, diversity, frequently changing workloads, and rapid evolution of big data systems raise great challenges in big data benchmarking.
With the explosive growth of information, more and more organizations are deploying private cloud systems or renting public cloud systems to process big data. However, there is no existing benchmark suite for evaluating cloud performance at the whole-system level. To the best of our knowledge, this paper proposes the first benchmark suite, CloudRank-D, to …
The complexity and diversity of big data systems and their rapid evolution give rise to new challenges in designing benchmarks that test such systems efficiently and effectively. Data generation is a key issue in big data benchmarking: it aims to generate application-specific data sets that meet the 4V requirements of big data (i.e., volume, variety, velocity, and veracity) …
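Since the snippet above breaks off, here is a minimal, purely illustrative Python sketch of what "application-specific" data generation can look like: it expands a small seed corpus to a chosen scale while roughly preserving the seed's word-frequency distribution. The function expand_corpus and its parameters are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

def expand_corpus(seed_docs, scale_factor, seed=0):
    """Generate a larger synthetic corpus whose word-frequency distribution
    roughly follows that of the seed documents.

    Illustrative sketch only; not the generator described in the paper.
    """
    rng = random.Random(seed)
    # Estimate the empirical word distribution of the seed data.
    counts = Counter(word for doc in seed_docs for word in doc.split())
    words = list(counts.keys())
    weights = [counts[w] for w in words]
    avg_len = sum(len(doc.split()) for doc in seed_docs) / len(seed_docs)

    synthetic = []
    for _ in range(int(len(seed_docs) * scale_factor)):
        # Draw a document length near the seed average, then sample words.
        doc_len = max(1, int(rng.gauss(avg_len, avg_len * 0.2)))
        synthetic.append(" ".join(rng.choices(words, weights=weights, k=doc_len)))
    return synthetic

if __name__ == "__main__":
    seed_docs = ["big data benchmark suite", "data generation for big data systems"]
    corpus = expand_corpus(seed_docs, scale_factor=100)
    print(len(corpus), "synthetic documents generated")
```

A real generator would also have to preserve richer, workload-specific properties (e.g., graph structure or table schemas) and scale out the generation itself, but the seed-and-expand pattern above captures the basic idea.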
This paper presents our joint research efforts on big data benchmarking with several industrial partners. Considering the complexity, diversity, workload churn, and rapid evolution of big data systems, we take an incremental approach to big data benchmarking. As a first step, we focus on search engines, which are the most important domain in …
The basic idea behind cloud computing is that resource providers offer elastic resources to end users. In this paper, we intend to answer one key question for the success of cloud computing: in the cloud, can small- or medium-scale scientific computing communities benefit from economies of scale? Our research contributions are threefold: first, we propose an …
As the amount of data explodes rapidly, more and more corporations are using data centers to make effective decisions and gain a competitive edge. Data analysis applications play a significant role in data centers, and hence it has become increasingly important to understand their behavior in order to further improve the performance of data center …
As more and more multi-tier services are built from commercial off-the-shelf components or heterogeneous middleware without source code available, both developers and administrators need a request tracing tool that can (1) show exactly how a user request of interest travels through the black-box services, and (2) obtain macro-level user request behavior …
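The truncated abstract states the goal of such a tool rather than its mechanism. As a purely illustrative sketch (not the tracing approach of the paper), the Python snippet below reconstructs the ordered path of tiers a single request visited, assuming each tier's log events already carry a propagated request identifier; the Event class, causal_path function, and service names are hypothetical. Black-box tracing is harder precisely because no such identifier is available and causality must be inferred.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    request_id: str   # identifier attached at the front-end tier (assumed available)
    timestamp: float  # seconds since the request entered the system
    service: str      # name of the tier that logged the event

def causal_path(events: List[Event], request_id: str) -> List[str]:
    """Return the ordered list of services one request visited.

    Illustrative sketch only: it simply sorts the request's events by time
    and collapses consecutive events from the same tier.
    """
    mine = sorted((e for e in events if e.request_id == request_id),
                  key=lambda e: e.timestamp)
    path: List[str] = []
    for e in mine:
        if not path or path[-1] != e.service:
            path.append(e.service)
    return path

if __name__ == "__main__":
    log = [
        Event("req-42", 0.001, "web-frontend"),
        Event("req-42", 0.004, "app-server"),
        Event("req-42", 0.009, "database"),
        Event("req-42", 0.015, "app-server"),
        Event("req-99", 0.002, "web-frontend"),
    ]
    print(causal_path(log, "req-42"))
```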
Big data benchmark suites must include a diversity of data and workloads to be useful in fairly evaluating big data systems and architectures. However, using truly comprehensive benchmarks poses great challenges for the architecture community. First, we need to thoroughly understand the behaviors of a variety of workloads. Second, our usual …
Recent cost analysis shows that server cost still dominates the total cost of large-scale data centers and cloud systems. In this paper, we argue for a new twist on the classical resource provisioning problem: heterogeneous workloads are a fact of life in large-scale data centers, and current resource provisioning solutions do not act upon this …
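To make the provisioning angle concrete, the sketch below packs heterogeneous (CPU, memory) workload demands onto identical servers with a first-fit-decreasing heuristic, so that mixed CPU-heavy and memory-heavy workloads share machines and fewer servers are needed. This is a textbook consolidation heuristic shown only for illustration; it is not the provisioning solution proposed in the paper, and all names and numbers are made up.

```python
from typing import List, Tuple

def first_fit_decreasing(workloads: List[Tuple[float, float]],
                         server_capacity: Tuple[float, float]) -> List[List[int]]:
    """Pack heterogeneous (cpu, mem) demands onto identical servers.

    Returns the list of workload indices placed on each server.
    Toy sketch of heterogeneity-aware consolidation, not the paper's method.
    """
    cap_cpu, cap_mem = server_capacity
    # Place the "largest" workloads first, measured by their dominant resource share.
    order = sorted(range(len(workloads)),
                   key=lambda i: max(workloads[i][0] / cap_cpu,
                                     workloads[i][1] / cap_mem),
                   reverse=True)
    servers: List[List[int]] = []
    residual: List[List[float]] = []  # remaining (cpu, mem) per open server
    for i in order:
        cpu, mem = workloads[i]
        for s, (rc, rm) in enumerate(residual):
            if cpu <= rc and mem <= rm:          # first server it fits on
                servers[s].append(i)
                residual[s] = [rc - cpu, rm - mem]
                break
        else:                                     # no fit: open a new server
            servers.append([i])
            residual.append([cap_cpu - cpu, cap_mem - mem])
    return servers

if __name__ == "__main__":
    # Hypothetical mixed CPU-heavy and memory-heavy demands (cores, GiB).
    demands = [(8, 4), (2, 24), (4, 8), (1, 2), (6, 16)]
    placement = first_fit_decreasing(demands, server_capacity=(16, 32))
    print(f"{len(placement)} servers used:", placement)
```

Because the bins here consider both resource dimensions at once, complementary workloads end up co-located, which is the kind of heterogeneity-awareness the abstract argues current provisioning solutions lack.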