Learn More
As architecture, systems, and data management communities pay greater attention to innovative big data systems and architecture, the pressure of benchmarking and evaluating these systems rises. However, the complexity, diversity , frequently changed workloads, and rapid evolution of big data systems raise great challenges in big data bench-marking.(More)
This paper presents our joint research efforts on big data benchmarking with several industrial partners. Considering the complexity, diversity, workload churns, and rapid evolution of big data systems, we take an incre-mental approach in big data benchmarking. For the first step, we pay attention to search engines, which are the most important domain in(More)
With the explosive growth of information, more and more organizations are deploying private cloud systems or renting public cloud systems to process big data. However, there is no existing benchmark suite for evaluating cloud performance on the whole system level. To the best of our knowledge, this paper proposes the first benchmark suite CloudRank-D to(More)
The complexity and diversity of big data systems and their rapid evolution give rise to various new challenges about how we design benchmarks in order to test such systems efficiently and successfully. Data generation is a key issue in big data benchmarking that aims to generate application-specific data sets to meet the 4V requirements of big data (i.e.(More)
—As the amount of data explodes rapidly, more and more corporations are using data centers to make effective decisions and gain a competitive edge. Data analysis applications play a significant role in data centers, and hence it has became increasingly important to understand their behaviors in order to further improve the performance of data center(More)
—The basic idea behind Cloud computing is that resource providers offer elastic resources to end users. In this paper, we intend to answer one key question to the success of Cloud computing: in Cloud, can small or medium-scale scientific computing communities benefit from the economies of scale? Our research contributions are threefold: first, we propose an(More)
—Recent cost analysis shows that the server cost still dominates the total cost of high-scale data centers or cloud systems. In this paper, we argue for a new twist on the classical resource provisioning problem: heterogeneous workloads are a fact of life in large-scale data centers, and current resource provisioning solutions do not act upon this(More)
Search is the most heavily used web application in the world and is still growing at an extraordinary rate. Understanding the behaviors of web search engines, therefore, is becoming increasingly important to the design and deployment of data center systems hosting search engines. In this paper, we study three search query traces collected from real world(More)
Cloud computing, which is advocated as an economic platform for daily computing, has become a hot topic for both industrial and academic communities in the last couple of years. The basic idea behind cloud computing is that resource providers, which own the cloud platform, offer elastic resources to end users. In this paper, we intend to answer one key(More)