Learn More
As architecture, systems, and data management communities pay greater attention to innovative big data systems and architecture, the pressure of benchmarking and evaluating these systems rises. However, the complexity, diversity, frequently changed workloads, and rapid evolution of big data systems raise great challenges in big data benchmarking.(More)
This paper presents our joint research efforts on big data benchmarking with several industrial partners. Considering the complexity, diversity, workload churns, and rapid evolution of big data systems, we take an incremental approach in big data benchmarking. For the first step, we pay attention to search engines, which are the most important domain in(More)
The complexity and diversity of big data systems and their rapid evolution give rise to various new challenges about how we design benchmarks in order to test such systems efficiently and successfully. Data generation is a key issue in big data benchmarking that aims to generate application-specific data sets to meet the 4V requirements of big data (i.e.(More)
With the explosive growth of information, more and more organizations are deploying private cloud systems or renting public cloud systems to process big data. However, there is no existing benchmark suite for evaluating cloud performance on the whole system level. To the best of our knowledge, this paper proposes the first benchmark suite CloudRank-D to(More)
Big data benchmark suites must include a diversity of data and workloads to be useful in fairly evaluating big data systems and architectures. However, using truly comprehensive benchmarks poses great challenges for the architecture community. First, we need to thoroughly understand the behaviors of a variety of workloads. Second, our usual simulation-based(More)
Traditionally, multi-layer neural networks use dot product between the output vector of previous layer and the incoming weight vector as the input to activation function. The result of dot product is unbounded, thus increases the risk of large variance. Large variance of neuron makes the model sensitive to the change of input distribution, thus results in(More)
As the amount of data explodes rapidly, more and more corporations are using data centers to make effective decisions and gain a competitive edge. Data analysis applications play a significant role in data centers, and hence it has became increasingly important to understand their behaviors in order to further improve the performance of data center computer(More)
For the first time, this paper systematically identifies three categories of throughput oriented workloads in data centers: services, data processing applications, and interactive real-time applications, whose targets are to increase the volume of throughput in terms of processed requests or data, or supported maximum number of simultaneous subscribers,(More)
Big data benchmarking is particularly important and provides applicable yardsticks for evaluating booming big data systems. However, wide coverage and great complexity of big data computing impose big challenges on big data benchmarking. How can we construct a benchmark suite using a minimum set of units of computation to represent diversity of big data(More)
Big data analytics workloads are very significant ones in modern data centers, and it is more and more important to characterize their representative workloads and understand their behaviors so as to improve the performance of data center computer systems. In this paper, we embark on a comprehensive study to understand the impacts and performance(More)