• Publications
BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking
Data generation is a key issue in big data benchmarking that aims to generate application-specific data sets to meet the 4 V requirements of big data. Specifically, big data generators need to…
  • 91
  • 7
In Cloud, Can Scientific Communities Benefit from the Economies of Scale?
The basic idea behind cloud computing is that resource providers offer elastic resources to end users. In this paper, we intend to answer one key question to the success of cloud computing: in cloud, …
  • 159
  • 6
CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications
With the explosive growth of information, more and more organizations are deploying private cloud systems or renting public cloud systems to process big data. However, there is no existing benchmark…
  • 81
  • 6
Characterizing data analysis workloads in data centers
As the amount of data explodes rapidly, more and more corporations are using data centers to make effective decisions and gain a competitive edge. Data analysis applications play a significant role…
  • 110
  • 5
Characterizing and subsetting big data workloads
  • Zhen Jia, J. Zhan, +5 authors J. Li
  • Computer Science
  • IEEE International Symposium on Workload…
  • 1 September 2014
Big data benchmark suites must include a diversity of data and workloads to be useful in fairly evaluating big data systems and architectures. However, using truly comprehensive benchmarks poses…
  • 68
  • 5
BigDataBench: a Big Data Benchmark Suite from Web Search Engines
This paper presents our joint research efforts on big data benchmarking with several industrial partners. Considering the complexity, diversity, workload churns, and rapid evolution of big data…
  • 55
  • 4
Benchmarking Big Data Systems: A Review
With the fast development of big data systems in recent years, a variety of open-source benchmarks have been built to evaluate and compare the workloads on these systems, and to promote their…
  • 27
  • 4
Cost-Aware Cooperative Resource Provisioning for Heterogeneous Workloads in Data Centers
Recent cost analysis shows that the server cost still dominates the total cost of high-scale data centers or cloud systems. In this paper, we argue for a new twist on the classical resource…
  • 63
  • 2
CVR: efficient vectorization of SpMV on x86 processors
Sparse Matrix-vector Multiplication (SpMV) is an important computation kernel widely used in HPC and data centers. The irregularity of SpMV is a well-known challenge that limits SpMV’s parallelism…
  • 24
  • 2
Performance analysis and optimization of MPI collective operations on multi-core clusters
Memory hierarchy on multi-core clusters has twofold characteristics: vertical memory hierarchy and horizontal memory hierarchy. This paper proposes new parallel computation model to unitedly abstract…
  • 31
  • 2