Simulating Big Data Clusters for System Planning, Evaluation, and Optimization
Simulating contemporary computer systems is a challenging endeavor, especially when it comes to simulating high-end setups involving multiple servers. The simulation environment needs to run complete software stacks, including operating systems, middleware, and application software, and it needs to simulate network and disk activity next to CPU performance. In addition, it needs the ability to scale out to a large number of server nodes while attaining good accuracy and reasonable simulation speeds. This paper presents VSim, a novel simulation methodology for multi-server systems. VSim leverages virtualization technology for simulating a target system on a host system. VSim controls CPU, network and disk performance on the host, and it gives the illusion to the software stack to run on a target system through time dilation. VSim can simulate multiple targets per host, and it employs a distributed simulation scheme across multiple hosts for simulations at scale. Our experimental results demonstrate VSim's accuracy: typical errors are below 6% for CPU, disk, and network performance. Real-life workloads involving the Lucene search engine and the Olio Web 2.0 benchmark illustrate VSim's utility and accuracy (average error of 3.2%). Our current setup can simulate up to five target servers per host, and we provide a Hadoop workload case study in which we simulate 25 servers. These simulation results are obtained at a simulation slowdown of one order of magnitude compared to native hardware execution.