Modeling the Linux page cache for accurate simulation of data-intensive applications

Authors: Hoang-Dung Do, Valérie Hayot-Sasson, Rafael Ferreira da Silva, Christopher Steele, Henri Casanova, Tristan Glatard
Venue: 2021 IEEE International Conference on Cluster Computing (CLUSTER)
The emergence of Big Data in recent years has resulted in a growing need for efficient data processing solutions. While infrastructures with sufficient compute power are available, the I/O bottleneck remains. The Linux page cache is an efficient approach to reduce I/O overheads, but few experimental studies of its interactions with Big Data applications exist, partly due to limitations of real-world experiments. Simulation is a popular approach to address these issues; however, existing…

Adding Storage Simulation Capacities to the SimGrid Toolkit: Concepts, Models, and API
This paper details the extension of SimGrid, a versatile toolkit for the simulation of large-scale distributed computing systems, with storage simulation capacities, and defines the required abstractions and proposes a new API to handle storage components and their contents in SimGrid-based simulators.
Versatile, scalable, and accurate simulation of distributed applications and platforms
Toward Better Simulation of MPI Applications on Ethernet/TCP Networks
This study shows that SMPI has a consistently better predictive power than classical LogP-based models for a wide range of scenarios including both established HPC benchmarks and real applications.
On the validity of flow-level TCP network models for grid and cloud simulations
Evaluating state-of-the-art flow-level network models of TCP communication via comparison to packet-level simulation shows that model validation cannot be achieved solely by exhibiting “good cases,” and improves upon all previously proposed models in the context of simulation of grids or clouds.
OptorSim: A Grid Simulator for Studying Dynamic Data Replication Strategies
This paper details the design and implementation of OptorSim, analyzes various replication algorithms based on different Grid workloads, and provides a modular framework within which optimization strategies can be studied under different Grid configurations.
Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture
This paper presents a hybrid design (Triple-H) that can minimize the I/O bottlenecks in HDFS and ensure efficient utilization of the heterogeneous storage devices available on HPC clusters and improves the write and read throughputs of HDFS.
Predicting the Energy-Consumption of MPI Applications at Scale Using Only a Single Node
The proposed simulator is validated through an extensive set of experiments with well-known HPC benchmarks, and it is shown that the simulator can be used to study applications at scale, which allows researchers to save both time and resources compared to real experiments.
iCanCloud: A Flexible and Scalable Cloud Infrastructure Simulator
This paper introduces and validates iCanCloud, a novel simulator of cloud infrastructures with remarkable features such as flexibility, scalability, performance, and usability, targeted at conducting large experiments.
Simulating MPI Applications: The SMPI Approach
This article summarizes our recent work and developments on SMPI, a flexible simulator of MPI applications. In this tool, we took particular care to ensure our simulator could be used to produce
Fostering Energy-Awareness in Simulations behind Scientific Workflow Management Systems
Techniques for unifying two existing simulation toolkits are introduced by first analysing the problems with the current simulators, and then by illustrating the problems faced by workflow systems through the example of the ASKALON environment.