Understanding the inherent characteristics of a cloud storage system is crucial to its design and optimization, yet few studies have systematically investigated its data characteristics and access patterns. This paper presents an analysis of a file system snapshot and a five-month access trace of a campus cloud storage system that has been deployed on…
Global climate modeling not only requires substantial computational capability, but also poses tough challenges for data storage systems. The input and output data sets generally require hundreds or even thousands of terabytes of storage. Therefore, storage reduction methods, such as content deduplication and various data compression methods, are extremely important for…
Climate modeling data are usually multidimensional arrays of floating-point numbers. These arrays typically have two or three spatial dimensions and one temporal dimension, describing the evolution of climate variables over a time span. With advances in high-performance computing, the volume of climate data is expanding exponentially, bringing tough…
With rapid advances in supercomputing and numerical simulation, the output data of scientific computing is growing quickly, posing tough challenges for data sharing and data archiving. Data compression can mitigate these challenges by reducing the size of the data to be stored or transferred. However, data compression has to achieve a good balance…
Multidimensional arrays are commonly used in scientific and engineering applications. The disk layout of a multidimensional array directly affects the performance of data queries. Homogeneous replica methods are widely used to maintain data reliability in most distributed storage systems and to improve data locality in some…
Climate data have been dramatically increasing in volume in recent years. This huge volume of climate data poses considerable challenges for data storage, archiving and sharing. In this paper, we propose a lossless compression algorithm for climate data, named czip. We efficiently eliminate data redundancy through several new methods, including adaptive…