Angelos Molfetas

Non-periodic bursts are prevalent in the workloads of large-scale applications. Existing workload models do not predict such non-periodic bursts very well because they mainly focus on repeatable base functions. We begin by showing the need to include bursts in workload models by investigating their detrimental effects in a petabyte-scale distributed data ...
Performance evaluations of large-scale systems require the use of representative workloads with certifiably similar or dissimilar characteristics. To quantify the similarity of the characteristics, we describe a novel measure comprising two efficient methods that are suitable for large-scale workloads. One method uses the discrete wavelet transform to ...
In storage systems with vast numbers of files, compression techniques should exploit inter-file similarity while allowing for near-atomic access to individual files. In differential compression, collections of files are compressed by identifying shared common strings. Therefore, some files are represented largely by references to strings in other files.
The archiving and maintenance of vast quantities of data is a key challenge for the current use of information technology. When storing large repositories, possibly mirrored at multiple sites, an archiving system aims to reduce both storage and transmission costs. Delta compression is a key component of many archiving and backup systems. A file may be ...
A collection of files can be compressed by storing each file in the collection as a delta file: one file refers to several other files. The copy instructions in a delta file could reference other files either in their encoded forms or in their (original) unencoded forms. Because files are stored compressed, the latter approach suffers from a blowout in the ...
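The idea of a delta file built from copy instructions can be illustrated with a minimal sketch. It is not the authors' encoder: it assumes a single, unencoded reference file, uses Python's standard `difflib.SequenceMatcher` to find shared strings, and the names `encode_delta` and `decode_delta` and the `("copy", offset, length)` / `("insert", data)` instruction format are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def encode_delta(reference: bytes, target: bytes) -> list:
    """Describe `target` as copies from `reference` plus literal inserts.

    Assumption: one unencoded reference file; a real system may reference
    several files and use a far faster string-matching method.
    """
    ops = []
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared string: store only its offset and length in the reference.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Bytes that exist only in the target are stored literally.
            ops.append(("insert", target[j1:j2]))
        # "delete" spans exist only in the reference, so nothing is emitted.
    return ops


def decode_delta(reference: bytes, ops: list) -> bytes:
    """Rebuild the target by replaying the instructions against the reference."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += reference[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


reference = b"the quick brown fox jumps over the lazy dog"
target = b"the quick red fox jumps over the lazy cat"
delta = encode_delta(reference, target)
restored = decode_delta(reference, delta)
```

In this sketch the copy instructions always point into the unencoded reference, so decoding a file requires the reference to be available in its original form first.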