Scheduling data processing workflows (dataflows) on the cloud is a complex and challenging task. It is essentially an optimization problem, similar to query optimization, but it differs characteristically from traditional problems in two respects: its space of alternative schedules is very rich, due to various optimization opportunities that…
We investigate the performance of some of the best-known object clustering algorithms on four different workloads based on the Tektronix benchmark. For all four workloads, stochastic clustering gave the best performance across a variety of performance metrics. Since stochastic clustering is computationally expensive, it is interesting that for every workload…
Object clustering has long been recognized as important to the performance of object bases, but in most work to date it is not clear exactly what is being optimized or how close to optimal the obtained solutions are. We give a rigorous treatment of a fundamental problem in clustering: given an object base and a probabilistic description of the expected access…
Complex on-demand data retrieval and processing is a characteristic of several applications and combines the notions of querying & search, information filtering & retrieval, data transformation & analysis, and other data manipulations. Such rich tasks are typically represented by data processing graphs, having arbitrary data operators as nodes and their…
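To make the idea of a data processing graph concrete, here is a minimal sketch, assuming that nodes are operators and that each edge carries one operator's output to a downstream operator (the abstract is truncated before it says what the edges mean). The function `run_graph`, the operator names, and the toy data are hypothetical illustrations, not the authors' system; the sketch simply evaluates each node once, after all of its inputs, using Python's standard `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical operator signature: an operator receives the outputs of its
# upstream operators (in the order listed in inputs_of) and returns a value.
Operator = Callable[..., object]


def run_graph(operators: Dict[str, Operator],
              inputs_of: Dict[str, List[str]]) -> Dict[str, object]:
    """Evaluate every operator exactly once, after all of its upstream operators."""
    # TopologicalSorter expects a mapping from each node to its predecessors.
    graph = {name: set(inputs_of.get(name, [])) for name in operators}
    results: Dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        args = [results[dep] for dep in inputs_of.get(name, [])]
        results[name] = operators[name](*args)
    return results


# Toy graph: two sources (a "query" and a "search") are merged, filtered,
# and then aggregated by an "analysis" operator.
operators: Dict[str, Operator] = {
    "query": lambda: [3, 1, 4],
    "search": lambda: [1, 5, 9],
    "merge": lambda a, b: a + b,
    "filter": lambda xs: [x for x in xs if x > 1],
    "analyze": lambda xs: sum(xs),
}
inputs_of = {
    "merge": ["query", "search"],
    "filter": ["merge"],
    "analyze": ["filter"],
}

results = run_graph(operators, inputs_of)
print(results["analyze"])  # prints 21
```

A topological order is the simplest valid execution schedule for such a graph; a real system would also choose among many valid orders and placements, which is where the optimization problem arises.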
Software development kits (SDKs) and supporting tools for Graphics Processing Units (GPUs) have matured and now enable the implementation of complex middleware that takes advantage of the additional processing power. Working in synergy with CPUs, GPUs are suitable for executing highly parallelized tasks on streams of data. In this paper, we investigate…
AITION is a scalable, user-friendly, and interactive data mining (DM) platform designed for analyzing large heterogeneous datasets. It implements state-of-the-art machine learning algorithms and uses generative Probabilistic Graphical Models (PGMs) to provide an integrated framework targeting feature selection, Knowledge Discovery (KD), and…