Learn More
Scheduling data processing workflows (dataflows) on the cloud is a very complex and challenging task. It is essentially an optimization problem, very similar to query optimization, that is characteristically different from traditional problems in two aspects: Its space of alternative schedules is very rich, due to various optimization opportunities that(More)
Modern frameworks, such as Hadoop, combined with abundance of computing resources from the cloud, offer a significant opportunity to address long standing challenges in distributed processing. Infrastructure-as-a-Service clouds reduce the investment cost of renting a large data center while distributed processing frameworks are capable of efficiently(More)
Complex on-demand data retrieval and processing is a characteristic of several applications and combines the notions of querying & search, information filtering & retrieval, data transformation & analysis, and other data manipulations. Such rich tasks are typically represented by data processing graphs, having arbitrary data operators as nodes and their(More)
Implementations of database operators on GPU processors have shown dramatic performance improvement compared to multicore-CPU implementations. GPU threads can cooperate using shared memory, which is organized in interleaved banks and is fast only when threads read and modify addresses belonging to distinct memory banks. Therefore, data processing operators(More)
Given recent increases in the size of main memory in modern machines, it is now common to to store large data sets in RAM for faster processing. Multidimensional access methods aim to provide efficient access to large data sets when queries apply predicates to some of the data dimensions. We examine multidimensional access methods in the context of an(More)
AITION is a scalable, user-friendly, and interactive data mining (DM) platform, designed for analyzing large heterogeneous datasets. Implementing state-of-the-art machine learning algorithms, it successfully utilizes generative Prob-abilistic Graphical Models (PGMs) providing an integrated framework targeting feature selection, Knowledge Discovery (KD), and(More)
Implementations of relational operators on GPU processors have resulted in order of magnitude speedups compared to their multicore CPU counterparts. Here we focus on the efficient implementation of string matching operators common in SQL queries. Due to different architectural features the optimal algorithm for CPUs might be suboptimal for GPUs. GPUs(More)
Today's exponentially increasing data volumes and the high cost of storage make compression essential for the Big Data industry. Although research has concentrated on efficient compression, fast decompression is critical for analytics queries that repeatedly read compressed data. While decompression can be parallelized somewhat by assigning each data block(More)