Dimitris Tsirogiannis

Cloudera Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. Impala provides low latency and high concurrency for BI/analytic read-mostly queries on Hadoop, not delivered by batch frameworks such as Apache Hive. This paper presents Impala from a user’s perspective, gives an overview of …
The problem of obtaining efficient answers to top-k queries has attracted a lot of research attention. Several algorithms and numerous variants of the top-k retrieval problem have been introduced in recent years. The general form of this problem requests the k highest ranked values from a relation, using monotone combining functions on …
Rising energy costs in large data centers are driving an agenda for energy-efficient computing. In this paper, we focus on the role of database software in affecting, and, ultimately, improving the energy efficiency of a server. We first characterize the power-use profiles of database operators under different configuration parameters. We find that common …
Solid state drives perform random reads more than 100x faster than traditional magnetic hard disks, while offering comparable sequential read and write bandwidth. Because of their potential to speed up applications, as well as their reduced power consumption, these new drives are expected to gradually replace hard disks as the primary permanent storage …
Energy is a growing component of the operational cost for many “big data” deployments, and hence has become increasingly important for practitioners of large-scale data analysis who require scale-out clusters or parallel DBMS appliances. Although a number of recent studies have investigated the energy efficiency of DBMSs, none of these studies have looked …
SIGMOD has offered, since 2008, to verify the experiments published in the papers accepted at the conference. This year, we have been in charge of reproducing the experiments provided by the authors (repeatability), and exploring changes to experiment parameters (workability). In this paper, we assess the SIGMOD repeatability process in terms of …
Flash memory affects not only storage options but also query processing. In this paper, we analyze the use of flash memory for database query processing, including algorithms that combine flash memory and traditional disk drives. We first focus on flash-resident databases and present data structures and algorithms that leverage the fast random reads of …
Scaling complex transactional workloads in parallel and distributed systems is a challenging problem. When transactions span data partitions that reside in different nodes, significant overheads emerge that limit the throughput of these systems. In this paper, we present a low-overhead data partitioning approach, termed JECB, that can reduce the number of …