Konstantinos Karanasos

Large organizations today operate data centers around the globe where massive amounts of data are produced and consumed by local users. Despite their geographically diverse origin, such data must be analyzed/mined as a whole. We call the problem of supporting rich DAGs of computation across geographically distributed data Wide-Area Big-Data (WABD). To the…
We consider the setting of a Semantic Web database, containing both explicit data encoded in RDF triples, and implicit data, implied by the RDF semantics. Based on a query workload, we address the problem of selecting a set of views to be materialized in the database, minimizing a combination of query processing, view storage, and view maintenance costs.
Datacenter-scale computing for analytics workloads is increasingly common. High operational costs force heterogeneous applications to share cluster resources for achieving economy of scale. Scheduling such large and diverse workloads is inherently hard, and existing approaches tackle this in two alternative ways: 1) centralized solutions offer strict,…
Job scheduling in Big Data clusters is crucial both for cluster operators' return on investment and for overall user experience. In this context, we observe several anomalies in how modern cluster schedulers manage queues, and argue that maintaining queues of tasks at worker nodes has significant benefits. On one hand, centralized approaches do not use…
We consider the problem of rewriting XQuery queries using multiple materialized XQuery views. The XQuery dialect we use to express views and queries corresponds to tree patterns (returning data from several nodes, at different granularities, ranging from node identifiers to full XML subtrees) with value joins. We provide correct and complete algorithms for…
Enterprises are adapting large-scale data processing platforms, such as Hadoop, to gain actionable insights from their "big data". Query optimization is still an open challenge in this environment due to the volume and heterogeneity of data, comprising both structured and un/semi-structured datasets. Moreover, it has become common practice to push business…
The increasing interest in the RDF data model has turned the efficient processing of queries over RDF datasets into a challenging and crucial task. Indeed, the triple format of the RDF data model, along with the lack of structure that characterizes it, raises new challenges in data management both in terms of performance and scalability. In this paper, we…
We consider the problem of efficiently sharing large volumes of XML data based on distributed hash table overlay networks. Over the last three years, we have built ViP2P (standing for Views in Peer-to-Peer), a platform for the distributed, parallel dissemination of XML data among peers. At the core of ViP2P stand distributed materialized XML views, defined…
In content-based subscription systems, users express their interests through queries over publication streams. Scaling subscription systems raises numerous performance problems: users care about data freshness, that is, obtaining the results of their subscriptions…