Visualizations are arguably the most important tool to explore, understand and convey facts about data. As part of interactive data exploration, visualizations might be used to quickly skim through the data and look for patterns. Unfortunately, database systems are not designed to efficiently support these workloads. As a result, visualizations often take…
Memory-based data center applications require increasingly large memory capacities, but face the challenges posed by the inherent difficulties in scaling DRAM and also the cost of DRAM. Future systems are attempting to address these demands with heterogeneous memory architectures coupling DRAM with high capacity, low cost, but also lower performance,…
Have you ever been in a sauna? If yes, according to our recent survey conducted on Amazon Mechanical Turk, people who go to saunas are more likely to know that Mike Stonebraker is not a character in "The Simpsons". While this result clearly makes no sense, recently proposed tools to automatically suggest visualizations, correlations, or perform visual…
Exploring data via visualization has become a popular way to understand complex data. Features or patterns in visualization can be perceived as relevant insights by users, even though they may actually arise from random noise. Moreover, interactive data exploration and visualization recommendation tools can examine a large number of observations, and…
Existing benchmarks for analytical database systems such as TPC-DS and TPC-H are designed for static reporting scenarios. The main metric of these benchmarks is the performance of running different SQL queries over a predefined database. In this paper, we argue that such benchmarks are not suitable for evaluating modern interactive data exploration (IDE)…
Recent tools for interactive data exploration significantly increase the chance that users make false discoveries. They allow users to (visually) examine many hypotheses and make inference with simple interactions, and thus incur the issue commonly known in statistics as the "multiple hypothesis testing error." In this work, we propose a solution to…
Apache Spark is a popular framework for data analytics with attractive features such as fault tolerance and interoperability with the Hadoop ecosystem. Unfortunately, many analytics operations in Spark are an order of magnitude or more slower compared to native implementations written with high performance computing tools such as MPI. There is a need to…