Applications ranging from algorithmic trading to scientific data analysis require real-time analytics based on views over databases receiving thousands of updates each second. Such views have to be kept fresh at millisecond latencies. At the same time, these views have to support classical SQL, rather than window semantics, to enable applications that…
We discuss a multi-objective/goal programming model for the allocation of inventory of graphical advertisements. The model considers two types of campaigns: guaranteed delivery (GD), which are sold months in advance, and non-guaranteed delivery (NGD), which are sold using real-time auctions. We investigate various advertiser and publisher objectives such as…
Security modifications to legacy network protocols are expensive and disruptive. This paper outlines an approach, based on external security monitors, for securing legacy protocols by deploying additional hosts that locally monitor the inputs and outputs of each host executing the protocol, check the behavior of the host against a safety specification,…
This paper calls for a new breed of lightweight systems: dynamic data management systems (DDMS). In a nutshell, a DDMS manages large dynamic data structures with agile, frequently fresh views, and provides a facility for monitoring these views and triggering application-level events. We motivate DDMS with applications in large-scale data analytics,…
Three mentalities have emerged in analytics. One view holds that reliable analytics is impossible without high-quality data, and relies on heavy-duty ETL processes and upfront data curation to provide it. The second view takes a more ad-hoc approach, collecting data into a data lake, and placing responsibility for data quality on the analyst querying it. A…
This paper presents PigOut, a system that enables federated data processing over multiple Hadoop clusters. Using PigOut, a user (such as a data analyst) can write a single script in a high-level language to efficiently use multiple Hadoop clusters. There is no need to manually write multiple scripts and coordinate the execution for different clusters.
Probabilistic databases, in particular ones that allow users to externally define models or probability distributions (so-called VG-Functions), are an ideal tool for constructing, simulating and analyzing hypothetical business scenarios. Enterprises often use such tools with parameterized models and need to explore a large parameter space in order to…
Embedded database engines such as SQLite provide a convenient data persistence layer and have spread, along with the applications using them, to many types of systems, including interactive devices such as smartphones. Android, the most widely distributed smartphone platform, both uses SQLite internally and provides interfaces encouraging apps to use SQLite…