Jeffrey Jestes

Learn More
While Processing-in-Memory has been investigated for decades, it has not been embraced commercially. A number of emerging technologies have renewed interest in this topic. In particular, the emergence of 3D stacking and the imminent release of Micron's Hybrid Memory Cube device have made it more practical to move computation near memory. However, the(More)
In data mining applications and spatial and multimedia databases, a useful tool is the <i>k</i>NN join, which is to produce the <i>k</i> nearest neighbors (NN), from a dataset <i>S</i>, of every point in a dataset <i>R</i>. Since it involves both the join and the NN search, performing <i>k</i>NN joins efficiently is a challenging task. Meanwhile,(More)
Recently, there have been several attempts to propose definitions and algorithms for ranking queries on probabilistic data. However, these lack many intuitive properties of a top-k over deterministic data. We define several fundamental properties, including exact-k, containment, unique rank, value invariance, and stability, which are satisfied by ranking(More)
Ranking queries are essential tools to process large amounts of probabilistic data that encode exponentially many possible deterministic instances. In many applications where uncertainty and fuzzy information arise, data are collected from multiple sources in distributed, networked locations, e.g., distributed sensor fields with imprecise measurements,(More)
MapReduce is becoming the de facto framework for storing and processing massive data, due to its excellent scalability, reliability, and elasticity. In many MapReduce applications, obtaining a compact accurate summary of data is essential. Among various data summarization tools, histograms have proven to be particularly important and useful for summarizing(More)
Edit distance based string similarity join is a fundamental operator in string databases. Increasingly, many applications in data cleaning, data integration, and scientific computing have to deal with fuzzy information in string attributes. Despite the intensive efforts devoted in processing (deterministic) string joins and managing probabilistic data(More)
......Warehouse-scale computing is dominated by systems using commodity hardware to execute workloads, processing large amounts of data distributed among many disks. Several frameworks, such as MapReduce, have emerged in recent years to facilitate the management and development of big-data workloads. These systems rely on disks for most data accesses, but(More)
In distributed data management, a primary concern is monitoring the distributed data and generating an alarm when a user specified constraint is violated. A particular useful instance is the threshold based constraint, which is commonly known as the distributed threshold monitoring problem [4], [16], [19], [29]. This work extends this useful and fundamental(More)
Ranking temporal data has not been studied until recently, even though ranking is an important operator (being promoted as a firstclass citizen) in database systems. However, only the instant top-k queries on temporal data were studied in, where objects with the k highest scores at a query time instance t are to be retrieved. The instant top-k definition(More)
Kernel density estimates are important for a broad variety of applications. Their construction has been well-studied, but existing techniques are expensive on massive datasets and/or only provide heuristic approximations without theoretical guarantees. We propose randomized and deterministic algorithms with quality guarantees which are orders of magnitude(More)