Learn More
Ranking is a fundamental operation in data analysis and decision support and plays an even more crucial role if the dataset being explored exhibits uncertainty. This has led to much work in understanding how to rank the tuples in a probabilistic dataset in recent years. In this article, we present a unified approach to ranking and top-k query processing in(More)
Given an undirected graph G = (V,E), the density of a subgraph on vertex set S is defined as d(S) = |E(S)| |S| , where E(S) is the set of edges in the subgraph induced by nodes in S. Finding subgraphs of maximum density is a very well studied problem. One can also generalize this notion to directed graphs. For a directed graph one notion of density given by(More)
We generalize the graph streaming model to hypergraphs. In this streaming model, hyperedges are arriving online and any computation has to be done on-the-fly using a small amount of space. Each hyperedge can be viewed as a set of elements (nodes), so we refer to our proposed model as the “set-streaming” model of computation. We consider the problem of(More)
The Lovász Local Lemma (LLL) is a powerful tool that gives sufficient conditions for avoiding all of a given set of “bad” events, with positive probability. A series of results have provided algorithms to efficiently construct structures whose existence is non-constructively guaranteed by the LLL, culminating in the recent breakthrough of(More)
In this paper, we focus on finding complex annotation patterns representing novel and interesting hypotheses from gene annotation data. We define a generalization of the densest subgraph problem by adding an additional distance restriction (defined by a separate metric) to the nodes of the subgraph. We show that while this generalization makes the problem(More)
We are often thrilled by the abundance of information surrounding us and wish to integrate data from as many sources as possible. However, understanding, analyzing, and using these data are often hard. Too much data can introduce a huge integration cost, such as expenses for purchasing data and resources for integration and cleaning. Furthermore, including(More)
In the classical <i>k</i>-median problem, we are given a metric space and would like to open <i>k</i> centers so as to minimize the sum (over all the vertices) of the distance of each vertex to its nearest open center. In this paper, we consider the following generalization of the problem: instead of opening at most <i>k</i> centers, what if each center(More)
In this paper we introduce a dynamic algorithm for clustering undirected graphs, whose edge property is continuously changing. The algorithm can maintain high-quality clusters efficiently in presence of insertion and deletion (update) of edges. The algorithm is motivated by the minimum-cut tree based partitioning algorithm presented in G. W. Flake et al.(More)
We consider the problem of efficiently scheduling jobs on data centers to minimize the cost of renting machines from “the cloud.” In the most basic cloud service model, cloud providers offer computers on demand from large pools installed in data centers. Clients pay for use at an hourly rate. In order to minimize cost, each client needs to decide on the(More)