Baharan Mirzasoleiman

Learn More
How can one summarize a massive data set "on the fly", i.e., without even having seen it in its entirety? In this paper, we address the problem of extracting representative elements from a large stream of data. I.e., we would like to select a subset of say k data points from the stream that are most representative according to some objective function. Many(More)
Many large-scale machine learning problems (such as clustering, non-parametric learning, kernel machines, etc.) require selecting, out of a massive data set, a manageable yet representative subset. Such problems can often be reduced to maximizing a submodular set function subject to cardinality constraints. Classical approaches require centralized access to(More)
Is it possible to maximize a monotone submodular function faster than the widely used lazy greedy algorithm (also known as accelerated greedy), both in theory and practice? In this paper, we develop the first linear-time algorithm for maximizing a general monotone submodular function subject to a cardinality constraint. We show that our randomized algorithm(More)
Complex networks serve as generic models for many biological systems that have been shown to share a number of common structural properties such as power-law degree distribution and small-worldness. Real-world networks are composed of building blocks called motifs that are indeed specific subgraphs of (usually) small number of nodes. Network motifs are(More)
Many technological networks might experience random and/or systematic failures in their components. More destructive situation can happen if the components have limited capacity, which the failure in one of them might lead to a cascade of failures in other components, and consequently break down the structure of the network. In this paper, the tolerance of(More)
Social networking has become a part of daily life for many individuals across the world. Widespread adoption of various strategies in such networks can be utilized by business corporations as a powerful means for advertising. In this study, we investigated viral marketing strategies in which buyers are influenced by other buyers who already own an item.(More)
How can one find a subset, ideally as small as possible, that well represents a massive dataset? I.e., its corresponding utility, measured according to a suitable utility function, should be comparable to that of the whole dataset. In this paper, we formalize this challenge as a submodular cover problem. Here, the utility is assumed to exhibit(More)
—There was a lot of interest in multicast communications within this decade as it is an essential part of many network applications, e.g. video-on-demand, etc. In this paper, we model flow rate allocation for application overlay as a utility based optimization problem constrained by capacity limitations of physical links and overlay constraints. The(More)
Can we summarize multi-category data based on user preferences in a scalable manner? Many utility functions used for data summarization satisfy submodularity, a natural diminishing returns property. We cast personalized data summa-rization as an instance of a general submod-ular maximization problem subject to multiple constraints. We develop the first(More)
In this paper, we introduce the public-private framework of data summarization motivated by privacy concerns in personalized recommender systems and online social services. Such systems have usually access to massive data generated by a large pool of users. A major fraction of the data is public and is visible to (and can be used for) all users. However,(More)