Massive networks arising in numerous application areas pose significant challenges for network analysts, as these networks grow to billions of nodes and become too large to fit in main memory. Finding the number of triangles in a network is an important problem in the analysis of complex networks. Several interesting graph mining applications(More)
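To make the triangle-counting problem concrete, here is a minimal in-memory sketch in Python. It is not the method of the paper above, which targets networks too large for main memory; the function name `count_triangles` and the edge-list input format are assumptions chosen for illustration, and node labels are assumed to be mutually comparable (for example, integers).

```python
def count_triangles(edges):
    """Count triangles in an undirected simple graph given as an edge list.

    Illustrative baseline only: it assumes the whole graph fits in memory.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    total = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:
                # Count common neighbours w with w > v, so each triangle
                # is counted exactly once, as the ordered triple u < v < w.
                total += sum(1 for w in nbrs & adj[v] if w > v)
    return total


if __name__ == "__main__":
    # One triangle (0, 1, 2) plus a pendant edge (2, 3).
    print(count_triangles([(0, 1), (1, 2), (0, 2), (2, 3)]))
```

Storing every adjacency set in memory is exactly what becomes infeasible at billions of nodes, which is the setting the abstract describes.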
We present a uniform approach to design efficient distributed approximation algorithms for various network optimization problems. Our approach is randomized and based on a probabilistic tree embedding due to Fakcharoenphol, Rao, and Talwar (FRT embedding). We show how to efficiently compute an (implicit) FRT embedding in a decentralized manner and how to(More)
Classification of spatial data has become important because huge volumes of spatial data, holding a wealth of valuable information, are now available. In this paper we consider the classification of spatial data streams, where the training dataset changes often. New training data arrive continuously and are added to the training set. For(More)
The Minimum Spanning Tree (MST) problem is an important and commonly occurring primitive in the design and operation of data and communication networks. While there are distributed algorithms for the MST problem, these algorithms require a relatively large number of messages and a relatively long time, and are fairly involved, requiring synchronization and a lot of bookkeeping;(More)
We describe "first principles" based methods for developing synthetic urban and national scale social contact networks. Unlike simple random graph techniques, these methods use real world data sources and combine them with behavioral and social theories to synthesize networks. We develop a synthetic population for the United States modeling every individual(More)
We give a distributed algorithm that constructs an O(log n)-approximate minimum spanning tree (MST) in arbitrary networks. Our algorithm runs in time Õ(D(G) + L(G, w)), where L(G, w) is a parameter called the local shortest path diameter and D(G) is the (unweighted) diameter of the graph. Our algorithm is existentially optimal (up to polylogarithmic factors),(More)
Relational subgraph analysis, e.g., finding labeled subgraphs in a network that are isomorphic to a template, is a key problem in many graph-related applications. It is computationally challenging for large networks and complex templates. In this paper, we develop SAHAD, an algorithm for relational subgraph analysis using Hadoop, in which the subgraph is(More)
Continuous monitoring of a network domain poses several challenges. First, routers of a network domain need to be polled periodically to collect statistics about delay, loss, and bandwidth. Second, this huge amount of data has to be mined to obtain useful monitoring information. This increases the overhead for high speed core routers, and restricts the(More)