Corpus ID: 6725985

Task-driven sampling of attributed networks

  • Suhansanu Kumar, H. Sundaram
This paper introduces new techniques for sampling attributed networks to support standard data mining tasks. The problem is important for two reasons. First, it is commonplace to perform data mining tasks such as clustering and classification of network attributes (attributes of the nodes, including social media posts). Furthermore, the extraordinarily large size of real-world networks necessitates that we work with a smaller graph sample. Second, while random sampling will provide an unbiased…
Empirical comparison of network sampling: How to choose the most appropriate method?
Techniques with subgraph induction perform better than those without induction and create denser sample networks with a larger average degree, and breadth-first exploration sampling proves to be the best-performing technique.
Report of 2017 NSF Workshop on Multimedia Challenges, Opportunities and Research Roadmaps
A summary was produced after the workshop to describe the main findings, including the state of the art, challenges, and research roadmaps planned for the next 5, 10, and 15 years in the identified area.


On Sampling Type Distribution from Heterogeneous Social Networks
This paper formally addresses the issue of whether a sampling method can preserve the node and link type distribution of heterogeneous social networks, and applies five algorithms to real Twitter data sets to evaluate their performance.
Matching patterns in networks with multi-dimensional attributes: a machine learning approach
  • K. Pelechrinis
  • Mathematics, Computer Science
  • Social Network Analysis and Mining
  • 2014
The findings indicate that while the baseline of the assortativity vector performs satisfactorily when the variance of the elements of the vector attribute across the network population is kept low, it provides biased results as this variance increases, and this approach appears to be robust in such scenarios.
Sampling from large graphs
The best-performing methods are the ones based on random walks and "forest fire"; they match very accurately both static as well as evolutionary graph patterns, with sample sizes down to about 15% of the original graph.
On the bias of BFS (Breadth First Search)
This paper quantifies the degree bias of BFS sampling, and calculates the node degree distribution expected to be observed by BFS as a function of the fraction of covered nodes, in a random graph RG(p_k) with a given degree distribution p_k.
On Sampling Nodes in a Network
This paper considers the problem of sampling nodes from a large graph according to a prescribed distribution, using random walks as the basic primitive; it studies the query complexity of three algorithms and shows a near-tight bound expressed in terms of the parameters of the graph.
Walking in Facebook: A Case Study of Unbiased Sampling of OSNs
This paper aims to obtain a representative (unbiased) sample of Facebook users by crawling its social graph using several candidate techniques, and introduces online formal convergence diagnostics to assess sample quality during the data collection process.
Semantically sampling in heterogeneous social networks
This study presents a property, Relational Profile, to account for the conditional dependency of node and relation type semantics in a network, together with a sampling method to preserve the property, and shows that the proposed sampling method better preserves Relational Profile.
Metropolis Algorithms for Representative Subgraph Sampling
Novel Metropolis algorithms for sampling a 'representative' small subgraph from the original large graph are presented, with 'representative' describing the requirement that the sample shall preserve crucial graph properties of the original graph.
Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities.
The basic ideas behind the previous benchmark are extended to generate directed and weighted networks with built-in community structure, and the possibility that nodes belong to more communities is considered, a feature occurring in real systems, such as social networks.
Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters
This paper employs approximation algorithms for the graph-partitioning problem to characterize, as a function of size, the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities, and defines the network community profile plot, which characterizes the "best" possible community, according to the conductance measure, over a wide range of size scales.