Network Sampling: From Static to Streaming Graphs

Nesreen Ahmed, Jennifer Neville, Ramana Rao Kompella
Network sampling is integral to the analysis of social, information, and biological networks. Since many real-world networks are massive in size, continuously evolving, and/or distributed in nature, the network structure is often sampled in order to facilitate study. For these reasons, a more thorough and complete understanding of network sampling is critical to support the field of network science. In this paper, we outline a framework for the general problem of network sampling by… 

Adaptive Shrinkage Estimation for Streaming Graphs

This work proposes a novel adaptive, single-pass sampling framework and unbiased estimators for higher-order network analysis of large streaming networks, and introduces a novel James-Stein shrinkage estimator to reduce the estimation error.

Streaming Graph Sampling with Size Restrictions

Both RIES and WES are found to produce subgraphs that are more structurally similar to the original graph than those produced by streaming RE, an improvement to the available streaming graph analysis toolkit.

Noise Corrected Sampling of Online Social Networks

  • M. Coscia
  • Computer Science
    ACM Trans. Knowl. Discov. Data
  • 2021
Overall, the noise-corrected network sampling performs well: it has the best rank average among the tested methods across a wide range of applications.

Social Network Sampling

This chapter introduces four sampling algorithms (DLAS, EDLAS, ICLA-NS, and FLAS) that use learning automata to produce representative subgraphs from online social networks.

ComPAS: Community Preserving Sampling for Streaming Graphs

It is argued that no sampling method can produce a universal representative sample that preserves all the properties of the original graph; rather, sampling should be application specific (such as preserving hubs, which are needed for information diffusion).

Network Shrinkage Estimation

This work proposes a novel adaptive, single-pass sampling framework and unbiased estimators for higher-order network analysis of large streaming networks, and introduces a novel James-Stein-type shrinkage estimator that reduces estimation error.

Graph sample and hold: a framework for big-graph analytics

This work proposes Graph Sample and Hold (gSH), a generic stream sampling framework for big-graph analytics that samples from massive graphs sequentially in a single pass, one edge at a time, while maintaining a small state in memory.

Spectral Algorithms for Streaming Graph Analysis: A Survey

This survey covers state-of-the-art progress in streaming graph analysis with spectral algorithms, focusing on recent developments in areas such as sampling, sparsification, singular value decomposition, counting problems related to local structures, analysis of global structures, partitioning, labeling, mesh processing, pattern discovery, anomalous hotspot discovery, and community detection.

Clustering-Structure Representative Sampling from Graph Streams

This work proposes a new sampling algorithm that dynamically maintains a representative sample and is capable of retaining clustering structure in graph streams at any time and outperforms current online sampling algorithms.



Space-efficient sampling from social activity streams

This work proposes a streaming graph sampling algorithm that dynamically maintains a representative sample in a reservoir-based setting and evaluates the efficacy of the proposed methods empirically using several real-world data sets.
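The summary above does not describe the algorithm itself, so the following is only a minimal sketch of the classic reservoir technique (Algorithm R) applied to an edge stream, not the method proposed in that work. The function name `reservoir_sample_edges` and its parameters are hypothetical, chosen for illustration.

```python
import random


def reservoir_sample_edges(edge_stream, k, seed=None):
    """Keep a uniform random sample of at most k edges from a stream.

    Classic reservoir sampling (Algorithm R): one pass, O(k) memory.
    This is a generic illustration, not the cited paper's algorithm.
    """
    rng = random.Random(seed)
    reservoir = []
    for t, edge in enumerate(edge_stream, start=1):
        if len(reservoir) < k:
            # Fill phase: the first k edges are always kept.
            reservoir.append(edge)
        else:
            # Replacement phase: the t-th edge enters with probability k / t,
            # evicting a uniformly chosen resident edge.
            j = rng.randrange(t)
            if j < k:
                reservoir[j] = edge
    return reservoir


if __name__ == "__main__":
    stream = ((i, i + 1) for i in range(1000))  # toy path-graph edge stream
    print(reservoir_sample_edges(stream, k=5, seed=0))
```

Because the stream is consumed through a single iterator, the sketch never needs the full edge list in memory, which is the property that makes reservoir-style methods attractive for streaming graphs.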

Network Sampling via Edge-based Node Selection with Graph Induction

A novel sampling algorithm called TIES is proposed that aims to offset sampling bias by using edge-based node selection, which favors selection of high-degree nodes, and by using a graph induction step that selects additional edges between sampled nodes to restore connectivity and bring the structure closer to that of the original graph.
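The two steps named in this summary, edge-based node selection followed by graph induction, can be sketched roughly as follows. This is an illustrative reading of the summary for a static, undirected edge list, not the authors' reference implementation; the function name `ties_sample` and the stopping rule (a target node count) are assumptions.

```python
import random


def ties_sample(edges, target_nodes, seed=None):
    """Sketch of edge-based node selection plus graph induction.

    edges: iterable of (u, v) pairs; target_nodes: desired sample size in nodes.
    Returns (sampled_nodes, sampled_edges). May overshoot the target by one
    node, because each selected edge contributes both endpoints.
    """
    rng = random.Random(seed)
    edges = list(edges)
    all_nodes = {n for edge in edges for n in edge}
    target = min(target_nodes, len(all_nodes))

    # Step 1: edge-based node selection. Picking edges uniformly at random
    # reaches a node with probability proportional to its degree, so
    # high-degree nodes are favored.
    sampled_nodes = set()
    while len(sampled_nodes) < target:
        u, v = rng.choice(edges)
        sampled_nodes.update((u, v))

    # Step 2: graph induction. Keep every original edge whose endpoints
    # were both sampled, which restores connectivity among sampled nodes.
    sampled_edges = [
        (u, v) for u, v in edges if u in sampled_nodes and v in sampled_nodes
    ]
    return sampled_nodes, sampled_edges


if __name__ == "__main__":
    toy = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (4, 5)]
    print(ties_sample(toy, target_nodes=3, seed=1))
```

The induction step is what distinguishes this from plain edge sampling: edges that were never drawn in step 1 are still included whenever both endpoints made it into the sample.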

Reconsidering the Foundations of Network Sampling

This paper reconsiders the foundations of network sampling and attempts to formalize the goals and process of sampling, in order to frame future development and analysis of sampling algorithms.

Sampling Online Social Networks

This paper introduces sampling-based algorithms to efficiently explore a user's social network respecting its structure and to quickly approximate quantities of interest and shows that these algorithms can be utilized to rank items in the neighborhood of a user, assuming that information for each user in the network is available.

Time-based sampling of social network activity graphs

This paper proposes a novel sampling algorithm called Streaming Time Node Sampling (STNS) that exploits the temporal clustering often found in real social networks and significantly outperforms state-of-the-art sampling mechanisms such as node sampling and Forest Fire sampling, across both averages and distributions of several graph properties.

Outlier detection in graph streams

First results on the problem of structural outlier detection in massive network streams are provided, using a structural connectivity model in order to define outliers in graph streams and designing a reservoir sampling method to maintain structural summaries of the underlying network.

Sampling from large graphs

The best performing methods are the ones based on random-walks and "forest fire"; they match very accurately both static as well as evolutionary graph patterns, with sample sizes down to about 15% of the original graph.

Statistical properties of sampled networks by random walks.

The sampling method is applied to various real networks, such as movie actor collaboration, the World Wide Web, and peer-to-peer networks, and all topological properties of the sampled networks are essentially the same as those of the original real networks.

GUISE: Uniform Sampling of Graphlets for Large Graph Analysis

This paper proposes GUISE, which uses a Markov Chain Monte Carlo (MCMC) sampling method to construct the approximate graphlet frequency distribution (GFD) of a large network, and shows that GUISE obtains the GFD within a few minutes, whereas the exhaustive counting-based approach takes several days.

Subnets of scale-free networks are not scale-free: sampling properties of networks.

  • M. Stumpf, C. Wiuf, R. May
  • Mathematics
    Proceedings of the National Academy of Sciences of the United States of America
  • 2005
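The sampling properties of a network's degree distribution under the most parsimonious sampling scheme are discussed. Extrapolating from subnet data to the global network requires that the degree distributions of the network and of its randomly sampled subnets belong to the same family of probability distributions, and it is shown that this condition is indeed satisfied for some important classes of networks, notably classical random graphs and exponential random graphs.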