On the bias of traceroute sampling: or, power-law degree distributions in regular graphs

@article{Achlioptas2005OnTB,
  title={On the bias of traceroute sampling: or, power-law degree distributions in regular graphs},
  author={Dimitris Achlioptas and Aaron Clauset and David Kempe and Cristopher Moore},
  journal={J. ACM},
  year={2009},
  volume={56},
  pages={21:1--21:28}
}
Understanding the structure of the Internet graph is a crucial step for building accurate network models and designing efficient algorithms for Internet applications. Yet, obtaining its graph structure is a surprisingly difficult task, as edges cannot be explicitly queried. Instead, empirical studies rely on traceroutes to build what are essentially single-source, all-destinations, shortest-path trees. These trees only sample a fraction of the network's edges, and a recent paper by Lakhina et… 
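The abstract describes traceroute sampling as building single-source, all-destinations shortest-path trees. Below is a minimal Python sketch of that idea, under stated assumptions that are not the authors' setup. The underlying graph is an approximately d-regular random graph from a configuration model, with self-loops and repeated pairs simply discarded. Each traceroute is idealized as one BFS shortest path per destination, with ties broken by random neighbour order. All function names are hypothetical. Because a vertex's observed degree can never exceed its true degree, comparing the printed histogram with d shows how much of each vertex's neighbourhood a single tree misses.

```python
import random
from collections import Counter, deque


def near_regular_graph(n, d, rng):
    """Configuration model: shuffle n*d degree 'stubs' and pair them up.

    Self-loops and repeated pairs are dropped, so a few vertices may end up
    with degree slightly below d (an approximation of a random d-regular graph).
    """
    stubs = [v for v in range(n) for _ in range(d)]
    rng.shuffle(stubs)
    adj = [set() for _ in range(n)]
    for i in range(0, len(stubs) - 1, 2):
        u, v = stubs[i], stubs[i + 1]
        if u != v:  # repeated pairs are absorbed by the set
            adj[u].add(v)
            adj[v].add(u)
    return [sorted(nbrs) for nbrs in adj]


def shortest_path_tree(adj, source, rng):
    """BFS from `source`; every reached vertex keeps one shortest path (its BFS parent).

    Neighbours are scanned in random order, a simple stand-in for a traceroute
    following one of several equally short routes.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        nbrs = list(adj[u])
        rng.shuffle(nbrs)
        for w in nbrs:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return {(min(v, p), max(v, p)) for v, p in parent.items() if p is not None}


def degree_histogram(edges):
    """Map each observed degree to the number of vertices having it in the sample."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())


rng = random.Random(0)
n, d = 2000, 10
adj = near_regular_graph(n, d, rng)
tree = shortest_path_tree(adj, source=0, rng=rng)
hist = degree_histogram(tree)
for k in sorted(hist):
    print(f"observed degree {k}: {hist[k]} vertices (true degree <= {d})")
```

Raising `d` and plotting the histogram on log-log axes is one way to look at the skew the title refers to. The sketch itself makes no claim about the shape or exponent of that distribution.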
Exploring networks with traceroute-like probes: theory and simulations
TLDR
An analytical approximation for the probability of edge and vertex detection is derived that exploits the role of the number of sources and targets and allows us to relate the global topological properties of the underlying network with the statistical accuracy of the sampled graph.
Deciding on the type of the degree distribution of a graph from traceroute-like measurements
TLDR
This work designs procedures that estimate the degree distribution of a graph from a BFS of it, and shows experimentally that this approach succeeds in distinguishing between Poisson and power-law degree distributions.
Sampling of Networks with Traceroute-Like Probes
A large part of the recent development of the interest in complex networks has been triggered by the observation of particular characteristics of real world networks, such as the small-world
Network Inference from TraceRoute Measurements: Internet Topology 'Species'
TLDR
The observation that the inference of many of the most basic topological quantities (including network size and degree characteristics) from traceroute measurements is in fact a version of the so-called ‘species problem’ in statistics has important implications, since species problems are often quite challenging.
Sampling networks by the union of m shortest path trees
TLDR
This paper investigates the sampling method on a wide class of real-world complex networks as well as on weighted Erdos-Renyi random graphs, and illustrates that obtaining an increasingly accurate view of a given network requires a higher-than-linear detection/measuring effort.
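The union-of-trees method in this entry can be sketched in Python. The assumptions are ours, not the paper's. The graph is a weighted Erdos-Renyi graph G(n, p) with i.i.d. uniform(0, 1) edge weights, which makes shortest paths unique with probability 1. The m trees are rooted at m randomly chosen sources and computed with Dijkstra's algorithm. The helper names are hypothetical. The sketch only reports the fraction of edges discovered as sources are added one at a time; it does not reproduce the paper's scaling analysis.

```python
import heapq
import random


def weighted_gnp(n, p, rng):
    """G(n, p) with i.i.d. uniform(0, 1) edge weights, as an adjacency dict."""
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                w = rng.random()
                adj[u].append((v, w))
                adj[v].append((u, w))
    return adj


def shortest_path_tree(adj, source):
    """Dijkstra from `source`; return the tree edges as (smaller, larger) vertex pairs."""
    dist = {source: 0.0}
    parent = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (d + w, v))
    return {(min(u, v), max(u, v)) for v, u in parent.items()}


def union_coverage(adj, sources):
    """Fraction of all edges seen in the union of trees rooted at sources[:m], for m = 1, 2, ..."""
    total = sum(len(nbrs) for nbrs in adj.values()) // 2
    seen, coverage = set(), []
    for s in sources:
        seen |= shortest_path_tree(adj, s)
        coverage.append(len(seen) / total)
    return coverage


rng = random.Random(1)
adj = weighted_gnp(n=300, p=0.05, rng=rng)
sources = rng.sample(range(300), 10)
cov = union_coverage(adj, sources)
for m, frac in enumerate(cov, start=1):
    print(f"m = {m:2d}: {frac:.3f} of edges discovered")
```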
What is the real size of a sampled network? The case of the Internet.
TLDR
It is argued that inference of some of the standard topological quantities is, in fact, a version of the so-called "species" problem in statistics, which is important in categorizing the problem and providing some indication of its inherent difficulties.
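One classical answer to a species problem is the Chao1 estimator. It adds to the number of distinct items observed, S_obs, a correction based on how many items were seen exactly once (f1) or exactly twice (f2): S_obs + f1^2 / (2 f2). The Python sketch below is illustrative only. Chao1 is not necessarily the estimator used in the papers listed here, and treating the number of traceroute probes in which each vertex appears as its "abundance" is an assumption. When f2 = 0 the sketch falls back to the bias-corrected form S_obs + f1(f1 - 1) / 2.

```python
from collections import Counter


def chao1(sightings):
    """Chao1 estimate of the total number of distinct items (e.g. vertices).

    `sightings` maps each observed item to the number of samples it appeared in.
    """
    freq = Counter(sightings.values())
    s_obs = len(sightings)
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    if f2 == 0:
        # bias-corrected form f1 * (f1 - 1) / (2 * (f2 + 1)) with f2 = 0
        return s_obs + f1 * (f1 - 1) / 2
    return s_obs + f1 * f1 / (2 * f2)


# Hypothetical sighting counts for five observed vertices.
sightings = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 1}
print(chao1(sightings))  # 5 observed + 3**2 / (2 * 1) = 9.5
```

Chao1 was derived as a lower-bound estimator, so its output is best read as a rough indication of how much of the network remains unseen.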
Population size estimation and Internet link structure
Traceroute sampling is a common approach for exploring the autonomous system (AS) graph of the Internet. It provides samples of links between autonomous systems, but these links are not drawn
A new power law in topology discovery based on shortest-path
TLDR
A new feature of traceroute sampling is reported for the first time, and it is applied to predict the numbers of nodes and links detected in one probe.
Network Sampling via Edge-based Node Selection with Graph Induction
TLDR
A novel sampling algorithm called TIES is proposed that aims to offset sampling bias by using edge-based node selection, which favors high-degree nodes, followed by a graph induction step that selects additional edges between sampled nodes to restore connectivity and bring the sample's structure closer to that of the original graph.
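Going only by the description in this entry, a TIES-style sampler can be sketched in Python in two steps. First, edge-based node selection: pick edges uniformly at random and add both endpoints until a target number of nodes is reached, which hits high-degree nodes more often. Second, graph induction: keep every original edge whose endpoints were both selected. The function name and the `target_nodes` parameter are hypothetical, and details of the published algorithm (for example, how the sample size is specified) may differ.

```python
import random


def ties_sample(edges, target_nodes, rng):
    """Edge-based node selection followed by graph induction.

    edges: list of (u, v) pairs of an undirected graph.
    target_nodes: stop selecting once at least this many nodes are chosen
    (adding both endpoints of the last edge can overshoot by one).
    """
    order = list(edges)
    rng.shuffle(order)  # visit edges in uniformly random order
    nodes = set()
    for u, v in order:
        if len(nodes) >= target_nodes:
            break
        nodes.update((u, v))  # a node of degree k can be picked via any of its k edges
    # Graph induction: keep every original edge between two sampled nodes.
    induced = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, induced


# Example graph: a hub (0) with five spokes, plus a path 5-6-7-8.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (5, 6), (6, 7), (7, 8)]
nodes, induced = ties_sample(edges, target_nodes=4, rng=random.Random(3))
print(sorted(nodes), induced)
```

The induction step is what distinguishes this approach from plain edge sampling. An edge between two sampled nodes is kept even if it was never drawn, which restores connectivity that random edge selection alone would miss.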
Sampling Content Distributed Over Graphs
TLDR
This work proposes two efficient estimators, the special copy estimator (SCE) and the weighted copy estimator (WCE), that measure content characteristics using the information available in sampled contents; experimental results show that content properties can be obtained by sampling only a small fraction of the vertices in the graph.

References

SHOWING 1-10 OF 56 REFERENCES
Sampling biases in IP topology measurements
TLDR
It is shown that when graphs are sampled using traceroute-like methods, the resulting degree distribution can differ sharply from that of the underlying graph, and why this effect arises is explored.
Accuracy and scaling phenomena in Internet mapping.
TLDR
It is found that, to estimate alpha accurately, one must use a number of sources that grows linearly in the mean degree of the underlying graph; the accuracy of the published values of alpha for the Internet is also discussed.
Exploring networks with traceroute-like probes: theory and simulations
TLDR
An analytical approximation for the probability of edge and vertex detection is derived that exploits the role of the number of sources and targets and allows us to relate the global topological properties of the underlying network with the statistical accuracy of the sampled graph.
Bias Reduction in Traceroute Sampling - Towards a More Accurate Map of the Internet
TLDR
A new estimator for the degree of a node in a traceroute-sampled graph is developed and validated theoretically in Erdos-Renyi graphs and, through computer experiments, for a wider range of graphs; and it is applied to produce a new picture of the degree distribution of the autonomous system graph.
Issues with inferring Internet topological attributes
TLDR
This study uses multiple views of the same data to demonstrate that some topological attributes, such as the average path length, are relatively consistent across a variety of data sources, and illustrates how using the same methodology to model other attributes can produce substantially misleading results.
On the marginal utility of network topology measurements
TLDR
This paper characterizes the observable topology in terms of nodes, links, node degree distribution, and distribution of end-to-end flows using statistical and information-theoretic techniques, and shows that the utility of adding destinations is constant for interfaces, nodes, links, and node degree, indicating that it is more important to add destinations than sources.
On power-law relationships of the Internet topology
TLDR
The power laws identified hold for three snapshots of the Internet, taken between November 1997 and December 1998, despite a 45% growth in its size during that period, and can be used to generate and select realistic topologies for simulation purposes.
Distance distribution in random graphs and application to network exploration.
TLDR
This work introduces a different way of computing the distribution of distances between nodes that leads to estimates that coincide remarkably well with numerical simulations and allows us to characterize the phase transitions appearing when the connectivity probability varies.
The origin of power laws in Internet topologies revisited
TLDR
The authors re-examine the BGP (Border Gateway Protocol) measurements that form the basis for the results reported by Faloutsos et al. and find that, while the vertex degree distributions resulting from the extended maps are heavy-tailed, they deviate significantly from a strict power law.
The Web as a Graph: Measurements, Models, and Methods
TLDR
This paper describes two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery, and proposes a new family of random graph models that point to a rich new sub-field of the study of random graphs, and raises questions about the analysis of graph algorithms on the Internet.