Large scale properties of the Webgraph

@article{Donato2004LargeSP,
  title={Large scale properties of the Webgraph},
  author={Debora Donato and Luigi Laura and Stefano Leonardi and Stefano Millozzi},
  journal={The European Physical Journal B},
  year={2004},
  volume={38},
  pages={239--243}
}
Abstract. In this paper we present an experimental study of the properties of web graphs. We study a large crawl from 2001 of 200M pages and about 1.4 billion edges made available by the WebBase project at Stanford [17]. We report our experimental findings on the topological properties of such graphs, such as the number of bipartite cores and the distribution of degree, PageRank values and strongly connected components.

The Web as a graph: How far we are
TLDR
A large crawl from 2001 of 200M pages and about 1.4 billion edges, made available by the WebBase project at Stanford, is studied, as well as several synthetic ones generated according to various recently proposed models, to investigate several topological properties of webgraphs.
PageRank of integers
TLDR
The PageRank vector of this matrix is computed numerically, and it is shown that its probability is inversely proportional to the PageRank index, thus resembling the Zipf law and the dependence established for the World Wide Web.
Stochastic analysis of web page ranking
TLDR
This thesis presents a new methodology for analyzing the probabilistic behavior of the PageRank distribution and the dependence between various power law parameters of the Web, based on the techniques from the theory of regular variations and the extreme value theory.
Simulating the Webgraph: a comparative analysis of models
TLDR
This work simulated several of these models and compared them against a 300-million-node sample of the Webgraph provided by the Stanford WebBase project, finding that the more random the model, the better the graph.
Decoding the structure of the WWW: A comparative analysis of Web crawls
TLDR
A detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers finds that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data.
Ranking web sites with real user traffic
TLDR
The traffic-weighted Web host graph obtained from a large sample of real Web users is analyzed, finding that while search is directly involved in a surprisingly small fraction of user clicks, it leads to a much larger fraction of all sites visited.
Modeling the Webgraph Evolution with Graph Grammars
TLDR
The suitability of graph grammars for generating and analyzing the webgraph is investigated; the idea is to take properties that are observed in webgraphs and create rules that preserve these properties.
Network growth by copying.
  • P. Krapivsky, S. Redner
  • Computer Science
    Physical review. E, Statistical, nonlinear, and soft matter physics
  • 2005
TLDR
A growing network model in which a new node attaches to a randomly selected node, as well as to all ancestors of the target node, produces a sparse, ultrasmall network where the average node degree grows logarithmically with network size while the network diameter equals 2.
Decoding the structure of the WWW: facts versus sampling biases
TLDR
A detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers finds that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data.
Some Preliminary Results from a Link-crawl of the European Union Research Area Web
A constrained Web link crawler has been used to obtain a broad multi-national sample of the European Union Research Area Web. This preliminary sample confirms that the distribution of many Web

References

SHOWING 1-10 OF 29 REFERENCES
Algorithms and Experiments for the Webgraph
TLDR
An experimental study of the properties of web graphs made available by the WebBase project at Stanford, and synthetic graphs obtained by the large scale simulation of stochastic graph models for the Webgraph.
Graph structure in the Web
A Multi-Layer Model for the Web Graph
TLDR
A new model is presented that describes the WebGraph as an ensemble of different regions generated by independent stochastic processes that are simulated and compared on several relevant measures such as degree and clique distribution.
Using PageRank to Characterize Web Structure
TLDR
It is suggested that PageRank values on the web follow a power law, and generative models for the web graph are developed that explain this observation and moreover remain faithful to previously studied degree distributions.
Internet: Diameter of the World-Wide Web
TLDR
The World-Wide Web is viewed as a large directed graph whose vertices are documents and whose edges are links that point from one document to another; the topology of this graph determines the web's connectivity and consequently how effectively information can be located on it.
Trawling the Web for Emerging Cyber-Communities
Dynamical and correlation properties of the internet.
TLDR
It is found that the Internet is characterized by non-trivial correlations among nodes and different dynamical regimes, and the importance of node hierarchy and aging in the Internet structure and growth is pointed out.
Authoritative sources in a hyperlinked environment
TLDR
This work proposes and tests an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure, and has connections to the eigenvectors of certain matrices associated with the link graph.
A General Model of Undirected Web Graphs
TLDR
A general model of a random graph process whose degree sequence obeys a power law, which has recently been observed in graphs associated with the world wide web, is described.
Efficient Computation of PageRank
TLDR
It is shown that PageRank can be computed for very large subgraphs of the web (up to hundreds of millions of nodes) on machines with limited main memory.