Large scale properties of the Webgraph
@article{Donato2004LargeSP, title={Large scale properties of the Webgraph}, author={Debora Donato and Luigi Laura and Stefano Leonardi and Stefano Millozzi}, journal={The European Physical Journal B}, year={2004}, volume={38}, pages={239-243} }
Abstract. In this paper we present an experimental study of the properties of web graphs. We study a large crawl from 2001 of 200M pages and about 1.4 billion edges made available by the WebBase project at Stanford [17]. We report our experimental findings on the topological properties of such graphs, such as the number of bipartite cores and the distribution of degree, PageRank values and strongly connected components.
105 Citations
The Web as a graph: How far we are
- Computer Science, Mathematics
- TOIT
- 2007
A large crawl from 2001 of 200M pages and about 1.4 billion edges, made available by the WebBase project at Stanford, is studied, along with several synthetic graphs generated according to various recently proposed models, to investigate several topological properties of webgraphs.
PageRank of integers
- Mathematics, Computer Science
- ArXiv
- 2012
The PageRank vector of this matrix is computed numerically and it is shown that its probability is inversely proportional to the PageRank index thus being similar to the Zipf law and the dependence established for the World Wide Web.
Stochastic analysis of web page ranking
- Mathematics, Computer Science
- 2009
This thesis presents a new methodology for analyzing the probabilistic behavior of the PageRank distribution and the dependence between various power law parameters of the Web, based on the techniques from the theory of regular variations and the extreme value theory.
Simulating the Webgraph: a comparative analysis of models
- Computer Science
- 2004
This work simulated several of these models and compared them against a 300-million-node sample of the Webgraph provided by the Stanford WebBase project, finding that the more random the model, the better the graph.
Decoding the structure of the WWW: A comparative analysis of Web crawls
- Computer Science
- TWEB
- 2007
A detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers finds that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data.
Ranking web sites with real user traffic
- Computer Science
- WSDM '08
- 2008
The traffic-weighted Web host graph obtained from a large sample of real Web users is analyzed, finding that while search is directly involved in a surprisingly small fraction of user clicks, it leads to a much larger fraction of all sites visited.
Modeling the Webgraph Evolution with Graph Grammars
- Mathematics, Computer Science
- 2006
The suitability of graph grammars for generating and analyzing the webgraph is investigated; the idea is to take properties observed in webgraphs and create rules that preserve them.
Network growth by copying.
- Computer Science
- Physical review. E, Statistical, nonlinear, and soft matter physics
- 2005
A growing network model in which a new node attaches to a randomly selected node, as well as to all ancestors of the target node, produces a sparse, ultrasmall network where the average node degree grows logarithmically with network size while the network diameter equals 2.
Decoding the structure of the WWW: facts versus sampling biases
- Computer Science
- ArXiv
- 2005
A detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers finds that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data.
Some Preliminary Results from a Link-crawl of the European Union Research Area Web
- Computer Science
- 2008
A constrained Web link crawler has been used to obtain a broad multi-national sample of the European Union Research Area Web. This preliminary sample confirms that the distribution of many Web…
References
SHOWING 1-10 OF 29 REFERENCES
Algorithms and Experiments for the Webgraph
- Computer Science, Mathematics
- ESA
- 2003
An experimental study of the properties of web graphs made available by the WebBase project at Stanford, and synthetic graphs obtained by the large scale simulation of stochastic graph models for the Webgraph.
A Multi-Layer Model for the Web Graph
- Computer Science
- WebDyn@WWW
- 2002
A new model is presented that describes the WebGraph as an ensemble of different regions generated by independent stochastic processes that are simulated and compared on several relevant measures such as degree and clique distribution.
Using PageRank to Characterize Web Structure
- Computer Science, Mathematics
- Internet Math.
- 2006
It is suggested that PageRank values on the web follow a power law, and generative models for the web graph are developed that explain this observation and moreover remain faithful to previously studied degree distributions.
Internet: Diameter of the World-Wide Web
- Computer Science
- Nature
- 1999
The World-Wide Web is treated as a large directed graph whose vertices are documents and whose edges are links that point from one document to another; the topology of this graph determines the web's connectivity and consequently how effectively information can be located on it.
Trawling the Web for Emerging Cyber-Communities
- Computer Science
- Comput. Networks
- 1999
Dynamical and correlation properties of the internet.
- Computer Science
- Physical review letters
- 2001
It is found that the Internet is characterized by non-trivial correlations among nodes and different dynamical regimes, and the importance of node hierarchy and aging in the Internet structure and growth is pointed out.
Authoritative sources in a hyperlinked environment
- Computer Science
- JACM
- 1999
This work proposes and tests an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure, and has connections to the eigenvectors of certain matrices associated with the link graph.
A General Model of Undirected Web Graphs
- Computer Science, Mathematics
- ESA
- 2001
A general model of a random graph process whose degree sequence obeys a power law, which has recently been observed in graphs associated with the world wide web, is described.
Efficient Computation of PageRank
- Computer Science, Mathematics
- 1999
It is shown that PageRank can be computed for very large subgraphs of the web (up to hundreds of millions of nodes) on machines with limited main memory.