The Graph Structure in the Web - Analyzed on Different Aggregation Levels

@article{Meusel2015TheGS,
  title={The Graph Structure in the Web - Analyzed on Different Aggregation Levels},
  author={Robert Meusel and Sebastiano Vigna and Oliver Lehmberg and Christian Bizer},
  journal={J. Web Sci.},
  year={2015},
  volume={1},
  pages={33--47}
}
Knowledge about the general graph structure of the World Wide Web is important for understanding the social mechanisms that govern its growth, for designing ranking methods, for devising better crawling algorithms, and for creating accurate models of its structure. In this paper, we analyze a large web graph. The graph was extracted from a large publicly accessible web crawl that was gathered by the Common Crawl Foundation in 2012. The graph covers over 3.5 billion web pages and 128.7 billion…
On Web’s contact structure
TLDR
It is shown that the Web is still a scale-free network, with three main classes of nodes: very few huge nodes (the hubs), a significant number of intermediate nodes, and a huge number of small nodes.
Analysis of the Web Graph Aggregated by Host and Pay-Level Domain
  • A. Funel
  • Computer Science
  • COMPLEX NETWORKS
  • 2018
TLDR
While there is no evidence of power-law tails at the host level, they emerge under PLD aggregation for the indegree, SCC, and WCC size distributions. Distance-related features are analyzed by studying the cumulative distributions of the shortest path lengths, and an estimate of the diameters of the graphs is given.
Optimal Representation of Large-Scale Graph Data Based on K2-Tree
TLDR
The basic idea of the approach is to compress a large number of zeros into a single zero, which reduces not only the space needed to represent a Web graph but also the time taken by operations such as retrieving the neighbors of any node in the graph.
Extracting Network Structure for International and Malaysia Website via Random Walk
TLDR
This paper constructs networks using a random walk process that traverses the web at two popular websites, namely google.com and mudah.my, and analyses the networks at the domain level to identify top-level domains appearing in both, in order to understand the connectivity of the web in different regions.
On the Graph Structure of the Web of Data
TLDR
Results show that the Web of Data also complies with the bow-tie theory; the biggest dataset is Open Data Euskadi, but the one with the most connections to other datasets is DBpedia.
Exploring the Topological Properties of the Tor Dark Web
TLDR
This paper analyzes the internal structure of the Tor dark web graph, examines whether it has a bow-tie structure like that found in the World Wide Web, and finds that most nodes of the graph have in-degree and out-degree less than ten.
Local bow-tie structure of the web
TLDR
It is found that there are striking differences between the WWW and other social and artificial networks, including a million firms' nationwide supply chain network in Japan and thousands of symbols' dependencies in the programming language Emacs LISP, in which a global bow-tie exists.
Influential analysis of web structure using graph based approach
TLDR
A graph-based method is proposed for analyzing how one web page can influence access to other web pages, and this algorithm is used to discover information flow patterns and their influence on the network structure.
Malware distributions and graph structure of the Web
TLDR
This is the first large-scale study describing the differences in global properties between malicious and clean parts of the Web and can help antivirus vendors in devising approaches to improve their detection algorithms.
Malware and graph structure of the Web
Knowledge about the graph structure of the Web is important for understanding this complex socio-technical system and for devising proper policies supporting its future development. Knowledge about…

References

Graph structure in the web --- revisited: a trick of the heavy tail
TLDR
A large, publicly accessible crawl of the web that was gathered by the Common Crawl Foundation in 2012 and that contains over 3.5 billion web pages and 128.7 billion links is described and analysed, confirming the existence of a giant strongly connected component and providing for the first time accurate measurement of distance-based features, using recently introduced algorithms that scale to the size of the crawl.
Decoding the structure of the WWW: A comparative analysis of Web crawls
TLDR
A detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers finds that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data.
Graph structure in the web: aggregated by pay-level domain
TLDR
This paper analyzes an aggregated version of a recent web graph and presents basic statistics about the PLD graph, such as degree distributions, top-ranked PLDs, distances, and diameter. It also analyzes whether the bow-tie structure introduced by Broder et al. can be identified in the graph and reveals a backbone of highly interlinked websites within the graph.
Mining the inner structure of the Web graph
TLDR
It is found that the scale-free properties permeate all the components of the bow-tie, which exhibit the same macroscopic properties as the Web graph itself; however, close inspection reveals that their inner structure is quite distinct.
Using PageRank to Characterize Web Structure
TLDR
This work studies the distribution of PageRank values (used in the Google search engine) on the Web and develops detailed models for the Web graph that explain this observation and remain faithful to previously studied degree distributions.
A teapot graph and its hierarchical structure of the Chinese web
TLDR
Viewing the Web as a hierarchy of three levels, namely page level, host level, and domain level, results suggest that the Chinese Web appears more like a teapot at page level than the classic bow tie or daisy shape.
Deployment of RDFa, Microdata, and Microformats on the Web - A Quantitative Analysis
TLDR
This study is based on a large public Web crawl dating from early 2012 and consisting of 3 billion HTML pages which originate from over 40 million websites, and reveals the deployment of the different markup standards, the main topical areas of the published data as well as the different vocabularies that are used within each topical area to represent data.
In-Core Computation of Geometric Centralities with HyperBall: A Hundred Billion Nodes and Beyond
  • P. Boldi, S. Vigna
  • Computer Science, Physics
  • 2013 IEEE 13th International Conference on Data Mining Workshops
  • 2013
TLDR
This paper's exploitation of HyperLogLog counters exponentially reduces the memory footprint, paving the way for in-core processing of networks with a hundred billion nodes using "just" 2TiB of RAM.
On the bias of traceroute sampling: or, power-law degree distributions in regular graphs
TLDR
This work puts the observations of Lakhina et al. on a rigorous footing, and extends them to nearly arbitrary degree distributions, and shows how traceroute sampling finds power-law degree distributions in both δ-regular and Poisson-distributed random graphs.
Web Structure in 2005
TLDR
The results indicate that the size of the "CORE," the central component of the bow tie structure, has increased in recent years, especially in the Chinese and Japanese web.