Self-similarity in the web

Steve Dill, Ravi Kumar, Kevin S. McCurley, Sridhar Rajagopalan, D. Sivakumar, Andrew Tomkins
ACM Trans. Internet Techn.
Algorithmic tools for searching and mining the Web are becoming increasingly sophisticated and vital. In this context, algorithms that use and exploit structural information about the Web perform better than generic methods in both efficiency and reliability. We present an extensive characterization of the graph structure of the Web, with a view to enabling high-performance applications that make use of this structure. In particular, we show that the Web emerges as the outcome of a number of…

Structural Analysis of the Web

An extensive analysis of the Web, based purely on graph analysis algorithms, reaffirms that the Web is indeed a fractal: each structurally isomorphic subgraph shows the same characteristics as the Web and follows the classical bow-tie model.

The Web and Social Networks

This research includes graph-theoretic studies of connectivity, which have shown the Web to have strong similarities with social networks, and finds a fractal structure in a graph-theoretic setting that adds further evidence of the Web's small-world social nature.

A study of stochastic models for the Web Graph

An extensive study of the statistical properties of several stochastic models for the Web graph proposed so far in the literature is presented, and a new stochastic model motivated by the observation of the self-organized structure of the Web is proposed.

Stochastic analysis of web page ranking

This thesis presents a new methodology for analyzing the probabilistic behavior of the PageRank distribution and the dependence between various power-law parameters of the Web, based on techniques from the theory of regular variation and extreme value theory.

Mining the inner structure of the Web graph

It is found that scale-free properties permeate all the components of the bow-tie, which exhibit the same macroscopic properties as the Web graph itself; however, close inspection reveals that their inner structure is quite distinct.
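
As a rough illustration of the bow-tie components mentioned above, the following Python sketch splits a small directed graph into its largest strongly connected component (SCC), the nodes that can reach it (IN), the nodes reachable from it (OUT), and everything else. The function names and the adjacency-dictionary input format are assumptions made for this example; it is not the procedure used in any of the papers listed here, and the simple SCC search is only suitable for small graphs.

```python
from collections import deque


def reachable(adj, sources):
    """Return every node reachable from `sources` by following edges in `adj`."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def reverse(adj):
    """Build the reversed graph; every node appears as a key in the result."""
    rev = {}
    for u, targets in adj.items():
        rev.setdefault(u, set())
        for v in targets:
            rev.setdefault(v, set()).add(u)
    return rev


def bow_tie(adj):
    """Split a directed graph (hypothetical dict format) into bow-tie parts."""
    rev = reverse(adj)
    nodes = set(rev)

    # Largest SCC: the SCC of u is (nodes reachable from u) ∩ (nodes reaching u).
    # Quadratic in the worst case, which is acceptable for a small sketch.
    core, remaining = set(), set(nodes)
    while remaining:
        u = next(iter(remaining))
        scc = reachable(adj, [u]) & reachable(rev, [u])
        remaining -= scc
        if len(scc) > len(core):
            core = scc

    out_part = reachable(adj, core) - core   # reachable from the core
    in_part = reachable(rev, core) - core    # can reach the core
    other = nodes - core - in_part - out_part
    return {"SCC": core, "IN": in_part, "OUT": out_part, "OTHER": other}


if __name__ == "__main__":
    # Toy graph: a 3-cycle core, one IN page, one OUT page, one detached pair.
    toy = {"a": ["b"], "b": ["c"], "c": ["a", "o"], "i": ["a"], "x": ["y"]}
    for name, part in bow_tie(toy).items():
        print(name, sorted(part))
```

In this sketch, tendrils, tubes and disconnected pieces are all lumped into the OTHER set; separating them would require further reachability checks against IN and OUT.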

Link Structure of Hierarchical Information Networks

It is described how to construct data models of the Web that capture both the hierarchical nature of the Web and some crucial features of the link graph, and how this interaction between hierarchical structure and link structure extends to other domains.

Modelling and simulation of the web graph: evaluating an exponential growth copying model

The behaviour of the Exponential Growth Copying (EGC) model, which has been explicitly designed to model the WWW, is evaluated, and the effect of individual parameters on its effectiveness is analysed through simulation modelling.

Self-organization, Self-regulation, and Self-similarity on the Fractal Web

The authors begin by modelling the World Wide Web as an ecosystem, which reflects an intimate coupling of people, programs, and pages that influence one another to yield an amazing array of self-organization, self-regulation, and self-similarity.

Coarse-grained classification of web sites by their structural properties

This paper identifies and analyzes structural properties which reflect the functionality of a Web site and introduces a content-independent approach for the automated coarse-grained classification of Web sites.

On Web’s contact structure

It is shown that the Web is still a scale-free network, with three main classes of nodes: very few huge nodes (the hubs), a significant number of intermediate nodes, and a huge number of small nodes.



ParaSite: Mining Structural Information on the Web

Extracting Large-Scale Knowledge Bases from the Web

This paper develops novel algorithms for enumerating and organizing all web occurrences of certain subgraphs that are signatures of web phenomena such as tightly-focused topic communities, webrings, taxonomy trees, keiretsus, etc., and argues that these algorithms run efficiently in this model.

Silk from a sow's ear: extracting usable structures from the Web

This paper presents an exploration of techniques that utilize both the topology and textual similarity between items as well as usage data collected by servers and page meta-information like title and size.

The Anatomy of a Large-Scale Hypertextual Web Search Engine

Graph structure in the Web

The Nature of Markets in the World Wide Web

Much has been said about the possibility that in the information age, ease of entry and global access will lead to market characteristics with few inefficiencies. While several arguments have been

Strong regularities in world wide web surfing

A model that assumes that users make a sequence of decisions to proceed to another page, continuing as long as the value of the current page exceeds some threshold, yields the probability distribution for the number of pages that a user visits within a given Web site.
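
The threshold rule described above can be sketched as a small simulation. The summary does not say how page values evolve, so this example assumes that each new page's value is the previous value plus Gaussian noise; that random-walk assumption, along with all function names, parameters and default values, is hypothetical and chosen only for illustration.

```python
import random


def pages_visited(rng, start_value=1.0, threshold=0.0,
                  drift=-0.1, noise=1.0, max_pages=10_000):
    """Count pages seen by one simulated user.

    Assumption (not from the source): the value of the next page equals the
    current value plus a Gaussian step with mean `drift` and std dev `noise`.
    """
    value = start_value
    pages = 1  # the user always sees the first page
    # Keep clicking while the current page's value exceeds the threshold.
    while value > threshold and pages < max_pages:
        value += rng.gauss(drift, noise)
        pages += 1
    return pages


def depth_distribution(n_users, seed=0, **model_params):
    """Empirical probability of each visit length over `n_users` simulated users."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_users):
        length = pages_visited(rng, **model_params)
        counts[length] = counts.get(length, 0) + 1
    return {length: c / n_users for length, c in sorted(counts.items())}


if __name__ == "__main__":
    dist = depth_distribution(10_000)
    for length, prob in list(dist.items())[:10]:
        print(f"{length:3d} pages: {prob:.4f}")
```

The printed table is an empirical estimate of the distribution of pages visited per user under these assumed dynamics; changing `drift`, `noise` or `threshold` shifts how quickly simulated users give up.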

Stochastic models for the Web graph

The results are twofold: it is shown that graphs generated using the proposed random graph models exhibit the statistics observed on the Web graph, and additionally, that natural graph models proposed earlier do not exhibit them.

WebQuery: Searching and Visualizing the Web Through Connectivity

Surfing as a real option

The options viewpoint is considered as a descriptive theory of information foraging by Internet users, and it is shown how it leads to a kind of “law of surfing” which has been verified experimentally in several large independent datasets.