Self-similarity in the web

@article{Dill2002SelfsimilarityIT,
  title={Self-similarity in the web},
  author={Steve Dill and Ravi Kumar and K. McCurley and S. Rajagopalan and D. Sivakumar and A. Tomkins},
  journal={ACM Trans. Internet Techn.},
  year={2002},
  volume={2},
  pages={205--223}
}
Algorithmic tools for searching and mining the Web are becoming increasingly sophisticated and vital. In this context, algorithms that use and exploit structural information about the Web perform better than generic methods in both efficiency and reliability. We present an extensive characterization of the graph structure of the Web, with a view to enabling high-performance applications that make use of this structure. In particular, we show that the Web emerges as the outcome of a number of…
Structural Analysis of the Web
The World Wide Web has evolved exponentially since its inception. Today, it has become important for the algorithms behind web applications such as searching, web crawling, and community discovery to exploit the…
The Web and Social Networks
This research includes graph-theoretic studies of connectivity, which have shown the Web to have strong similarities with social networks, and finds a fractal structure in a graph-theoretic setting that adds further evidence to the Web's small-world social nature.
A study of stochastic models for the Web Graph
An extensive study is presented of the statistical properties of several stochastic models for the Web graph proposed so far in the literature, and a new stochastic model motivated by the observation of the self-organized structure of the Web is proposed.
Mining the inner structure of the Web graph
It is found that the scale-free properties permeate all the components of the bow-tie, which exhibit the same macroscopic properties as the Web graph itself; however, close inspection reveals that their inner structure is quite distinct.
Stochastic analysis of web page ranking
This thesis presents a new methodology for analyzing the probabilistic behavior of the PageRank distribution and the dependence between various power law parameters of the Web, based on techniques from the theory of regular variation and extreme value theory.
Using PageRank to Characterize Web Structure
This work studies the distribution of PageRank values (used in the Google search engine) on the Web, and develops detailed models for the Web graph that explain this observation, and remain faithful to previously studied degree distributions.
Link Structure of Hierarchical Information Networks
One feature that seems to have been largely ignored in previous models of the Web is the inherent hierarchy that is evident in the structure of URLs. We provide evidence that this hierarchical…
Modelling and simulation of the web graph: evaluating an exponential growth copying model
The behaviour of the Exponential Growth Copying (EGC) model is evaluated, which has been explicitly designed to model the WWW, and the effect of individual parameters on its effectiveness through simulation modelling is analysed.
Self-organization, Self-regulation, and Self-similarity on the Fractal Web
The authors begin by modelling the World Wide Web as an ecosystem, which reflects an intimate coupling of people, programs, and pages that influence one another to yield an amazing array of self-organization, self-regulation, and self-similarity.
Coarse-grained classification of web sites by their structural properties
This paper identifies and analyzes structural properties which reflect the functionality of a Web site and introduces a content-independent approach for the automated coarse-grained classification of Web sites.

References

SHOWING 1-10 OF 85 REFERENCES
ParaSite: Mining Structural Information on the Web
The varieties of link information (not just hyperlinks) on the Web, how the Web differs from conventional hypertext, and how the links can be exploited to build useful applications are discussed.
Extracting Large-Scale Knowledge Bases from the Web
This paper develops novel algorithms for enumerating and organizing all web occurrences of certain subgraphs that are signatures of web phenomena such as tightly-focused topic communities, webrings, taxonomy trees, keiretsus, etc., and argues that these algorithms run efficiently in this model.
Silk from a sow's ear: extracting usable structures from the Web
This paper presents an exploration into techniques that utilize both the topology and textual similarity between items as well as usage data collected by servers and page meta-information like title and size.
The Anatomy of a Large-Scale Hypertextual Web Search Engine
This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Graph structure in the Web
The study of the web as a graph yields valuable insight into web algorithms for crawling, searching and community discovery, and the sociological phenomena which characterize its evolution.
The Nature of Markets in the World Wide Web
Much has been said about the possibility that in the information age, ease of entry and global access will lead to market characteristics with few inefficiencies. While several arguments have been…
Strong regularities in world wide web surfing
A model that assumes that users make a sequence of decisions to proceed to another page, continuing as long as the value of the current page exceeds some threshold, yields the probability distribution for the number of pages that a user visits within a given Web site.
Querying the World Wide Web
This paper proposes a query language, WebSQL, that takes advantage of multiple index servers without requiring users to know about them, and that integrates textual retrieval with structure and topology-based queries.
Stochastic models for the Web graph
The results are twofold: it is shown that graphs generated using the proposed random graph models exhibit the statistics observed on the Web graph, and additionally, that natural graph models proposed earlier do not exhibit them.
WebQuery: Searching and Visualizing the Web Through Connectivity
This work examines links among the nodes returned by a keyword-based query, finding “interesting” sites that are highly connected to the sites returned by the original query and thereby locating ‘hot spots’ on the Web that contain information germane to a user's query.