Trawling the Web for Emerging Cyber-Communities

Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins
Comput. Networks


A Study of the Structure of the Web
This paper analyzes link topologies of various communities, and patterns of mirroring of content, on 1997 and 1999 snapshots of the Web, to give insight into patterns of interaction within communities and how they evolve, as well as patterns of data replication.
Inferring web communities through relaxed cocitation and dense bipartite graphs
The results indicate that, compared with the trawling approach, the proposed approach extracts community patterns of significantly larger size, scales up well, and can be easily parallelized.
Detection of Web Communities from Community Cores
Focusing on the problem of automatically determining the ideal sizes of Web communities, the paper proposes a two-step heuristic algorithm for specifying Web communities and demonstrates that the algorithm effectively identifies communities satisfying two conditions: (1) the relationships among members within a community are close; (2) the boundaries between communities are sparse.
Extraction and classification of dense communities in the web
A new scalable algorithm for finding relatively dense subgraphs in massive graphs, and a Community Watch system that clusters the communities found in the Web graph into homogeneous groups by topic and labels each group with representative keywords.
Extraction and classification of dense implicit communities in the Web graph
The core of the contribution is a new scalable algorithm for finding relatively dense subgraphs in massive graphs, together with a complete Community Watch system that clusters the communities found in the Web graph into homogeneous groups by topic and labels each group with representative keywords.
An approach to relate the Web communities through bipartite graphs
  • P. K. Reddy, M. Kitsuregawa
  • Computer Science
    Proceedings of the Second International Conference on Web Information Systems Engineering
  • 2001
This paper investigates the problem of extracting and relating Web community structures from a large collection of Web pages through hyperlink analysis, and demonstrates that the proposed approach extracts meaningful community signatures and relates them.
Mining the inner structure of the Web graph
Scale-free properties are found to permeate all the components of the bow-tie, which exhibit the same macroscopic properties as the Web graph itself; close inspection, however, reveals that their inner structure is quite distinct.
Partitioning of Web graphs by community topology
A stricter Web community definition is introduced to overcome the boundary ambiguity of a Web community as defined by Flake, Lawrence and Giles, and an efficient method is proposed for finding a subclass of communities among the sets partitioned by each of the n-1 cuts represented by a Gomory-Hu tree.
Simulating the Webgraph: a comparative analysis of models
This work simulated several of these models and compared them against a 300-million-node sample of the Webgraph provided by the Stanford WebBase project, finding that the more random the model, the better the graph.
Bridge Bounding: A Local Approach for Efficient Community Discovery in Complex Networks
Bridge Bounding is presented, a local methodology for community detection, which explores the local network topology around a seed node in order to identify edges that act as boundaries to the local community.


Inferring Web communities from link topology
This investigation shows that although the process by which users of the Web create pages and links is very difficult to understand at a “local” level, it results in a much greater degree of orderly high-level structure than has typically been assumed.
Querying the World Wide Web
The authors propose a query language, WebSQL, that takes advantage of multiple index servers without requiring users to know about them, and that integrates textual retrieval with structure and topology-based queries.
ParaSite: mining the structural information on the World-Wide Web
A novel “just-in-time” interpreter automatically retrieves information from the Web as implicitly demanded by user queries, a technique which could be applied not just to the Internet but to other sources of data too large to be precomputed into a database.
The Anatomy of a Large-Scale Hypertextual Web Search Engine
Silk from a sow's ear: extracting usable structures from the Web
This paper presents an exploration of techniques that use both the topology and the textual similarity between items, as well as usage data collected by servers and page meta-information such as title and size.
Database techniques for the World-Wide Web: a survey
The primary goal of this survey is to classify the different tasks to which database concepts have been applied, and to emphasize the technical innovations that were required to do so.
ParaSite: Mining Structural Information on the Web
Improved algorithms for topic distillation in a hyperlinked environment
This paper addresses the problem of topic distillation on the World Wide Web (given a typical user query, find quality documents related to the query topic) by augmenting a previous connectivity-analysis-based algorithm with content analysis.
WebQuery: Searching and Visualizing the Web Through Connectivity