The URL Search Strategy Based on the Content and Link Analysis

@inproceedings{Zhou2009TheUS,
  title={The URL Search Strategy Based on the Content and Link Analysis},
  author={Cailan Zhou and Xuan Sun and Hongjie Guo},
  booktitle={2009 International Conference on Computational Intelligence and Software Engineering},
  year={2009},
  pages={1--4}
}
Building on prior research into crawler search strategies, the paper analyzes the web information that influences the topic relevance of a URL. On this basis, a new URL search algorithm based on content and link analysis is proposed. The experimental results show that the algorithm not only solves the problem of isolated topic islands, thereby increasing recall, but also avoids topic drift at the same 
