Empirical evaluation of the link and content-based focused Treasure-Crawler

@article{Seyfi2013EmpiricalEO,
  title={Empirical evaluation of the link and content-based focused Treasure-Crawler},
  author={Ali Seyfi and Ahmed Patel and Joaquim Celestino},
  journal={Comput. Stand. Interfaces},
  year={2013},
  volume={44},
  pages={54-62}
}

Focused crawling of online business Web pages using latent semantic indexing approach

A new model for online business text crawling that seeks, acquires, maintains and filters business pages, guided by a latent semantic index and by information from WordNet (a business filter) that learns to recognize the relevance of a web page with respect to the business topic.
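As a rough sketch (not the paper's implementation), the latent-semantic-indexing step can be illustrated by building a term-document count matrix, reducing it with a rank-k SVD, folding the topic query into the latent space, and ranking pages by cosine similarity there. Function names, the use of NumPy, and the toy corpus are all assumptions for illustration:

```python
import numpy as np

def lsi_scores(docs, query, k=2):
    # Hypothetical LSI relevance scoring: term-document counts -> rank-k SVD
    # -> cosine similarity between the folded-in query and each document.
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1.0
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vk = U[:, :k], s[:k], Vt[:k].T   # document coords = rows of Vk
    q = np.zeros(len(vocab))
    for w in query.split():
        if w in index:
            q[index[w]] += 1.0
    q_hat = (q @ Uk) / sk                    # standard LSI query fold-in
    def cos(a, b):
        n = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b) / n if n else 0.0
    return [cos(q_hat, Vk[j]) for j in range(len(docs))]
```

Because the comparison happens in the reduced "concept" space, pages sharing vocabulary with the topic cluster score high even without exact keyword matches, while off-topic pages score near zero.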

A Survey about Algorithms Utilized by Focused Web Crawler

It is demonstrated that the popular algorithms used in focused web crawling fall broadly into web-page analysis algorithms and crawling strategies (which prioritize the uniform resource locators (URLs) in the queue).
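The "crawling strategy" half of that taxonomy amounts to the rule that assigns each frontier URL a priority. A minimal best-first frontier, sketched with Python's `heapq` (the class and its interface are illustrative assumptions, not from the survey):

```python
import heapq

class CrawlFrontier:
    """Best-first URL frontier: the crawling strategy is the rule
    that computes the relevance score passed to push()."""
    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker so equal scores pop in FIFO order

    def push(self, url, relevance):
        if url in self._seen:          # never enqueue a URL twice
            return
        self._seen.add(url)
        # heapq is a min-heap, so negate relevance for best-first order
        heapq.heappush(self._heap, (-relevance, self._counter, url))
        self._counter += 1

    def pop(self):
        _, _, url = heapq.heappop(self._heap)
        return url
```

Swapping the relevance function (anchor-text similarity, link analysis, a learned classifier) changes the strategy without touching the frontier itself.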

A focused crawler combinatory link and content model based on T-Graph principles

  • Ali SeyfiAhmed Patel
  • Computer Science
    Comput. Stand. Interfaces
  • 2016

Design and implementation of the patent topical web crawler system

The overall design and workflow of the patent topical crawler is described, including the basic functional architecture and key system technologies; a patent short-text similarity calculation method based on Doc2Vec is proposed for discriminating patent-topic relevance, which can effectively screen the required patent data.
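The screening step can be sketched as a cosine-similarity threshold over document vectors. This assumes the embeddings have already been produced by a Doc2Vec model trained elsewhere (training is not shown, and the function names and threshold are illustrative, not the paper's):

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two dense vectors; 0.0 for a zero vector.
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def screen_patents(topic_vec, candidates, threshold=0.7):
    """Keep only candidates whose embedding is close enough to the topic.
    `candidates` maps patent id -> document vector (assumed to come from
    a pre-trained Doc2Vec model)."""
    return [pid for pid, vec in candidates.items()
            if cosine(topic_vec, vec) >= threshold]
```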

Intelligent rule-based approach for effective information retrieval and dynamic storage in local repositories

Two new algorithms, an intelligent rule-based relevant-information-retrieval algorithm with semantics and a secured information-storage scheme using semantic knowledge representation, are proposed in this paper for effectively retrieving e-learning content on computer science subjects from the Web and storing it in local repositories with semantic indexing.

DynWebStats: A Framework for Determining Dynamic and Up-to-date Web Indicators

This paper presents a new methodology for generating dynamic Web indicators, which considers changes to Web pages, both modifications and creation or deletion, and offers a measure of the quality of the indicators.

Malicious attacks on the web and crawling of information data by Python technology

Python can be used to write crawler programs in a relatively simple way and to embed attack programs that crawl hidden information on the web.

References

Evaluation Methods for Focused Crawling

This paper studies three different evaluation functions for predicting the relevance of a hyperlink with respect to the target topic and introduces a method that combines both the anchor and the whole parent document, using a Bayesian representation of the Web graph structure.

Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery

Topic-specific crawling on the Web with the measurements of the relevancy context graph

LSCrawler: A Framework for an Enhanced Focused Web Crawler Based on Link Semantics

  • M. Yuvarani, N. Iyengar, A. Kannan
  • Computer Science
    2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings)(WI'06)
  • 2006
A novel and distinctive focused crawler named LSCrawler is proposed, which retrieves documents by estimating the relevance of a document from the keywords in the link and the text surrounding the link.

A Method for Focused Crawling Using Combination of Link Structure and Content Similarity

A new hybrid focused crawler is introduced, which uses the link structure of documents as well as the similarity of pages to the topic to crawl the Web.
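A hybrid scheme of this kind typically reduces to blending a content-similarity signal with a link-structure signal when scoring a candidate URL. The linear form, the weight `alpha`, and normalized in-link count as the link signal are illustrative assumptions, not the paper's exact formula:

```python
def link_score(inlinks, max_inlinks):
    """Illustrative link-structure signal: in-link count normalized to [0, 1].
    Real link analysis (e.g., authority scores) is more involved."""
    return inlinks / max_inlinks if max_inlinks else 0.0

def combined_priority(content_sim, inlinks, max_inlinks, alpha=0.6):
    """Linear blend of content similarity and link score; alpha controls
    how much the crawler trusts content over link structure."""
    return alpha * content_sim + (1 - alpha) * link_score(inlinks, max_inlinks)
```

With `alpha` near 1 the crawler behaves like a pure content-based focused crawler; near 0 it degenerates to link-popularity-first crawling.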

Focused Crawling Using Context Graphs

A focused crawling algorithm is presented that builds a model of the context within which topically relevant pages occur on the web; the model can capture typical link hierarchies in which valuable pages occur, as well as content in documents that frequently co-occur with relevant pages.

A General Evaluation Framework for Topical Crawlers

A general framework to evaluate topical crawlers is presented and it is found that the proposed framework is effective at evaluating, comparing, differentiating and interpreting the performance of the four crawlers.

An architecture for a focused trend parallel Web crawler with the application of clickstream analysis

HAWK: A Focused Crawler with Content and Link Analysis

A focused crawler that not only uses the content of web pages to improve page relevance, but also uses link structure to improve the coverage of a specific topic.

Intelligent focused crawler: Learning which links to crawl

This study combines a Naïve Bayes classifier for the classification of URLs with a simple URL-scoring optimization to improve system performance, and demonstrates that the proposed approach performs better.
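A Naïve Bayes classifier over URL tokens can be sketched in a few lines. This toy multinomial model with add-one smoothing is an illustrative stand-in for the classifier described above; the tokenizer, class names, and training URLs are all assumptions:

```python
import re
from collections import Counter
from math import log

def tokenize(url):
    # Split a URL into lowercase tokens on common separators.
    return [t for t in re.split(r"[/\.\-\?=_:]+", url.lower()) if t]

class UrlNaiveBayes:
    """Toy multinomial Naive Bayes over URL tokens, add-one smoothing."""
    def fit(self, urls, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        self.priors = Counter(labels)
        for url, y in zip(urls, labels):
            self.counts[y].update(tokenize(url))
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        return self

    def predict(self, url):
        def log_posterior(c):
            total = sum(self.counts[c].values())
            lp = log(self.priors[c] / sum(self.priors.values()))
            for t in tokenize(url):
                # Laplace-smoothed token likelihood under class c.
                lp += log((self.counts[c][t] + 1) / (total + len(self.vocab)))
            return lp
        return max(self.classes, key=log_posterior)
```

In a focused crawler, the predicted class (or the posterior itself) would feed the URL-scoring step that orders the frontier.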