SOF: a semi‐supervised ontology‐learning‐based focused crawler

@article{Dong2013SOFAS,
  title={SOF: a semi‐supervised ontology‐learning‐based focused crawler},
  author={Hai Dong and F. Hussain},
  journal={Concurrency and Computation: Practice and Experience},
  year={2013},
  volume={25}
}
  • Hai Dong, F. Hussain
  • Published 2013
  • Computer Science
  • Concurrency and Computation: Practice and Experience
The rapid increase in the volume of data available on the Internet makes it increasingly impractical for a crawler to index the whole Web. Instead, many intelligent crawlers, known as ontology‐based semantic focused crawlers, have been designed by making use of Semantic Web technologies for topic‐centered Web information crawling. Ontologies, however, have constraints of validity and time, which may influence the performance of the crawlers. Ontology‐learning‐based focused crawlers are…
Empirical analysis of domain ontology usage on the Web: eCommerce domain in focus
TLDR
To comprehensively understand the usage patterns of conceptual knowledge, instance data, and ontology co‐usability, the GoodRelations ontology was considered as the domain ontology and a dataset was built by collecting structured data from 211 web‐based data sources that have published information using the domain ontology.
Query-driven approach of contextual ontology module learning using web snippets
TLDR
This work proposes an approach of contextual ontology module learning covering particular search terms by analyzing past user queries and by searching for web snippets provided by the traditional search engines.
A Framework for Ontology Learning from Taxonomic Data
TLDR
The proposed system will deal with the taxonomic text available in agricultural system and will also enhance the algorithms thereby available and propose a framework for learning of the taxonomy text which will overcome the loopholes of ontology developed from generalized texts.
Ontology Learning for Systems Engineering Body of Knowledge
TLDR
A formal and sophisticated system engineering ontology is achieved, which can be used to harmonize the extant standards, unify the languages, and improve the interoperability of the model-based systems engineering approach.
A Survey on Semantic Focused Crawler For Mining Service Information
TLDR
This paper aims to survey the semantic focused crawler used to extract and annotate the web pages that are retrieved according to semantic web technology, to overcome the three issues of heterogeneity, ubiquity and ambiguity.
Discovering Plain-Text-Described Services Based on Ontology Learning
TLDR
An approach to efficiently discover domain-specific services that are described by plain text over the Internet that incorporates a plain-text-described service ontology for standard service description, a plain-text-described service discovery framework for domain-relevant service discovery and ontology learning, and a machine-learning-based model for ontology-based service functionality annotation.
Clustering-Based Topical Web Crawling for Topic-Specific Information Retrieval Guided by Incremental Classifier
  • T. Peng, Lu Liu
  • Computer Science
  • Int. J. Softw. Eng. Knowl. Eng.
  • 2015
TLDR
A novel incremental method for Web page classification enhanced by link-contexts and clustering is presented, which outperforms the conventional topical Web crawler in Harvest rate and Target recall and increases the accuracy and efficiency of a classifier.
A survey of Web crawlers for information retrieval
TLDR
This study follows the guidelines of systematic literature review and applies it to the field of Web crawling, calling for an increased awareness in various fields of the Web crawler and identifying how techniques from other domains can be used for crawling the Web.
Towards extracting event-centric collections from Web archives
TLDR
This article addresses the novel problem of extracting interlinked event-centric document collections from large-scale Web archives to facilitate an efficient and intuitive access to information regarding past events by developing a specialised extraction method that adapts focused crawling techniques to the Web archive settings.
Analysing and Enriching Focused Semantic Web Archives for Parliament Applications
The web and the social web play an increasingly important role as an information source for Members of Parliament and their assistants, journalists, political analysts and researchers. It provides…

References

SHOWING 1-10 OF 28 REFERENCES
An ontology-based approach to learnable focused crawling
TLDR
Experimental results show that the proposed learnable focused crawling framework based on ontology outperforms the breadth-first search crawling approach, the simple keyword-based crawling approaches, the ANN-based focused crawl approach, and the focused crawling approach that uses only a domain-specific ontology.
An efficient adaptive focused crawler based on ontology learning
TLDR
An intelligent focused crawler algorithm in which ontology is embedded to evaluate the page's relevance to the topic and can evolve the ontology automatically during crawl process, compared with other algorithms using domain knowledge.
State of the Art in Semantic Focused Crawlers
TLDR
The features of these semantic focused crawlers are concluded and the overall state of the art of this field is drawn by means of a multi-dimensional comparison.
Self-Adaptive Semantic Focused Crawler for Mining Services Information Discovery
TLDR
This paper presents the framework of a novel self-adaptive semantic focused crawler - SASF crawler, with the purpose of precisely and efficiently discovering, formatting, and indexing mining service information over the Internet, by taking into account the three major issues.
Semantic Focused Crawling for Retrieving E-Commerce Information
TLDR
This work presents a novel semantic approach for building an intelligent focused crawler which deals with evaluating the page’s content relevance to the E-commerce topic by the domain ontology and the hyperlinks connection to the commercial web pages by link analysis.
Ontology-focused crawling of Web documents
TLDR
This paper proposes an approach for document discovery building on a comprehensive framework for ontology-focused crawling of Web documents that defines several relevance computation strategies and provides an empirical evaluation which has shown promising results.
Ontology-based Web crawler
TLDR
The proposed new metric, association-metric, solves the major problem of finding the relevancy of the pages before the process of crawling, to an optimal level.
Ontology-Learning-Based Focused Crawling for Online Service Advertising Information Discovery and Classification
TLDR
This paper proposes an ontology-learning-based focused crawling approach, enabling Web-crawler-based online service advertising information discovery and classification in the Web environment, by taking into account the characteristics of service advertising information.
Focused Crawling for Automatic Service Discovery, Annotation, and Classification in Industrial Digital Ecosystems
TLDR
This paper presents a conceptual framework for a semantic focused crawler, with the purpose of automatically discovering, annotating, and classifying the service information with the Semantic Web technologies.
A context‐aware semantic similarity model for ontology environments
TLDR
This paper presents a solution for the two issues, including a novel ontology conversion process and a context‐aware semantic similarity model, by considering the factors of both the context of concepts and relations, and the ontology structure.