SOF: a semi‐supervised ontology‐learning‐based focused crawler

@article{Dong2013SOFAS,
  title={SOF: a semi‐supervised ontology‐learning‐based focused crawler},
  author={Hai Dong and Farookh Khadeer Hussain},
  journal={Concurrency and Computation: Practice and Experience},
  year={2013},
  volume={25}
}
  • Hai Dong, F. Hussain
  • Published 25 August 2013
  • Computer Science
  • Concurrency and Computation: Practice and Experience
The rapid increase in the volume of data available on the Internet makes it increasingly impractical for a crawler to index the whole Web. Instead, many intelligent crawlers, known as ontology‐based semantic focused crawlers, have been designed by making use of Semantic Web technologies for topic‐centered Web information crawling. Ontologies, however, have constraints of validity and time, which may influence the performance of the crawlers. Ontology‐learning‐based focused crawlers are… 

Empirical analysis of domain ontology usage on the Web: eCommerce domain in focus

To comprehensively understand the usage patterns of conceptual knowledge, instance data, and ontology co‐usability, the GoodRelations ontology was considered as the domain ontology and a dataset was built by collecting structured data from 211 web‐based data sources that have published information using the domain ontology.

A Framework for Ontology Learning from Taxonomic Data

The proposed system will deal with the taxonomic text available in agricultural systems, enhance the available algorithms, and propose a framework for learning from taxonomic text that overcomes the shortcomings of ontologies developed from generalized texts.

Ontology Learning for Systems Engineering Body of Knowledge

A formal and sophisticated system engineering ontology is achieved, which can be used to harmonize the extant standards, unify the languages, and improve the interoperability of the model-based systems engineering approach.

A Survey on Semantic Focused Crawler For Mining Service Information

This paper surveys semantic focused crawlers used to extract and annotate web pages retrieved according to Semantic Web technology, in order to overcome the three issues of heterogeneity, ubiquity and ambiguity.

Discovering Plain-Text-Described Services Based on Ontology Learning

An approach to efficiently discover domain-specific services described in plain text over the Internet. It incorporates a plain-text-described service ontology for standard service description, a plain-text-described service discovery framework for domain-relevant service discovery and ontology learning, and a machine-learning-based model for ontology-based service functionality annotation.

Clustering-Based Topical Web Crawling for Topic-Specific Information Retrieval Guided by Incremental Classifier

  • T. Peng, Lu Liu
  • Computer Science
  • Int. J. Softw. Eng. Knowl. Eng.
  • 2015
A novel incremental method for Web page classification enhanced by link-contexts and clustering is presented, which outperforms the conventional topical Web crawler in Harvest rate and Target recall and increases the accuracy and efficiency of a classifier.

Process activity ontology learning from event logs through gamification

Evaluation of the approach to the construction of activity ontologies by 35 participants shows that they found the method engaging and that its application results in high-quality ontologies.

A survey of Web crawlers for information retrieval

This study follows the guidelines of systematic literature review and applies them to the field of Web crawling, calling for increased awareness of the Web crawler in various fields and identifying how techniques from other domains can be used for crawling the Web.

Towards extracting event-centric collections from Web archives

This article addresses the novel problem of extracting interlinked event-centric document collections from large-scale Web archives to facilitate an efficient and intuitive access to information regarding past events by developing a specialised extraction method that adapts focused crawling techniques to the Web archive settings.

References

An efficient adaptive focused crawler based on ontology learning

An intelligent focused crawler algorithm in which an ontology is embedded to evaluate a page's relevance to the topic and which can evolve the ontology automatically during the crawl process; it is compared with other algorithms that use domain knowledge.

State of the Art in Semantic Focused Crawlers

The features of these semantic focused crawlers are concluded and the overall state of the art of this field is drawn by means of a multi-dimensional comparison.

Self-Adaptive Semantic Focused Crawler for Mining Services Information Discovery

This paper presents the framework of a novel self-adaptive semantic focused crawler - SASF crawler, with the purpose of precisely and efficiently discovering, formatting, and indexing mining service information over the Internet, by taking into account the three major issues.

Semantic Focused Crawling for Retrieving E-Commerce Information

This work presents a novel semantic approach for building an intelligent focused crawler which deals with evaluating the page’s content relevance to the E-commerce topic by the domain ontology and the hyperlinks connection to the commercial web pages by link analysis.

Ontology-focused crawling of Web documents

This paper proposes an approach for document discovery building on a comprehensive framework for ontology-focused crawling of Web documents that defines several relevance computation strategies and provides an empirical evaluation which has shown promising results.

Ontology-based Web crawler

The proposed new metric, association-metric, solves the major problem of finding the relevancy of the pages before the process of crawling, to an optimal level.

Ontology-Learning-Based Focused Crawling for Online Service Advertising Information Discovery and Classification

This paper proposes an ontology-learning-based focused crawling approach, enabling Web-crawler-based online service advertising information discovery and classification in the Web environment, by taking into account the characteristics of service advertising information.

Focused Crawling for Automatic Service Discovery, Annotation, and Classification in Industrial Digital Ecosystems

This paper presents a conceptual framework for a semantic focused crawler, with the purpose of automatically discovering, annotating, and classifying the service information with the Semantic Web technologies.

A context‐aware semantic similarity model for ontology environments

This paper presents a solution for the two issues, including a novel ontology conversion process and a context‐aware semantic similarity model, by considering the factors of both the context of concepts and relations, and the ontology structure.