The emergence of the grid as an infrastructure for sharing large-scale resources increases the need for information services that manage those resources efficiently. This paper proposes Personalized and Semantics-based Grid Information Services (PS-GIS), which serve as information management units for service publishing, discovery, and monitoring.
To access large-scale data sources efficiently and automatically, these sources must be classified into different domains and categories. In this paper, we propose a novel approach that classifies data sources into detailed domain subjects by query probing. In our approach, we train sample instances for each subject category and…
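One common way to realize classification by query probing is to send each category's probe queries to a source and compare how many results each category's probes return. The sketch below assumes that the trained sample instances have already been turned into per-category probe terms and that a source can be asked for a match count through a callable. The names `classify_by_probing`, `best_category`, and `count_matches`, the toy records, and the share-of-matches scoring are all hypothetical illustrations, not the paper's method.

```python
from typing import Callable, Dict, List

def classify_by_probing(count_matches: Callable[[str], int],
                        probes: Dict[str, List[str]]) -> Dict[str, float]:
    """Score each category by its share of all probe matches.

    `count_matches` stands in for sending one probe query to a data source
    and reading back the number of results (assumed interface).
    """
    totals = {cat: sum(count_matches(q) for q in queries)
              for cat, queries in probes.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        # The source matched no probe, so no category is supported.
        return {cat: 0.0 for cat in probes}
    return {cat: n / grand_total for cat, n in totals.items()}

def best_category(scores: Dict[str, float]) -> str:
    """Pick the category with the highest score."""
    return max(scores, key=lambda cat: scores[cat])

# Toy data source: a few result records searched by whole-word match.
docs = ["used car for sale", "car insurance quote", "cheap flight to paris"]

def count_matches(query: str) -> int:
    return sum(query in doc.split() for doc in docs)

# Hypothetical probe terms derived from per-category sample instances.
probes = {"autos": ["car", "sedan"], "travel": ["flight", "hotel"]}

scores = classify_by_probing(count_matches, probes)
print(scores, best_category(scores))
```

Here the `autos` probes match two of the three records and the `travel` probes match one, so the toy source is assigned to `autos`. Normalizing by the total match count keeps scores comparable across sources of very different sizes.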
This paper proposes a relation extraction model for the Web environment based on semantic pattern matching. It consists of three parts: frequent pattern extraction, density-based pattern clustering, and pattern matching based on semantic similarity. First, based on entities with known relations in a limited training set, we extract relation patterns containing these…
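The three stages named above can be illustrated with a small sketch, assuming that a pattern is simply the text between two known related entities in a sentence. Token Jaccard similarity is used here as a plain stand-in for the paper's semantic similarity measure, and a DBSCAN-style procedure stands in for its density-based clustering. All function names, thresholds, and sample sentences are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

def extract_patterns(sentences: List[str], pairs: List[Tuple[str, str]],
                     min_support: int = 2) -> List[str]:
    """Step 1: take the text between each known entity pair and keep
    patterns that occur at least `min_support` times."""
    counts: Counter = Counter()
    for s in sentences:
        for e1, e2 in pairs:
            i, j = s.find(e1), s.find(e2)
            if i != -1 and j != -1 and i + len(e1) <= j:
                counts[s[i + len(e1):j].strip()] += 1
    return [p for p, c in counts.items() if c >= min_support]

def similarity(a: str, b: str) -> float:
    """Token Jaccard similarity: a stand-in for a real semantic measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def density_cluster(patterns: List[str], min_sim: float = 0.5,
                    min_pts: int = 2) -> List[List[str]]:
    """Step 2: DBSCAN-style clustering; neighbours have similarity >= min_sim."""
    def neighbours(i: int) -> List[int]:
        return [j for j in range(len(patterns))
                if similarity(patterns[i], patterns[j]) >= min_sim]

    labels: Dict[int, int] = {}  # pattern index -> cluster id (-1 = noise)
    cluster_id = 0
    for i in range(len(patterns)):
        if i in labels:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels.get(j) == -1:
                labels[j] = cluster_id  # former noise becomes a border point
            if j in labels:
                continue
            labels[j] = cluster_id
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(j_nbrs)
        cluster_id += 1

    clusters: Dict[int, List[str]] = {}
    for i, cid in sorted(labels.items()):
        if cid != -1:
            clusters.setdefault(cid, []).append(patterns[i])
    return [clusters[c] for c in sorted(clusters)]

def match(candidate: str, clusters: List[List[str]],
          threshold: float = 0.5) -> Optional[int]:
    """Step 3: assign a new pattern to the most similar cluster, if close enough."""
    scored = [(max(similarity(candidate, m) for m in members), cid)
              for cid, members in enumerate(clusters)]
    if not scored:
        return None
    best_sim, best_cid = max(scored)
    return best_cid if best_sim >= threshold else None

sentences = [
    "Paris is the capital of France", "Berlin is the capital of Germany",
    "Rome is the capital city of Italy", "Madrid is the capital city of Spain",
    "Einstein was born in Ulm", "Mozart was born in Salzburg",
    "Picasso was born in the city of Malaga", "Dali was born in the city of Figueres",
]
pairs = [("Paris", "France"), ("Berlin", "Germany"), ("Rome", "Italy"),
         ("Madrid", "Spain"), ("Einstein", "Ulm"), ("Mozart", "Salzburg"),
         ("Picasso", "Malaga"), ("Dali", "Figueres")]
patterns = extract_patterns(sentences, pairs)
clusters = density_cluster(patterns)
print(clusters)
print(match("was born in the town of", clusters))
```

With these toy sentences, the capital-of and born-in patterns form two clusters, and the unseen pattern `was born in the town of` is matched to the born-in cluster (index 1). Swapping `similarity` for an embedding- or thesaurus-based measure would bring the sketch closer to genuinely semantic matching.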
MapReduce has proven to be a highly desirable platform for scalable parallel data analysis. Task scheduling in MapReduce is crucial for job execution and has a marked impact on system performance. To the best of our knowledge, previous scheduling algorithms rarely consider job-intensive environments and are unable to provide…
Hadoop is a widely deployed distributed computing framework that makes creating distributed applications much easier. However, unlike for text data, Hadoop has no existing video read/write interface, and many existing video analytics applications implemented in C/C++ are not compatible with the Hadoop framework. In this paper, we propose an open source Hadoop video…
Keyword query has attracted much research attention due to its simplicity and wide range of applications. The inherent ambiguity of keyword queries often leads to unsatisfactory query results. Moreover, some existing techniques for keyword queries on the Web, in relational databases, and in XML databases cannot be fully applied to keyword queries in dataspaces. Therefore, we propose…
In Web database integration, crawling data pages is important for data extraction. Because the data are spread across multiple result pages, accessing them for integration is more difficult. Thus, query result pages must be crawled from Web databases accurately and automatically. To address this problem, we propose a novel approach based…
Deep Web sources contain a large amount of high-quality, query-related structured data. One of the challenges in the Deep Web is extracting the result schemas of Deep Web sources. To address this challenge, this paper describes a novel approach that extracts both the result data and the result schema of a Web database. The approach first models the query interface of a…