Yue Kou

MapReduce has been proven to be a highly desirable platform for scalable parallel data analysis. Task scheduling in MapReduce is crucial to job execution and has a marked impact on system performance. To the best of our knowledge, previous scheduling algorithms rarely consider job-intensive environments and are not able to provide...
Keyword query has attracted much research attention due to its simplicity and wide range of applications. However, the inherent ambiguity of keyword queries often leads to unsatisfactory query results. Moreover, some existing techniques for Web query and for keyword query in relational and XML databases cannot be fully applied to keyword query in dataspaces. So we propose...
To access large-scale data sources efficiently and automatically, it is necessary to classify these data sources into different domains and categories. In this paper, we propose a novel approach that classifies data sources into detailed domain subjects by query probing. In our approach, we train sample instances for each subject category and...
In Web database integration, crawling data pages is important for data extraction. Because the data are spread across multiple result pages, accessing them for integration is more difficult. Thus, it is necessary to accurately and automatically crawl query result pages from Web databases. To address this problem, we propose a novel approach based...
In the real world, an entity often has more than one representation across different data sources, some of them containing errors. These differing representations and occasional errors make identifying entities, a task known as entity resolution, crucial in data integration and data cleaning. We propose a novel approach for entity resolution using...
Hadoop is a widely deployed distributed computing framework that makes creating distributed applications much easier. However, unlike for text data, there is no existing video read/write interface for Hadoop, and many existing video analytics applications implemented in C/C++ are not compatible with the Hadoop framework. In this paper, we propose an open source Hadoop video...
The emergence of the grid as an infrastructure for sharing large-scale resources increases the need for information services that allow efficient management of resources. This paper proposes Personalized and Semantics-based Grid Information Services (PS-GIS), which serve as information management units for service publishing, discovery, and monitoring.