As data volumes increase, the I/O bottleneck has become a major challenge for data analysis. Data compression can alleviate this bottleneck effectively. Taking the K-means algorithm as an example, this paper proposes a compression-aware performance improvement model for big-data clustering. The model quantitatively analyzes the effect of a variety of factors…
Screening data for phylogenetic analysis from large datasets is a known computational problem in data-intensive applications. In this paper, we implement a parallel approach, Cloud-GSQCT (Cloud Gene Sequence Quality Control Tool), to screen gene sequence data for phylogenetic analysis, using the MapReduce paradigm to parallelize the solution and to manage…
The Scientific Data Grid (SDG) of the Chinese Academy of Sciences aims to integrate distributed scientific data and to provide a transparent data access mechanism and efficient data analysis, processing, and visualization services. The SDG Job Scheduler (SDGJS) adopts an open, service-oriented framework. The scheduling policy of SDGJS considers both performance of…
As computing demands increase, managing and scheduling a large volume of jobs is becoming a significant challenge. This paper analyzes various data structures and algorithms used to manage job queues. To provide fair and adjustable scheduling, job scheduling algorithms are generally based on the dynamic priorities of jobs. This paper analyzes three…