Big data management: conception, technology and challenge
- X. Meng, X. Ci
- Research and Development of Computer,
A single server or CPU lacks the capacity to mine massive data sets. To address this bottleneck, a combined algorithm is proposed that couples a genetic algorithm with a MapReduce-based parallel clustering algorithm. The genetic algorithm compensates for the weakness of clustering analysis in selecting cluster centers, while the MapReduce parallel computing model accelerates the convergence of the clustering analysis. To verify the soundness of the algorithm, it is applied to the analysis of real log files on a Hadoop platform. Experimental results show that combining distributed cluster computing with genetic clustering analysis yields better clustering results on log files and greatly improves the efficiency of mining massive logs.
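The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the genetic operators or the record format, so the one-point crossover, the Gaussian mutation, the 2-D points standing in for log feature vectors, and all function names (`map_partition`, `reduce_partials`, `evaluate`, `genetic_kmeans`) are assumptions. The map and reduce steps run in-process here rather than on a Hadoop cluster.

```python
import random
from functools import reduce


def map_partition(points, centers):
    """Map step: assign each point to its nearest center, emit partial sums."""
    k = len(centers)
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    sse = 0.0
    for x, y in points:
        d2 = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centers]
        j = d2.index(min(d2))
        sums[j][0] += x
        sums[j][1] += y
        counts[j] += 1
        sse += d2[j]
    return sums, counts, sse


def reduce_partials(a, b):
    """Reduce step: merge the partial sums, counts and errors of two partitions."""
    sums = [[p[0] + q[0], p[1] + q[1]] for p, q in zip(a[0], b[0])]
    counts = [m + n for m, n in zip(a[1], b[1])]
    return sums, counts, a[2] + b[2]


def evaluate(partitions, centers):
    """One map/reduce round: return (SSE of `centers`, refined centers)."""
    partials = (map_partition(part, centers) for part in partitions)
    sums, counts, sse = reduce(reduce_partials, partials)
    refined = [
        (s[0] / c, s[1] / c) if c else old  # keep an empty cluster's center
        for s, c, old in zip(sums, counts, centers)
    ]
    return sse, refined


def genetic_kmeans(partitions, k, pop_size=8, generations=10, seed=0):
    """Genetic search over center sets; fitness is the map/reduce SSE."""
    rng = random.Random(seed)
    all_points = [p for part in partitions for p in part]
    # Each chromosome is a list of k candidate cluster centers.
    population = [rng.sample(all_points, k) for _ in range(pop_size)]
    best_sse, best_centers = float("inf"), None
    for _ in range(generations):
        scored = []
        for centers in population:
            sse, refined = evaluate(partitions, centers)
            scored.append((sse, centers, refined))
        scored.sort(key=lambda t: t[0])
        if scored[0][0] < best_sse:
            best_sse, best_centers = scored[0][0], list(scored[0][1])
        # Selection: the better half survives, replaced by its refined centers.
        parents = [refined for _, _, refined in scored[: pop_size // 2]]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover (assumed operator)
            i = rng.randrange(k)       # Gaussian mutation of one center (assumed)
            child[i] = (child[i][0] + rng.gauss(0, 0.1),
                        child[i][1] + rng.gauss(0, 0.1))
            children.append(child)
        population = parents + children
    return best_sse, best_centers
```

In this sketch each partition plays the role of one input split: `map_partition` only touches its own points, so the per-split work could be distributed, and `reduce_partials` is associative, so partial results can be merged in any order. Calling `genetic_kmeans(partitions, k=2)` on a list of point lists returns the lowest sum of squared errors found and the corresponding centers.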