A Length-variable Feature Code Based Fuzzy Duplicates Elimination Approach for Large Scale Chinese WebPages

@article{Guo2012ALF,
  title={A Length-variable Feature Code Based Fuzzy Duplicates Elimination Approach for Large Scale Chinese WebPages},
  author={Hongzhi Guo and Qingcai Chen and Cong Xin and Xiaolong Wang},
  journal={JSW},
  year={2012},
  volume={7},
  pages={2622--2629}
}
Most existing Chinese webpage duplicate elimination approaches do not focus on eliminating noisy and fuzzy duplicates. In this paper, we propose an efficient and noise-tolerant Chinese webpage duplicate elimination approach based on a Length-variable Feature Code. First, an Independent Extraction Unit is defined to eliminate the impact of short…