MapDupReducer: detecting near duplicates over massive datasets

Authors: Changping Wang, Jianmin Wang, Xuemin Lin, Wei Wang, Haixun Wang, Hongsong Li, Wanpeng Tian, Jun Xu, Rui Li
Venue: SIGMOD Conference

Near-duplicate detection benefits many applications, e.g., online news selection over the Web by keyword search. This demo shows the design and implementation of MapDupReducer, a MapReduce-based system capable of efficiently detecting near duplicates over massive datasets.