A Method for Finding Similar Documents Relying on Adding Repetition of Symbols in Length Based Filtering

Abstract

Finding similar items is a basic topic in the mining of massive datasets; finding similar documents is one example. Many methods exist for this task, among them the shingling method and length-based filtering. In the shingling method, substrings called symbols are selected from each document, and they are placed on one…
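
To make the two techniques named in the abstract concrete, here is a minimal sketch of textbook character shingling with a length-based filter applied before the Jaccard comparison. It does not reproduce the authors' extension that adds repetition of symbols, which the truncated abstract does not describe. The function names, the shingle length `k=3`, and the threshold `0.5` are illustrative assumptions, not values from the paper.

```python
def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of all k-character substrings (shingles) of text."""
    # k=3 is an assumed default; texts shorter than k yield an empty set.
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def passes_length_filter(a: set[str], b: set[str], threshold: float) -> bool:
    """Cheap pre-check based on set sizes only.

    Jaccard(a, b) can never exceed min(|a|, |b|) / max(|a|, |b|), so a pair
    whose size ratio is below the threshold cannot be similar enough.
    """
    small, large = sorted((len(a), len(b)))
    if large == 0:
        return True
    return small / large >= threshold


def similar(doc1: str, doc2: str, k: int = 3, threshold: float = 0.5) -> bool:
    """Decide similarity, skipping the full comparison when the filter fails."""
    a, b = shingles(doc1, k), shingles(doc2, k)
    if not passes_length_filter(a, b, threshold):
        return False
    return jaccard(a, b) >= threshold
```

The filter only ever rejects pairs that could not reach the threshold, so it changes the cost of the search and not its result.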

2 Figures and Tables

Cite this paper

@article{Azgomi2014AMF, title={A Method for Finding Similar Documents Relying on Adding Repetition of Symbols in Length Based Filtering}, author={Hossein Azgomi and Masumeh Ghasemi Mahsayeh and Masoud Mohammadi and Milad Moradi}, journal={CoRR}, year={2014}, volume={abs/1712.03190} }