S2S: structural-to-syntactic matching similar documents

@article{Aygn2007S2SSM,
  title={S2S: structural-to-syntactic matching similar documents},
  author={Ramazan Savas Ayg{\"u}n},
  journal={Knowledge and Information Systems},
  year={2007},
  volume={16},
  pages={303--329}
}
Management of large collections of replicated data in centralized or distributed environments is important for many systems that provide data mining, mirroring, storage, and content distribution. In its simplest form, documents are generated, duplicated, and updated through emails and web pages. Although redundancy may increase reliability to some degree, uncontrolled redundancy degrades retrieval performance and may be useless if the returned documents are obsolete. Document similarity…