A comparison of techniques for estimating IDF values to generate lexical signatures for the web

Abstract

For bounded datasets such as the TREC Web Track, computing term frequency (TF) and inverse document frequency (IDF) is not difficult. However, since IDF cannot be calculated directly for the entire web, it must be estimated. We see a need for accurate IDF estimates in order to generate TF-IDF based lexical signatures (LSs) of web pages. Future…
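To make the TF-IDF step concrete, here is a minimal sketch of building a lexical signature. It assumes that a small local corpus stands in for the web when estimating IDF. It is not the authors' implementation and not one of the specific estimation techniques the paper compares. The helper names (`tokenize`, `estimate_idf`, `lexical_signature`), the tokenizer, the df = 1 smoothing for unseen terms, and the default `k=5` are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def estimate_idf(corpus):
    # Estimate idf(t) = log(N / df(t)) from a local corpus that stands in
    # for the (unbounded) web; N = number of documents, df = document frequency.
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def lexical_signature(page, corpus, k=5):
    # Rank the page's terms by TF * IDF and return the top k as its signature.
    idf = estimate_idf(corpus)
    # Assumption: a term absent from the corpus is treated as if df = 1.
    unseen_idf = math.log(len(corpus) / 1)
    tf = Counter(tokenize(page))
    scores = {term: count * idf.get(term, unseen_idf) for term, count in tf.items()}
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    corpus = ["the web is big", "the web changes", "the cat sat"]
    print(lexical_signature("the web archive archive lost page", corpus, k=3))
```

With this toy corpus, "the" appears in every document and gets an IDF of zero, so it drops to the bottom of the ranking. The repeated, corpus-unseen term "archive" ranks first. Because the signature depends entirely on the IDF values, a poor IDF estimate directly changes which terms are chosen.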
DOI: 10.1145/1458502.1458510

6 Figures and Tables

Cite this paper

@inproceedings{Klein2008ACO, title={A comparison of techniques for estimating IDF values to generate lexical signatures for the web}, author={Martin Klein and Michael L. Nelson}, booktitle={WIDM}, year={2008} }