Probabilistic Latent Semantic Indexing

Proceedings of the Twenty-Second Annual International SIGIR Conference on Research and Development in Information Retrieval

Abstract

Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
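
To make the latent class model in the abstract more concrete, here is a minimal sketch under stated assumptions. It assumes the model takes the form P(w|d) = sum over z of P(w|z) P(z|d), with z a latent class, and fits it with plain Expectation Maximization. The abstract refers to a generalization of EM, which this sketch does not reproduce. The function name `fit_plsa`, its parameters, and the small smoothing constant are hypothetical choices made for illustration only.

```python
import random


def fit_plsa(counts, n_topics, n_iter=50, seed=0):
    """Plain EM for an assumed latent class model P(w|d) = sum_z P(w|z) P(z|d).

    counts[d][w] is the number of times word w occurs in document d.
    Returns (p_w_z, p_z_d): one word distribution per latent class and
    one latent class distribution per document.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random positive initialization, normalized into valid distributions.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        # Expected counts; the tiny floor (an assumption of this sketch)
        # keeps every probability strictly positive.
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: posterior P(z|d,w) proportional to P(w|z) P(z|d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z]
                                  for z in range(n_topics)])
                # Accumulate the expected counts used by the M-step.
                for z in range(n_topics):
                    new_w_z[z][w] += n_dw * post[z]
                    new_z_d[d][z] += n_dw * post[z]
        # M-step: renormalize expected counts into probabilities.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d
```

Called on a small document-by-word count matrix, for example `fit_plsa([[4, 3, 0, 0], [0, 0, 4, 2]], n_topics=2)`, the function returns per-class word distributions and per-document class mixtures. Because the same word can receive probability under several latent classes, a model of this kind has room to represent polysemy, and words that co-occur across documents tend to share a class, which relates to the synonymy handling the abstract mentions.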

