Indexing by Latent Semantic Analysis

Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, Richard A. Harshman
J. Am. Soc. Inf. Sci.
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be… 
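
The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes NumPy and a hypothetical six-term, five-document count matrix; the names `truncated_svd` and `rank_documents` are illustrative and not from the paper. The paper keeps roughly 100 factors, while this toy example keeps 2. Projecting both queries and documents onto the retained left singular vectors is one common convention and may differ in detail from the authors' procedure.

```python
import numpy as np

# Hypothetical toy data: rows are terms, columns are documents.
terms = ["human", "computer", "interface", "graph", "tree", "minors"]
A = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
], dtype=float)


def truncated_svd(matrix, k):
    """Return the k largest singular triplets of `matrix`."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def rank_documents(query_counts, U_k, s_k, Vt_k):
    """Cosine similarity between a query and every document in factor space."""
    q_hat = query_counts @ U_k          # query projected onto the k factors
    docs = (Vt_k * s_k[:, None]).T      # one row per document in factor space
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    safe = np.where(norms == 0.0, 1.0, norms)  # avoid division by zero
    return np.where(norms == 0.0, 0.0, (docs @ q_hat) / safe)


U_k, s_k, Vt_k = truncated_svd(A, k=2)

# A query containing only the term "computer".
query = np.array([0, 1, 0, 0, 0, 0], dtype=float)
scores = rank_documents(query, U_k, s_k, Vt_k)
print(np.round(scores, 3))
```

In this toy matrix the second document does not contain the query term "computer", yet it scores highly because it shares the terms "human" and "interface" with a document that does. This is the kind of implicit higher-order association the abstract refers to.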


Latent Semantic Indexing Via a Semi-Discrete Matrix Decomposition
This paper uses an alternate decomposition, the semi-discrete decomposition (SDD), and shows that for equal query times, the SDD does as well as the SVD and uses less than one-tenth the storage for the MEDLINE test set.
The limitation of the SVD for latent semantic indexing
  • Andri Mirzal
  • Computer Science
    2013 IEEE International Conference on Control System, Computing and Engineering
  • 2013
It is shown that the LSI capability of the truncated SVD is not as conclusive as previously reported; rather, it is conditional, and when the condition is not met, the truncated SVD can fail to recognize related documents, resulting in poor retrieval performance.
Downdating the Latent Semantic Indexing Model for Conceptual Information Retrieval
Implementing the DRM method within LSI++ not only provides downdating functionality, but is less time consuming than recomputing the SVD when removing a term, document or both.
Evaluation of clustering and summarizing in distributed latent semantic indexing
This work implements distributed LSI, on the assumption that distribution reduces both the required memory and the run time; the evaluation shows a remarkable improvement over the non-combinational LSI method.
A topic based indexing approach for searching in documents
A topic-based indexing approach is proposed that represents the topics associated with documents as a document-topic matrix denoting the importance of each topic within each document.
Probabilistic latent semantic indexing
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training
Using latent semantic analysis to improve access to textual information
Initial tests find this completely automatic method widely applicable and a promising way to improve users' access to many kinds of textual materials, or to objects and services for which textual descriptions are available.
A Case Study of Latent Semantic Indexing
Based on data collected from the usage of the system by graduate students and University of Tennessee library patrons, LSIRS is shown to be an effective and useful document retrieval system for both the inexperienced and advanced user.
Improved Query Matching Using kd-Trees: A Latent Semantic Indexing Enhancement
The kd-tree searching algorithm is used within a recent LSI implementation to reduce the time and computational complexity of query matching.


Classification Space: A Multivariate Procedure For Automatic Document Indexing And Retrieval.
  • P. Ossorio
  • Computer Science
    Multivariate behavioral research
  • 1966
A set of five related empirical studies provides convincing evidence that, when appropriate experimental procedures are followed, a very stable C-Space for a given content domain can be constructed on a surprisingly small data base.
Information Retrieval Based upon Latent Class Analysis
An information retrieval method based upon Lazarsfeld's latent class analysis is proposed, which has mathematical foundations and suggests that the mathematical rationale for the former could also provide a useful theoretical basis for the latter.
A theoretical basis for the use of co-occurrence data in information retrieval
This paper provides a foundation for a practical way of improving the effectiveness of an automatic retrieval system by measuring the extent of the dependence between index terms and using it to construct a non‐linear weighting function.
The use of hierarchic clustering in information retrieval
A critical analysis of vector space model for information retrieval
Notations and definitions necessary to identify the concepts and relationships that are important in modelling information retrieval objects and processes in the context of vector spaces are
Automatic Document Classification
Of the ninety documents in the Validation Group which contained two or more clue words, and which could be automatically classified, 44 documents, or 48.9%, were placed into their correct categories by use of a computer formula.
Relevance assessments and retrieval system evaluation
Subject access in online catalogs: A design model
  • M. Bates
  • Computer Science
    J. Am. Soc. Inf. Sci.
  • 1986
The proposed model is “wrapped around” existing Library of Congress subject-heading indexing in such a way as to enhance access greatly without requiring reindexing, and it is argued that, both for cost reasons and in principle, this is a superior approach to other design philosophies.
Disambiguation by short contexts
This paper describes a technique that is of great help in many text-processing situations, and reports on an experiment recently conducted to test its validity and scope, namely that of disambiguation by short contexts.
Experience with an adaptive indexing scheme
Experience with an adaptive technique for constructing a rich, empirically defined, frequency weighted index for new or intermittent users of computer systems is discussed.