Indexing by Latent Semantic Analysis
@article{Deerwester1990IndexingBL,
  title={Indexing by Latent Semantic Analysis},
  author={Scott C. Deerwester and Susan T. Dumais and Thomas K. Landauer and George W. Furnas and Richard A. Harshman},
  journal={J. Am. Soc. Inf. Sci.},
  year={1990},
  volume={41},
  pages={391--407}
}
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be…
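The decomposition the abstract describes can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the five-term, four-document count matrix is made up, k = 2 stands in for the roughly 100 factors mentioned above, and the `lsi_fit`, `fold_in`, and `cosine` helpers are hypothetical names. Folding the query in as q U_k / s_k is one common convention for LSI and is not stated in the truncated abstract.

```python
import numpy as np

# Hypothetical toy data: 5 terms (rows) by 4 documents (columns).
terms = ["human", "computer", "interface", "graph", "tree"]
A = np.array([
    [1, 1, 0, 0],  # human
    [1, 0, 0, 0],  # computer
    [0, 1, 0, 0],  # interface
    [0, 0, 1, 1],  # graph
    [0, 0, 1, 0],  # tree
], dtype=float)


def lsi_fit(A, k):
    """Return the k largest singular triplets of A (truncated SVD)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(q, U_k, s_k):
    """Project a term-count query vector into the k-factor space."""
    return (q @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


k = 2
U_k, s_k, Vt_k = lsi_fit(A, k)
A_k = U_k @ np.diag(s_k) @ Vt_k  # rank-k approximation of A

# One-word query: "computer" occurs only in document 0.
q = np.zeros(len(terms))
q[terms.index("computer")] = 1.0
q_hat = fold_in(q, U_k, s_k)

# Compare the folded-in query with each document's factor coordinates.
sims = [cosine(q_hat, Vt_k[:, j]) for j in range(A.shape[1])]
print(np.round(sims, 3))
```

In this toy matrix, document 1 never contains "computer", so literal term matching would miss it. In the two-factor space it scores as high as document 0 because both share "human", while the two graph documents score near zero.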
7,383 Citations
Latent Semantic Indexing Via a Semi-Discrete Matrix Decomposition
- Computer Science
- 1999
This paper uses an alternate decomposition, the semi-discrete decomposition (SDD), and shows that for equal query times, the SDD does as well as the SVD and uses less than one-tenth the storage for the MEDLINE test set.
The limitation of the SVD for latent semantic indexing
- Computer Science
- 2013 IEEE International Conference on Control System, Computing and Engineering
- 2013
It is shown that the LSI capability of the truncated SVD is not as conclusive as previously reported; rather, it is conditional, and when the condition is not met, the truncated SVD can fail to recognize related documents, resulting in poor retrieval performance.
Downdating the Latent Semantic Indexing Model for Conceptual Information Retrieval
- Computer Science
- Comput. J.
- 1998
Implementing the DRM method within LSI++ not only provides downdating functionality, but is less time consuming than recomputing the SVD when removing a term, document or both.
Evaluation of clustering and summarizing in distributed latent semantic indexing
- Computer Science
- 2010 2nd IEEE International Conference on Information Management and Engineering
- 2010
This work implements distributed LSI on the assumption that distribution decreases the required memory space and reduces run time; the evaluation shows a remarkable improvement over the non-combinational LSI method.
A topic based indexing approach for searching in documents
- Computer Science
- 2011 8th International Conference on Electrical Engineering, Computing Science and Automatic Control
- 2011
A topic-based indexing approach is proposed that represents the topics associated with documents as a document-topic matrix denoting the importance of topics inside documents.
Probabilistic latent semantic indexing
- Computer Science
- SIGIR '99
- 1999
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training…
Probabilistic Latent Semantic Indexing
- Computer Science
- SIGIR Forum
- 2017
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training…
Using latent semantic analysis to improve access to textual information
- Computer Science
- CHI '88
- 1988
Initial tests find this completely automatic method widely applicable and a promising way to improve users' access to many kinds of textual materials, or to objects and services for which textual descriptions are available.
A Case Study of Latent Semantic Indexing
- Computer Science
- 1995
Based on data collected from the usage of the system by graduate students and University of Tennessee library patrons, LSIRS is shown to be an effective and useful document retrieval system for both the inexperienced and advanced user.
Improved Query Matching Using kd-Trees: A Latent Semantic Indexing Enhancement
- Computer Science
- Information Retrieval
- 2004
The kd-tree searching algorithm is used within a recent LSI implementation to reduce the time and computational complexity of query matching.
References
Classification Space: A Multivariate Procedure For Automatic Document Indexing And Retrieval
- Computer Science
- Multivariate Behavioral Research
- 1966
A set of five related empirical studies provide convincing evidence that when appropriate experimental procedures are followed a very stable C-Space for a given content domain can be constructed on a surprisingly small data base.
Information Retrieval Based upon Latent Class Analysis
- Computer Science
- JACM
- 1962
An information retrieval method based upon Lazarsfeld's latent class analysis is proposed, which has mathematical foundations and suggests that the mathematical rationale for the former could also provide a useful theoretical basis for the latter.
A theoretical basis for the use of co-occurrence data in information retrieval
- Computer Science
- 1977
This paper provides a foundation for a practical way of improving the effectiveness of an automatic retrieval system by measuring the extent of the dependence between index terms and using it to construct a non‐linear weighting function.
A critical analysis of vector space model for information retrieval
- Environmental Science
- J. Am. Soc. Inf. Sci.
- 1986
Notations and definitions necessary to identify the concepts and relationships that are important in modelling information retrieval objects and processes in the context of vector spaces are…
Automatic Document Classification
- Computer Science
- JACM
- 1963
Of the ninety documents in the Validation Group which contained two or more clue words, and which could be automatically classified, 44 documents, or 48.9%, were placed into their correct categories by use of a computer formula.
Subject access in online catalogs: A design model
- Computer Science
- J. Am. Soc. Inf. Sci.
- 1986
The proposed model is “wrapped around” existing Library of Congress subject-heading indexing in such a way as to enhance access greatly without requiring reindexing, and it is argued that both for cost reasons and in principle this is a superior approach to other design philosophies.
Disambiguation by short contexts
- Computer Science
- Comput. Humanit.
- 1985
This paper describes a technique that is of great help in many text-processing situations, namely disambiguation by short contexts, and reports on an experiment recently conducted to test its validity and scope.
Experience with an adaptive indexing scheme
- Computer Science
- CHI '85
- 1985
Experience with an adaptive technique for constructing a rich, empirically defined, frequency weighted index for new or intermittent users of computer systems is discussed.