Indexing by Latent Semantic Analysis

@article{Deerwester1990IndexingBL,
  title={Indexing by Latent Semantic Analysis},
  author={Scott C. Deerwester and Susan T. Dumais and George W. Furnas and Thomas K. Landauer and Richard A. Harshman},
  journal={Journal of the American Society for Information Science},
  year={1990},
  volume={41},
  pages={391--407}
}
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be…
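
The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes NumPy, a tiny made-up term-by-document count matrix, and k = 2 factors instead of the roughly 100 mentioned above. The helper names (`lsi_fit`, `fold_in_query`, `cosine_scores`) are hypothetical, and the query projection and cosine ranking are one common way to use the factors, not necessarily every detail of the paper's procedure.

```python
import numpy as np

# Toy term-by-document count matrix (illustrative data, not from the paper).
# Rows are terms, columns are documents.
terms = ["human", "interface", "computer", "graph", "tree"]
A = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)


def lsi_fit(A, k):
    """Truncated SVD: keep the k largest singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in_query(q, U_k, s_k):
    """Project a term-count query vector into the k-factor space (q^T U_k S_k^-1)."""
    return (q @ U_k) / s_k


def cosine_scores(q_hat, Vt_k):
    """Cosine similarity between the projected query and each document's factor vector."""
    docs = Vt_k.T  # one row per document
    num = docs @ q_hat
    den = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    return num / np.where(den == 0, 1.0, den)


U_k, s_k, Vt_k = lsi_fit(A, k=2)

# Rank-k approximation of the original matrix from the retained factors.
A_k = U_k @ np.diag(s_k) @ Vt_k

# Query containing only the term "human".
q = np.array([1, 0, 0, 0, 0], dtype=float)
scores = cosine_scores(fold_in_query(q, U_k, s_k), Vt_k)
print(np.round(scores, 3))
```

In this toy example the first two documents, which share terms with the query, score far higher than the last two, which share none.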

INFORMATION RETRIEVAL USING LATENT SEMANTIC INDEXING
Our capabilities for collecting and storing data of all kinds are greater than ever. On the other hand, analyzing, summarizing, and extracting information from this data is harder than ever. That's why…
Low-rank Orthogonal Decompositions for Information Retrieval Applications
TLDR
The focus of this work is to demonstrate the computational advantages of exploiting low-rank orthogonal decompositions such as the ULV (or URV) as opposed to the truncated singular value decomposition (SVD) for the construction of initial and updated rank-k subspaces arising from LSI applications.
Probabilistic latent semantic indexing
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training…
A Semantic Clustering Approach for Indexing Documents
TLDR
A semantic clustering approach to improve traditional information retrieval models by representing the topics associated with documents as clusters of terms, where each cluster is a set of related words according to the content of documents.
Using latent semantic analysis to improve access to textual information
TLDR
Initial tests find this completely automatic method widely applicable and a promising way to improve users' access to many kinds of textual materials, or to objects and services for which textual descriptions are available.
Latent Semantic Indexing and Information Retrieval: A Quest with Bosse
TLDR
This master thesis deals with the implementation of a search engine using Latent Semantic Indexing (LSI) called BoSSE, which allows a search for documents or terms similar to a given term, query or document.
A Case Study of Latent Semantic Indexing
TLDR
Based on data collected from the usage of the system by graduate students and University of Tennessee library patrons, LSIRS is shown to be an effective and useful document retrieval system for both the inexperienced and advanced user.
Improved Query Matching Using kd-Trees: A Latent Semantic Indexing Enhancement
TLDR
The kd-tree searching algorithm is used within a recent LSI implementation to reduce the time and computational complexity of query matching.
An Information Retrieval Model Based on Latent Semantic Indexing with Intelligent Preprocessing
TLDR
An information retrieval system based on the proposed method using latent semantic indexing exhibits superiority over other systems based on traditional preprocessing methods.

References

Classification Space: A Multivariate Procedure For Automatic Document Indexing And Retrieval.
  • P. Ossorio
  • Computer Science, Medicine
  • Multivariate behavioral research
  • 1966
TLDR
A set of five related empirical studies provide convincing evidence that when appropriate experimental procedures are followed a very stable C-Space for a given content domain can be constructed on a surprisingly small data base.
A theoretical basis for the use of co-occurrence data in information retrieval
TLDR
This paper provides a foundation for a practical way of improving the effectiveness of an automatic retrieval system by measuring the extent of the dependence between index terms and using it to construct a non‐linear weighting function.
The use of hierarchic clustering in information retrieval
TLDR
It is shown that cluster-based retrieval strategies can be devised which are as effective as linear associative retrieval strategies and much more efficient.
A critical analysis of vector space model for information retrieval
Notations and definitions necessary to identify the concepts and relationships that are important in modelling information retrieval objects and processes in the context of vector spaces are…
Automatic Document Classification
TLDR
Of the ninety documents in the Validation Group which contained two or more clue words, and which could be automatically classified, 44 documents, or 48.9%, were placed into their correct categories by use of a computer formula.
Relevance assessments and retrieval system evaluation
TLDR
It is found that large scale differences in the relevance assessments do not produce significant variations in average recall and precision, and it thus appears that properly computed recall and precision data may represent effectiveness indicators which are generally valid for many distinct user classes.
Subject access in online catalogs: A design model
TLDR
The proposed model is “wrapped around” existing Library of Congress subject-heading indexing in such a way as to enhance access greatly without requiring reindexing, and it is argued that both for cost reasons and in principle this is a superior approach to other design philosophies.
Disambiguation by short contexts
TLDR
This paper describes a technique that is of great help in many text-processing situations, and reports on an experiment recently conducted to test its validity and scope, namely that of disambiguation by short contexts.
Experience with an adaptive indexing scheme
TLDR
Experience with an adaptive technique for constructing a rich, empirically defined, frequency weighted index for new or intermittent users of computer systems is discussed.
Human factors and behavioral science: Statistical semantics: Analysis of the potential performance of key-word information systems
TLDR
This paper examines how imprecision in the way humans name things might limit how well a computer can guess to what they are referring and finds that hit rates could be increased threefold by using norms on naming to pick optimal names.