A vector space model for automatic indexing

G. Salton, A. Wong, and C. Yang. Commun. ACM.
In document retrieval, or any other pattern-matching environment where stored entities (documents) are compared with each other or with incoming patterns (search requests), the best indexing (property) space appears to be one where each entity lies as far from the others as possible. In these circumstances, the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. [...] An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents. Typical evaluation results are shown, demonstrating the usefulness of the model.
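The density idea in the abstract can be illustrated with a minimal sketch. It assumes documents are plain lists of term weights and takes "space density" to be the average pairwise cosine similarity; that definition, and the helper names `cosine`, `space_density`, and `density_without_term`, are assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # An all-zero vector has no direction; treat its similarity as 0.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def space_density(docs):
    """Average pairwise cosine similarity (an assumed density measure)."""
    n = len(docs)
    if n < 2:
        return 0.0
    total = sum(
        cosine(docs[i], docs[j]) for i in range(n) for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)


def density_without_term(docs, k):
    """Density of the space after dropping term k from every document."""
    reduced = [d[:k] + d[k + 1:] for d in docs]
    return space_density(reduced)


# Hypothetical three-document collection over three terms;
# term 2 occurs in every document.
docs = [[1, 0, 1], [0, 1, 1], [1, 1, 1]]
before = space_density(docs)
after = density_without_term(docs, 2)
print(f"density with term 2: {before:.3f}, without: {after:.3f}")
```

In this toy collection, dropping the term shared by every document lowers the density, which spreads the documents apart. Under the abstract's hypothesis, such a term would be a poor candidate for the indexing vocabulary.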
Toward conceptual indexing using automatic assignment of descriptors
The core of this system, the automatic descriptor assigner, is described; it can be used to manage a collection of documents linked to a thesaurus, so that users can manipulate them in a more conceptual way.
Dynamic element retrieval in a structured environment
A method for the dynamic retrieval of XML elements is presented. It requires only a single indexing of the documents at the level of the basic indexing node and produces a rank-ordered list of retrieved elements equivalent to the result produced by the same retrieval against an all-element index of the collection.
Data structures for information retrieval
  • D. Nkweteyim
  • Computer Science
  • 2014 IST-Africa Conference Proceedings
  • 2014
An approach to constructing an index based on the vector-space model (VSM) is described, and the results show that even with only a modest amount of main memory, large data sets such as the OHSUMED data set can be quickly indexed.
Dynamic Element Retrieval in the Wikipedia Collection
The successful adaptation of the methodology for the dynamic retrieval of XML elements to a semi-structured environment is described; basic functions are performed using the Smart experimental retrieval system.
A hybrid model for document retrieval systems
A methodology for the design of document retrieval systems is presented, and a composite retrieval model is proposed to process a user's information request expressed as a weighted Phrase-Oriented Fixed-Level Expression (POFLE), which may apply more than Boolean operators.
An Indexing Matrix Based Retrieval Model
This work proposes a retrieval method based on an indexing matrix, which gives better results than traditional approaches at a much lower time cost than the standard retrieval method.
Term proximity in document retrieval systems
The results show a marked improvement in relevance from using the neighborhood of terms, with no effect on indexing and search times, which remain fast.
Information retrieval is the process of evaluating a user's query, or information need, against a set of documents (books, journal articles, web pages, etc.) to determine which of the documents [...]
A topic based indexing approach for searching in documents
A topic-based indexing approach is proposed that represents the topics associated with documents as a document-topic matrix denoting the importance of topics inside documents.
Information Retrieval Systems
The major classes (or models) of retrieval algorithms (Boolean, vector, and probabilistic) are described along with formal definitions of the basic form of these algorithms and some of the variations in common use in IR research.


Contribution to the Theory of Indexing
An attempt is made to characterize the usefulness of terms occurring in stored documents and user queries as a function of their frequency characteristics across the documents of a collection, and an indexing theory is described based on term frequency considerations.
On the Specification of Term Values in Automatic Indexing
It is shown that the standard theories for the specification of term values (or weights) are not adequate, and new techniques are introduced for the assignment of weights to index terms, based on the characteristics of individual document collections.
A statistical interpretation of term specificity and its application in retrieval
It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms.
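The weighting argument above can be sketched as follows. This example assumes the commonly used form log(N / df) for a collection-frequency weight, which is not necessarily the exact formula of the cited work; the function name `idf_weights` and the sample documents are hypothetical.

```python
import math


def idf_weights(docs):
    """Map each term to log(N / df), where df counts documents containing it."""
    n_docs = len(docs)
    doc_freq = {}
    for doc in docs:
        # Use a set so a term is counted once per document.
        for term in set(doc):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical tokenized collection.
docs = [["the", "cat"], ["the", "dog"], ["the", "cat", "fish"]]
weights = idf_weights(docs)
for term in sorted(weights, key=weights.get):
    print(f"{term}: {weights[term]:.3f}")
```

A term that appears in every document receives weight zero under this form, while rarer terms receive larger weights, so a match on a rare term contributes more to a document's score.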
A theory of indexing
  • G. Salton
  • Computer Science
  • Regional conference series in applied mathematics
  • 1975
An investigation of the effects of different indexing methods on the document space configuration
  • Computer Sci. Dep
  • 1974
Real-time document retrieval
  • Ph.D. Th., Computer Sci. Dep., Cornell U
  • 1974
Automatic Information Organization and Retrieval
  • Automatic Information Organization and Retrieval
  • 1968