A vector space model for automatic indexing

@article{Salton1975AVS,
  title={A vector space model for automatic indexing},
  author={Gerard Salton and A. Wong and Chung-Shu Yang},
  journal={Commun. ACM},
  year={1975},
  volume={18},
  pages={613--620}
}
In a document retrieval, or other pattern matching environment where stored entities (documents) are compared with each other or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible; in these circumstances the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents. Typical evaluation results are shown, demonstrating the usefulness of the model.
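
The idea of choosing index terms by their effect on space density can be illustrated with a minimal sketch. It assumes that documents are term-weight vectors, that similarity is the cosine measure, and that density is the average similarity between each document and the collection centroid; the function names and the toy weights are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


def space_density(docs):
    """Average cosine similarity between each document and the centroid."""
    n_terms = len(docs[0])
    centroid = [sum(d[t] for d in docs) / len(docs) for t in range(n_terms)]
    return sum(cosine(d, centroid) for d in docs) / len(docs)


def discrimination_value(docs, term):
    """Density without `term` minus density with it.

    A positive value means the term spreads documents apart (a useful
    index term under this sketch); a negative value means it pulls
    them together.
    """
    without = [[w for t, w in enumerate(d) if t != term] for d in docs]
    return space_density(without) - space_density(docs)


# Toy collection (illustrative only): rows are documents, columns are terms.
docs = [
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]

for term in range(3):
    print(term, round(discrimination_value(docs, term), 4))
```

In this toy collection, term 0 occurs with the same weight in every document, so it makes the space denser and receives a negative value, while terms 1 and 2 each occur in a single document and receive positive values.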

Dynamic element retrieval in a structured environment

TLDR
A method for the dynamic retrieval of XML elements, which requires only a single indexing of the documents at the level of the basic indexing node, is presented, which produces a rank ordered list of retrieved elements that is equivalent to the result produced by the same retrieval against an all-element index of the collection.

Toward conceptual indexing using automatic assignment of descriptors

TLDR
The core of this system, the automatic descriptor assigner, is described; it can be used to manage a collection of documents related to a thesaurus, so that users can manipulate them in a more conceptual way.

Data structures for information retrieval

TLDR
The approach to constructing an index based on the vector-space model (VSM) is described and the results show that even with only a modest amount of main memory, large data sets such as the OHSUMED data set can be quickly indexed.

Dynamic Element Retrieval in the Wikipedia Collection

TLDR
The successful adaptation of the methodology for the dynamic retrieval of XML elements to a semi-structured environment is described; basic functions are performed using the Smart experimental retrieval system.

A Hybrid Model for Document Retrieval Systems.

TLDR
A methodology for the design of document retrieval systems is presented and a composite retrieval model is proposed to process a user's information request in a weighted Phrase-Oriented Fixed-Level Expression (POFLE), which may apply more than Boolean operators.

NEW INFORMATION RETRIEVAL APPROACH BASED ON SEMANTIC INDEXING BY MEANING

TLDR
A new approach to semantic indexing is suggested, which determines the exact meaning of each term in a document or query through contextual analysis at the sentence level; the efficacy of this hypothesis compared to traditional IR approaches is indicated.

An Indexing Matrix Based Retrieval Model

TLDR
This work proposes a retrieval method based on an indexing matrix, which gives better results than traditional methods at a much lower time cost than the standard retrieval method.

Term proximity in document retrieval systems

TLDR
The results show a remarkable improvement in relevance from using the neighborhood of the terms, with no effect on indexing and search times, which remain fast.

PROBLEM 4 : TERM WEIGHTING SCHEMES IN INFORMATION RETRIEVAL

TLDR
A specific term weighting scheme (log-entropy weighting) is studied to determine its effectiveness on different aspects of retrieval to improve information retrieval accuracy.

APPROXIMATING VECTORS FOR SIMILARITY SEARCHING

TLDR
This paper presents a measure that, given one assumption, is suitable for judging the quality of approximations of vectors and is used in the design of a heuristic algorithm that creates approximations of vectors.
...

References

Contribution to the Theory of Indexing

TLDR
An attempt is made to characterize the usefulness of terms occurring in stored documents and user queries as a function of their frequency characteristics across the documents of a collection, and an indexing theory is described based on term frequency considerations.

On the Specification of Term Values in Automatic Indexing

TLDR
It is shown that the standard theories for the specification of term values (or weights) are not adequate, and new techniques are introduced for the assignment of weights to index terms, based on the characteristics of individual document collections.

A theory of indexing

  • G. Salton
  • Computer Science
    Regional conference series in applied mathematics
  • 1975

An investigation of the effects of different indexing methods on the document space configuration

  • Computer Sci. Dep
  • 1974

Automatic Information Organization and Retrieval

  • Automatic Information Organization and Retrieval
  • 1968