Automatic Indexing: An Experimental Inquiry

@article{Maron1961AutomaticIA,
  title={Automatic Indexing: An Experimental Inquiry},
  author={M. E. Maron},
  journal={J. ACM},
  year={1961},
  volume={8},
  pages={404--417}
}
  • M. Maron
  • Published 1 July 1961
  • Computer Science
  • J. ACM
This inquiry examines a technique for automatically classifying (indexing) documents according to their subject content. The task, in essence, is to have a computing machine read a document and on the basis of the occurrence of selected clue words decide to which of many subject categories the document in question belongs. This paper describes the design, execution and evaluation of a modest experimental study aimed at testing empirically one statistical technique for automatic indexing. 
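The abstract does not name the statistical technique, so the following is only a minimal sketch of one way a clue-word classifier could work. It assumes a naive Bayes style score with add-one smoothing, which is an assumption made here for illustration and not necessarily the paper's exact procedure. The function names (`train`, `classify`), the clue-word list, and the toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs, clue_words):
    """Count documents per category and clue-word occurrences per category.

    labeled_docs: iterable of (text, category) pairs (hypothetical toy input).
    clue_words:   the pre-selected clue words; all other words are ignored.
    """
    vocab = set(clue_words)
    docs_per_cat = Counter()
    words_per_cat = defaultdict(Counter)
    for text, cat in labeled_docs:
        docs_per_cat[cat] += 1
        for word in text.lower().split():
            if word in vocab:
                words_per_cat[cat][word] += 1
    return docs_per_cat, words_per_cat, vocab


def classify(text, model):
    """Return (best_category, log_scores) for a new document.

    Assumption: score = log prior + sum of log smoothed clue-word likelihoods
    (add-one smoothing), i.e. a naive Bayes style decision rule.
    """
    docs_per_cat, words_per_cat, vocab = model
    n_docs = sum(docs_per_cat.values())
    clues = [w for w in text.lower().split() if w in vocab]
    scores = {}
    for cat, n_cat in docs_per_cat.items():
        total = sum(words_per_cat[cat].values())
        score = math.log(n_cat / n_docs)
        for word in clues:
            score += math.log((words_per_cat[cat][word] + 1) / (total + len(vocab)))
        scores[cat] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data, not taken from the paper.
CLUES = ["transistor", "circuit", "amplifier", "voltage",
         "memory", "program", "logic", "computer"]
TRAINING = [
    ("the transistor circuit amplifier", "electronics"),
    ("amplifier transistor voltage", "electronics"),
    ("logic circuit memory", "computers"),
    ("memory program computer", "computers"),
]

model = train(TRAINING, CLUES)
best, _ = classify("a new transistor amplifier design", model)
print(best)
```

Only the clue words influence the decision; every other word in the document is skipped, which mirrors the abstract's description of deciding "on the basis of the occurrence of selected clue words".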

A discriminant method for automatically classifying documents
TLDR
A continuing effort within IBM devoted to developing and testing statistical techniques for automatically classifying documents, through analyses of documents from the fields of psychology, law, computers and international relations, is discussed.
Training a computer to assign descriptors to documents: experiments in automatic indexing
TLDR
Pioneering use of computers for this purpose, by Luhn and Baxendale, has been followed by the development of a number of KWIC (keyword-in-context) and similar programs.
A theory of term importance in automatic text analysis
TLDR
Most existing automatic content analysis and indexing techniques are based on word frequency characteristics applied largely in an ad hoc manner, but terms exhibiting high occurrence frequencies in individual documents are often useful for high recall performance, whereas terms with low frequency in the whole collection are useful for high precision.
Computer-aided Indexing System
TLDR
The Computer-Aided Document Indexing System (CADIS) is developed; it applies controlled-vocabulary keywords from the EUROVOC thesaurus and copes with the morphological complexity of the Croatian language.
Computer aided document indexing system
TLDR
The main contribution of this paper is the introduction of the special CADIS internal data structure that copes with the morphological complexity of the Croatian language and ensures efficient statistical analysis of input documents and quick visual feedback generation.
Computer-Aided Document Indexing System
TLDR
The main contribution of this paper is the introduction of the special CADIS internal data structure that copes with the morphological complexity of the Croatian language and ensures efficient statistical analysis of input documents and quick visual feedback generation.
Automatic Document Classification Part II. Additional Experiments
TLDR
It is concluded that, while there is no significant difference in the predictive efficiency between the Bayesian and the Factor Score methods, automatic document classification is enhanced by the use of a factor-analytically-derived classification schedule.
Is Automatic Classification a Reasonable Application of Statistical Analysis of Text?
TLDR
The crucial question of the quality of automatic classification is treated at considerable length, and empirical data are introduced to support the hypothesis that classification quality improves as more information about each document is used for input to the classification program.
Rapid pre-indexing by machine.
TLDR
A new method of subject indexing by machine for documents in the Project INTREX catalog is developed to allow new documents to be placed online quickly in the computer-stored INTREX catalog.

References

On Relevance, Probabilistic Indexing and Information Retrieval
TLDR
The paper suggests an interpretation of the whole library problem as one where the request is considered as a clue on the basis of which the library system makes a concatenated statistical inference in order to provide as an output an ordered list of those documents which most probably satisfy the information needs of the user.