Automatic Indexing: An Experimental Inquiry
@article{Maron1961AutomaticIA, title={Automatic Indexing: An Experimental Inquiry}, author={M. E. Maron}, journal={J. ACM}, year={1961}, volume={8}, pages={404-417} }
This inquiry examines a technique for automatically classifying (indexing) documents according to their subject content. The task, in essence, is to have a computing machine read a document and on the basis of the occurrence of selected clue words decide to which of many subject categories the document in question belongs. This paper describes the design, execution and evaluation of a modest experimental study aimed at testing empirically one statistical technique for automatic indexing.
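As a rough illustration of the idea in the abstract, the sketch below assigns a document to a subject category from the occurrence of selected clue words. It is a minimal sketch only: the naive-Bayes-style scoring with add-one smoothing, the function names `train` and `classify`, the clue-word set, and the toy training documents are all assumptions made for illustration, not the paper's actual formulation or data.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs, clue_words):
    """Count clue-word occurrences per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> clue word -> count
    doc_counts = Counter()              # category -> number of training docs
    for text, category in labeled_docs:
        doc_counts[category] += 1
        for token in text.lower().split():
            if token in clue_words:
                word_counts[category][token] += 1
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts, clue_words):
    """Return the best-scoring category and all log scores for a document."""
    tokens = [t for t in text.lower().split() if t in clue_words]
    total_docs = sum(doc_counts.values())
    vocab_size = len(clue_words)
    scores = {}
    for category, n_docs in doc_counts.items():
        total_words = sum(word_counts[category].values())
        # Log prior of the category plus log likelihood of each clue word.
        score = math.log(n_docs / total_docs)
        for t in tokens:
            # Add-one smoothing (an assumption) avoids log(0) for unseen words.
            score += math.log(
                (word_counts[category][t] + 1) / (total_words + vocab_size)
            )
        scores[category] = score
    return max(scores, key=scores.get), scores


# Hypothetical clue words and training documents, for illustration only.
clue_words = {"circuit", "memory", "logic", "translation", "grammar", "syntax"}
training = [
    ("transistor circuit logic design", "hardware"),
    ("core memory circuit timing", "hardware"),
    ("machine translation of russian grammar", "language"),
    ("syntax and grammar analysis", "language"),
]
word_counts, doc_counts = train(training, clue_words)
best, _ = classify("a logic circuit for memory access",
                   word_counts, doc_counts, clue_words)
print(best)
```

Words outside the clue-word set are ignored, so a document's score depends only on the selected clue words it contains and on how often each category used them in training.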
522 Citations
A discriminant method for automatically classifying documents
- Computer Science
- AFIPS '63 (Fall)
- 1963
A continuing effort within IBM devoted to developing and testing statistical techniques for automatically classifying documents, through analyses of documents from the fields of psychology, law, computers and international relations, is discussed.
Training a computer to assign descriptors to documents: experiments in automatic indexing
- Economics
- AFIPS '64 (Spring)
- 1964
Pioneering use of computers for this purpose, by Luhn and Baxendale, has been followed by the development of a number of KWIC (keyword-in-context) and similar programs.
A theory of term importance in automatic text analysis
- Computer Science
- J. Am. Soc. Inf. Sci.
- 1975
Most existing automatic content analysis and indexing techniques are based on word frequency characteristics applied largely in an ad hoc manner, but terms exhibiting high occurrence frequencies in individual documents are often useful for high recall performance, whereas terms with low frequency in the whole collection are useful for high precision.
Computer-aided Indexing System
- Computer Science
- 2005
The Computer-Aided Document Indexing System (CADIS) is developed; it applies controlled-vocabulary keywords from the EUROVOC thesaurus and copes with the morphological complexity of the Croatian language.
Computer aided document indexing system
- Computer Science
- 27th International Conference on Information Technology Interfaces, 2005.
- 2005
The main contribution of this paper is the introduction of the special CADIS internal data structure that copes with the morphological complexity of the Croatian language and ensures efficient statistical analysis of input documents and quick visual feedback generation.
Computer-Aided Document Indexing System
- Computer Science
- J. Comput. Inf. Technol.
- 2005
The main contribution of this paper is the introduction of the special CADIS internal data structure that copes with the morphological complexity of the Croatian language and ensures efficient statistical analysis of input documents and quick visual feedback generation.
Term association analysis on a large file of bibliographic data, using a highly-controlled indexing vocabulary
- Computer Science
- Inf. Storage Retr.
- 1973
Automatic Document Classification Part II. Additional Experiments
- Computer Science, Environmental Science
- JACM
- 1964
It is concluded that, while there is no significant difference in the predictive efficiency between the Bayesian and the Factor Score methods, automatic document classification is enhanced by the use of a factor-analytically-derived classification schedule.
Is Automatic Classification a Reasonable Application of Statistical Analysis of Text?
- Computer Science
- JACM
- 1965
The crucial question of the quality of automatic classification is treated at considerable length, and empirical data are introduced to support the hypothesis that classification quality improves as more information about each document is used for input to the classification program.
Rapid pre-indexing by machine.
- Computer Science
- 1968
A new method of subject indexing by machine for documents in the Project INTREX catalog is developed to allow new documents to be placed online quickly in the computer-stored Intrex catalog.
References
On Relevance, Probabilistic Indexing and Information Retrieval
- Computer Science
- JACM
- 1960
The paper suggests an interpretation of the whole library problem as one where the request is considered as a clue on the basis of which the library system makes a concatenated statistical inference in order to provide as an output an ordered list of those documents which most probably satisfy the information needs of the user.