The construction of an empirically based mathematically derived classification system

This study describes a method for developing an empirically based, computer-derived classification system. A total of 618 psychological abstracts were coded in machine language for computer processing. The total text consisted of approximately 50,000 words, of which nearly 6,800 were unique. The computer program arranged these words in order of frequency of occurrence. From the list of words that occurred 20 or more times, excluding syntactical terms such as "and", "but", "of", etc., the investigator…
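The frequency-ordering step described in the abstract can be illustrated with a minimal sketch. It is not the study's original program: the function name `frequent_content_words`, the tokenizing rule, and the three-word stop list (only the terms the abstract actually names) are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop list: the abstract names only "and", "but", and "of"
# and leaves the remainder as "etc.", so nothing further is filled in here.
SYNTACTICAL_TERMS = {"and", "but", "of"}


def frequent_content_words(abstracts, min_count=20, stop_words=SYNTACTICAL_TERMS):
    """Rank words by frequency, keeping those at or above min_count
    and dropping syntactical terms."""
    counts = Counter()
    for text in abstracts:
        # Assumed tokenizer: lowercase runs of ASCII letters.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    # most_common() returns (word, count) pairs in descending count order.
    return [
        (word, n)
        for word, n in counts.most_common()
        if n >= min_count and word not in stop_words
    ]
```

Calling `frequent_content_words(abstracts)` with the default `min_count=20` mirrors the threshold stated in the abstract; a smaller threshold is convenient when trying the sketch on a handful of texts.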
Automatic Document Classification
Of the ninety documents in the Validation Group which contained two or more clue words, and which could be automatically classified, 44 documents, or 48.9%, were placed into their correct categories by use of a computer formula.
A Factor Analytically Derived Classification System for Psychological Reports
It is shown that the basic dimensions of a collection of documents can be determined by analyzing the words used in their abstracts; the resulting matrix was factor analyzed.
Automatic Document Classification Part II. Additional Experiments
It is concluded that, while there is no significant difference in the predictive efficiency between the Bayesian and the Factor Score methods, automatic document classification is enhanced by the use of a factor-analytically-derived classification schedule.
An Analysis of Some Graph Theoretical Cluster Techniques
Several graph-theoretic cluster techniques aimed at the automatic generation of thesauri for information retrieval systems are explored, and two algorithms that find maximal complete subgraphs have been tested.
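To make "maximal complete subgraphs" concrete, the sketch below enumerates them with the basic Bron-Kerbosch recursion. This is a swapped-in, well-known technique and not necessarily either of the two algorithms the paper tested, which the summary does not name. The function name `maximal_cliques` and the representation of the term graph as a dictionary of neighbor sets are assumptions for illustration.

```python
def maximal_cliques(adjacency):
    """Enumerate maximal complete subgraphs (cliques) of an undirected graph.

    adjacency maps each vertex to the set of its neighbors. The basic
    Bron-Kerbosch recursion (no pivoting) is used; it is an illustrative
    choice, not an algorithm taken from the paper.
    """
    cliques = []

    def expand(current, candidates, excluded):
        # No vertex can extend the clique, and none was already tried:
        # the current set is maximal.
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for vertex in list(candidates):
            neighbors = adjacency[vertex]
            expand(current | {vertex}, candidates & neighbors, excluded & neighbors)
            # Move the vertex from candidates to excluded for later branches.
            candidates = candidates - {vertex}
            excluded = excluded | {vertex}

    expand(set(), set(adjacency), set())
    return sorted(cliques)
```

In a thesaurus-generation setting, each vertex would be a term and an edge would join two terms judged to be associated, so each maximal clique is a candidate group of mutually related terms.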
A random walk on an ontology: Using thesaurus structure for automatic subject indexing
The results of the analysis support the hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process.
Indexing and abstracting by association
This article discusses the possibility of exploiting the statistics of word co-occurrence in text for purposes of document retrieval. Co-occurrence is defined and related to the mental processes of
Mathematical analysis of documentation systems: An attempt to a theory of classification and search request formulation
A generalized definition of superimposed coding; some functions for the distance of objects or attributes; optimization and automatic derivation of classifications.
Recent Studies in Automatic Text Analysis and Document Retrieval
An attempt is made to identify those automatic procedures which appear most effective as a replacement for the missing language analysis procedures, and it is shown that the fully automatic methodology is superior in effectiveness to the conventional procedures in normal use.
Use of Computers in Educational Research
Since Martin and Hall wrote their chapter concerning data processing for the December 1960 REVIEW, the computer field has been kaleidoscopic, with new developments appearing and older trends growing
All-automatic processing for a large library
Our concept of what is considered large-library-processing changes with the growth of published information and with the progress of the relevant data processing technology. The size of the library


Modern Factor Analysis
This thoroughly revised third edition of Harry H. Harman's authoritative text incorporates the many new advances made in computer science and technology over the last ten years. The author gives full
Probabilistic Indexing, 1959
FEAT, An Inventory Program for Information Retrieval JTN-4018, System Development Corporation
