Some experiments in the generation of word and document associations

  • G. Salton
  • Published in AFIPS '62 (Fall)
  • Computer Science
The solution of most problems in automatic information dissemination and retrieval is dependent on the availability of methods for the automatic analysis of information content. It is, in fact, impossible to identify, classify, encode, and organize items of information, or requests for information, without first determining the content or subject matter of the information to be processed. In most proposed automatic systems, this analysis is based on a counting procedure which uses the frequency… 

Some hierarchical models for automatic document retrieval

An attempt is made in the present study to overcome the limitations of the strictly quantitative methods by presenting two systems for automatic document retrieval which are based on hierarchical storage arrangements as well as on the usual frequency counts and association measures.

A document retrieval system for man-machine interaction

  • G. Salton
  • Computer Science
    ACM National Conference
  • 1964
An automatic document retrieval system, programmed for the IBM 7094, is described. The system is designed to process English texts and search requests, and uses statistical, syntactic and semantic

Associative Document Retrieval Techniques Using Bibliographic Information

It is suggested in this study that bibliographic citations may provide a simple means for obtaining associated documents to be incorporated in an automatic documentation system.
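The idea of using citations to find associated documents can be illustrated with a minimal sketch. It assumes that two documents are associated when they cite some of the same references, and it scores each pair by the number of shared references. The snippet above does not specify a measure, so this overlap count, the function name `shared_citation_counts`, and the toy data are all hypothetical choices for illustration.

```python
from itertools import combinations


def shared_citation_counts(citations):
    """Count shared cited references for every pair of documents.

    `citations` maps a document id to the set of references it cites.
    Pairs with no shared reference are omitted from the result.
    """
    counts = {}
    for a, b in combinations(sorted(citations), 2):
        shared = len(citations[a] & citations[b])
        if shared:
            counts[(a, b)] = shared
    return counts


# Hypothetical toy data: document ids mapped to the references they cite.
toy_citations = {
    "doc1": {"r1", "r2", "r3"},
    "doc2": {"r2", "r3", "r4"},
    "doc3": {"r5"},
}
associations = shared_citation_counts(toy_citations)
```

In this toy data only `doc1` and `doc2` share references, so they are the only associated pair. A real system would probably normalize the count by the length of each reference list.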

The computer-stored thesaurus and its use in concept processing

The most crucial point in the documentation process, the one which may contribute more than any other to its over-all success or failure, is that of indexing the documents prior to their entry into the search files.

Analog Networks for Word Association

  • V. Giuliano
  • Computer Science
    IEEE Transactions on Military Electronics
  • 1963
This paper is concerned with the use of analog electrical networks for the automatic recognition of statistical word associations present in written English text and it is shown that this theory can be realized through use of passive electrical networks.

References and citations in automatic indexing and retrieval systems - experiments with the boomerang effect

Analytical and empirical investigations are carried out to determine which factors affect the behaviour and performance of automatic indexing and retrieval techniques, given that references and citations are an integrated part of the document representation of scientific full text documents in the IR system.

Mechanized Indexing Methods and Their Testing

Methods of mechanized indexing (subject indexing by computer) which have been proposed are systematically summarized and a comprehensive document preparation is described from which proposed methods can be derived by selection.

Information storage and retrieval-analysis of the state of the art

It is reasonable to expect overemphasis on equipment capabilities in the evolution of the concept and equipment aspects of information storage and retrieval (IS&R).

VIII. Bibliographic Data as an Aid to Document Retrieval

It is found that two uncommon kinds of bibliographic data, authors and place of publication, are used to build concept matrices and with the aid of a new statistic, they actually do aid retrieval.

Text Mining for Type of Research Classification

This project brought together undergraduate students in Computer Science with librarians to mine abstracts of articles from the Texas A&M University Libraries’ institutional repository,




An attempt is made in particular to determine those areas in a natural language text which contain more than an average amount of new information.

Indexing and abstracting by association

This article discusses the possibility of exploiting the statistics of word co-occurrence in text for purposes of document retrieval. Co-occurrence is defined and related to the mental processes of
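As a rough illustration of word co-occurrence statistics, the sketch below counts how often two words appear in the same sentence. The sentence-level window, the whitespace tokenization, the function name `cooccurrence_counts`, and the toy sentences are all assumptions made for this example. The snippet above is truncated and does not give the article's own definition of co-occurrence.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count, for each unordered word pair, the sentences containing both words.

    Each pair is stored as a tuple in alphabetical order.
    """
    counts = Counter()
    for sentence in sentences:
        # Use the set of distinct words so a pair counts once per sentence.
        words = sorted(set(sentence.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


# Hypothetical toy text, one sentence per string.
toy_sentences = [
    "document retrieval uses word statistics",
    "word statistics support document indexing",
]
pair_counts = cooccurrence_counts(toy_sentences)
```

Pairs that recur across sentences, such as `("document", "word")` here, receive higher counts and could serve as candidate associations for retrieval.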

The Automatic Creation of Literature Abstracts

In the exploratory research described, the complete text of an article in machine-readable form is scanned by an IBM 704 data-processing machine and analyzed in accordance with a standard program.

Grouping and Dependency Theories

The two common methods of describing sentence structure (at the syntactic level) are immediate-constituent analysis and dependency analysis. The former, also known as phrase-structure analysis, is


This report supersedes previous reports on the experimental predictive syntactic analysis program for Russian and all the grammatical rules followed by the experimental program are here included.

A Computational Approach to Grammatical Coding of English Words

A computational grammar coder which has been completely programmed and is operational on the IBM 7090 is described, part of a complete syntactic analysis system for which it accomplishes word-class coding, using a computational approach rather than the usual method of dictionary lookup.

The construction of an empirically based mathematically derived classification system

  • H. Borko
  • Psychology
    AIEE-IRE '62 (Spring)
  • 1962
The results demonstrate the feasibility of an empirically derived classification system and establish the value of factor analysis as a technique in language data processing.

Baseball: an automatic question-answerer

Baseball is a computer program that answers questions phrased in ordinary English about stored data. The program reads the question from punched cards. After the words and idioms are looked up

Manipulation of trees in information retrieval

Introduction: An information retrieval system is designed to provide answers to requests for information. In some systems

The Theory of Nets

This paper presents the general concept of a weighted directed graph which is called a net, and a non-arithmetic matrix calculus is developed to facilitate computations and formalize proofs.