Automatic text structuring and retrieval - experiments in automatic encyclopedia searching

Gerard Salton and Chris Buckley
SIGIR '91
Many conventional approaches to text analysis and information retrieval prove ineffective when large text collections must be processed in heterogeneous subject areas. An alternative text manipulation system is outlined that is useful for the retrieval of large heterogeneous texts and for the recognition of content similarities between text excerpts, based on flexible text matching procedures carried out in several contexts of different scope. The methods are illustrated by search experiments…
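As a rough illustration of matching "in several contexts of different scope", the sketch below compares two texts first as wholes and then sentence by sentence, and accepts the pair only if both checks pass. It is a minimal sketch, not the authors' system: the raw term-count vectors, the cosine measure, the naive sentence splitter, and both thresholds are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text, split it into word tokens, and count each term."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption of this sketch)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def best_local_similarity(text_a, text_b):
    """Highest cosine similarity over all sentence pairs of the two texts."""
    vectors_a = [term_vector(s) for s in sentences(text_a)]
    vectors_b = [term_vector(s) for s in sentences(text_b)]
    return max(
        (cosine(va, vb) for va in vectors_a for vb in vectors_b),
        default=0.0,
    )


def texts_match(text_a, text_b, global_threshold=0.2, local_threshold=0.5):
    """Accept a pair only if the whole-text and the sentence-level checks both pass.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    global_sim = cosine(term_vector(text_a), term_vector(text_b))
    if global_sim < global_threshold:
        return False
    return best_local_similarity(text_a, text_b) >= local_threshold
```

With these assumed thresholds, two texts that merely share a few common words can pass the whole-text check and still be rejected at the sentence level, which is the kind of ambiguity a narrower context is meant to resolve.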

Automatic structuring of text files
Methods are described in this study for the automatic structuring of heterogeneous text collections, and the construction of browsing tools and access procedures that facilitate collection use.
Automatic structuring and retrieval of large text files
An alternative approach is introduced which uses the document collections themselves as a basis for the text analysis, together with sophisticated text matching operations carried out at several levels of detail.
Selective text utilization and text traversal
Global text comparison methods are used to identify similarities between text elements, followed by local context-checking operations that resolve ambiguities and distinguish superficially similar texts from texts that actually cover identical topics.
Approaches to passage retrieval in full text information systems
New approaches are described in this study for implementing selective passage retrieval systems, and identifying text passages responsive to particular user needs.
Automatic Structuring of Text Files
Methods are described in this study for the automatic structuring of heterogeneous text collections, and the construction of browsing tools and access procedures that facilitate collection use.
Context and structure in automated full-text information access
A graphical interface called Cougar is described that displays retrieved documents in terms of interactions among their automatically assigned main topics, allowing users to familiarize themselves with the topics and terminology of a text collection.
Retrieval of passages for information reduction
By locating passages for display to the user, this research winnows a text down to sets of several sentences, greatly reducing the time and effort expended searching through each text for important features.
Automatic Keyword Extraction for Text Summarization in Multi-document e-Newspapers Articles
This paper proposes a hybrid approach to extracting keywords automatically for multi-document text summarization in e-newspaper articles and shows that the proposed techniques outperform other techniques for automatic keyword extraction and summarization.
Enhancing Information Retrieval Through Statistical Natural Language Processing: A Study of Collocation Indexing
Preliminary evidence is provided for the usefulness of statistical natural language processing (NLP) techniques, and specifically of collocation indexing, for IR in general settings, and the effect of three key parameters (directionality, distance, and weighting) on collocation indexing performance is investigated.
Subtopic structuring for full-length document access
It is argued that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by new approaches to information access and by a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics making up the text.


The effectiveness of a nonsyntactic approach to automatic phrase indexing for document retrieval
It is not likely that phrase indexing of this kind will prove to be an important method of enhancing the performance of automatic document indexing and retrieval systems in operational environments, and a general syntactic analysis facility may be required.
Term-Weighting Approaches in Automatic Text Retrieval
Information Retrieval
  • A. Dekhtyar
  • Computer Science
  • Lecture Notes in Computer Science
  • 1968
A novel method is proposed to efficiently represent query reformulation behavior as a translating embedding from the original query to its reformulated query, using a two-stage training algorithm to make the learning of multilevel intention representations more adequate.
Hypertext: An Introduction and Survey
A survey of existing hypertext systems, their applications, and their design is both an introduction to the world of hypertext and a survey of some of the most important design issues that go into fashioning a hypertext environment.
Improving retrieval performance by relevance feedback
Prescriptions are given for conducting text retrieval operations iteratively using relevance feedback, and evaluation data are included to demonstrate the effectiveness of the various methods.
Language and Representation in Information Retrieval
  • D. Blair
  • Computer Science, Psychology
  • 1990
This work has shown that language and representation are the central problem in information retrieval and the nature of scientific theory, and the principal formal models used in information retrieval are language and representation.
Associative Networks: Representation and Use of Knowledge by Computers
This book provides good coverage of semantic networks and related systems for representing knowledge, and the editor should be commended for putting together a well-organized book.
Introduction to Modern Information Retrieval
Searching for information in a hypertext medical handbook
  • E. Mark
  • Computer Science, Medicine
  • 1988
Implementing a popular medical handbook in hypertext underscores the need to study hypertext in the context of full-text document retrieval, machine learning, and user interface issues.