Multi-Paragraph Segmentation of Expository Text

@article{Hearst1994MultiParagraphSO,
  title={Multi-Paragraph Segmentation of Expository Text},
  author={Marti A. Hearst},
  journal={ArXiv},
  year={1994},
  volume={abs/cmp-lg/9406037}
}
This paper describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully-implemented versions of the algorithm are described and shown to produce segmentation that corresponds well to human judgments of the major subtopic boundaries of…
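As a rough illustration of what "lexical frequency and distribution information" can mean in practice, the sketch below scores the lexical similarity across each gap between adjacent text units and marks a boundary where the similarity curve has a deep valley. It is not the paper's implementation: the tokenizer, the `block_size` parameter, the function names, and the mean-depth cutoff are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (assumption: no stemming, no stop list)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_scores(units, block_size=2):
    """Similarity of the blocks on either side of each gap between units."""
    bags = [Counter(tokenize(u)) for u in units]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - block_size):gap], Counter())
        right = sum(bags[gap:gap + block_size], Counter())
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores):
    """How far each gap's score sits below the nearest peak on each side."""
    depths = []
    for i, s in enumerate(scores):
        left = i
        while left > 0 and scores[left - 1] >= scores[left]:
            left -= 1
        right = i
        while right < len(scores) - 1 and scores[right + 1] >= scores[right]:
            right += 1
        depths.append((scores[left] - s) + (scores[right] - s))
    return depths


def boundaries(units, block_size=2):
    """Indices of units that start a new segment.

    Assumption: a gap is a boundary when its depth exceeds the mean depth.
    """
    depths = depth_scores(gap_scores(units, block_size))
    if not depths:
        return []
    cutoff = sum(depths) / len(depths)
    return [i + 1 for i, d in enumerate(depths) if d > cutoff]
```

Calling `boundaries(units)` on a list of sentences or paragraphs returns the positions at which a new segment would begin under these assumptions; a larger `block_size` smooths the similarity curve at the cost of blurring short subtopics.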

Text Tiling: Segmenting Text into Multi-paragraph Subtopic Passages
TLDR
The algorithm is fully implemented and is shown to produce segmentation that corresponds well to human judgments of the subtopic boundaries of 12 texts; such segmentation should be useful for many text analysis tasks, including information retrieval and summarization.
Text Segmentation with Multiple Surface Linguistic Cues
TLDR
This paper describes a method for identifying segment boundaries of a Japanese text with the aid of multiple surface linguistic cues, and presents a method of training the weights for multiple linguistic cues automatically without the overfitting problem.
Broad coverage paragraph segmentation across languages and domains
TLDR
This article presents a paragraph segmentation model that exploits a variety of knowledge sources (including textual cues, syntactic and discourse-related information), evaluates its performance in different languages and domains, and shows that it is useful for structuring the output of automatically generated text.
Text Segmentation into Paragraphs Based on Local Text Cohesion
TLDR
This paper proposes a method of quantitative evaluation of text cohesion based on a large linguistic resource, a collocation network, and compares word occurrences in a text against a large database of collocations and semantic links between words in the given natural language.
Topic Segmentation Algorithms for Text Summarization and Passage Retrieval: An Exhaustive Evaluation
TLDR
A context-based topic segmentation system is presented, built on a new informative similarity measure based on word co-occurrence, to address the reliability problems of systems based on lexical repetition and the adaptability problems of language-dependent systems.
Text Segmentation Using Reiteration and Collocation
TLDR
This method is shown to successfully detect known subject changes in text and corresponds well to the segmentations placed by test subjects.
Segmentation of Expository Texts by Hierarchical Agglomerative Clustering
TLDR
The method uses paragraphs as the basic segments for identifying hierarchical discourse structure in the text, applying lexical similarity between them as the proximity test.
Tracker Text Segmentation Approach: Integrating Complex Lexical and Conversation Cue Features
TLDR
An algorithm suited for transcribed meeting conversations is described; it combines semantically complex lexical relations with conversational cue phrases to build lexical chains for determining topic boundaries.
Feature-Based Segmentation of Narrative Documents
TLDR
A feature-based method is presented that combines features from diverse sources with learned features and shows results that are significantly better than previous segmentation approaches for narrative text.
Text segmentation of spoken meeting transcripts
TLDR
An algorithm suitable for segmenting spoken meeting transcripts is described; it combines semantically complex lexical relations with speech cue phrases to build lexical chains for determining topic boundaries.

References

Text tiling: A quantitative approach to discourse segmentation
TLDR
TextTiling, a method for partitioning full-length text documents into coherent multiparagraph units, is presented; the tiles have been found to correspond well to human judgements of the major subtopic boundaries of science magazine articles.
TextTiling: A Quantitative Approach to Discourse
TLDR
TextTiling, a method for partitioning full-length text documents into coherent multiparagraph units, has been found to correspond well to human judgements of the major subtopic boundaries of science magazine articles.
Text Segmentation Based on Similarity between Words
TLDR
Comparison with the text segments marked by a number of subjects shows that LCP closely correlates with the human judgments, which may provide valuable information for resolving anaphora and ellipsis.
Context and structure in automated full-text information access
TLDR
A graphical interface is described, called Cougar, that displays retrieved documents in terms of interactions among their automatically-assigned main topics, thus allowing users to familiarize themselves with the topics and terminology of a text collection.
What do paragraph markings do
This report investigates the role of paragraph markings in text: how informative a paragraph cue is and how paragraph cues affect interpretation. In the first study, subjects were shown unparagraphed…
Intention-Based Segmentation: Human Reliability and Correlation with Linguistic Cues
TLDR
A two-part study evaluates the statistical reliability of human segmentation of a corpus of spontaneous, narrative monologues, where speaker intention is the segmentation criterion.
Lexical Cohesion Computed by Thesaural Relations as an Indicator of the Structure of Text
TLDR
Since the lexical chains are computable, and exist in non-domain-specific text, they provide a valuable indicator of text structure, and provide a semantic context for interpreting words, concepts, and sentences.
Approaches to passage retrieval in full text information systems
TLDR
New approaches are described in this study for implementing selective passage retrieval systems, and identifying text passages responsive to particular user needs.
Introduction to WordNet: An On-line Lexical Database
Standard alphabetical procedures for organizing lexical information put together words that are spelled alike and scatter words with similar or related meanings haphazardly through the list.
RHETORICAL STRUCTURE THEORY: A THEORY OF TEXT ORGANIZATION
TLDR
This paper establishes a new definitional foundation for RST. Definitions are made more systematic and explicit; they introduce a new functional element and incidentally reflect more experience in text analysis.