A Critique and Improvement of an Evaluation Metric for Text Segmentation

@article{Pevzner2002ACA,
  title={A Critique and Improvement of an Evaluation Metric for Text Segmentation},
  author={L. Pevzner and Marti A. Hearst},
  journal={Computational Linguistics},
  year={2002},
  volume={28},
  pages={19--36}
}
The Pk evaluation metric, initially proposed by Beeferman, Berger, and Lafferty (1997), is becoming the standard measure for assessing text segmentation algorithms. However, a theoretical analysis of the metric finds several problems: the metric penalizes false negatives more heavily than false positives, overpenalizes near misses, and is affected by variation in segment size distribution. We propose a simple modification to the Pk metric that remedies these problems. This new metric, called…
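To make the metric under discussion concrete, here is a minimal Python sketch of Pk as it is commonly defined following Beeferman, Berger, and Lafferty. A probe of width k slides over the text, and the sketch counts how often the reference and the hypothesis disagree about whether the two ends of the probe lie in the same segment. Several choices are assumptions of this sketch and are not taken from the paper: a segmentation is represented as a list of segment lengths, the helper names (`segment_ids`, `default_k`, `pk`) are hypothetical, and the default k is half the mean reference segment length, rounded down.

```python
from typing import List, Optional, Sequence


def segment_ids(lengths: Sequence[int]) -> List[int]:
    """Expand segment lengths such as [3, 2] into per-unit segment ids [0, 0, 0, 1, 1]."""
    if any(length <= 0 for length in lengths):
        raise ValueError("segment lengths must be positive")
    ids: List[int] = []
    for seg_index, length in enumerate(lengths):
        ids.extend([seg_index] * length)
    return ids


def default_k(reference: Sequence[int]) -> int:
    """Assumed default: half the mean reference segment length, rounded down, at least 1."""
    return max(1, sum(reference) // (2 * len(reference)))


def pk(reference: Sequence[int], hypothesis: Sequence[int], k: Optional[int] = None) -> float:
    """Share of probes (i, i + k) on which reference and hypothesis disagree about
    whether units i and i + k belong to the same segment."""
    ref, hyp = segment_ids(reference), segment_ids(hypothesis)
    if len(ref) != len(hyp):
        raise ValueError("both segmentations must cover the same number of units")
    k = default_k(reference) if k is None else k
    n = len(ref)
    if not 0 < k < n:
        raise ValueError("k must lie between 1 and the number of units minus 1")
    disagreements = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k]) for i in range(n - k)
    )
    return disagreements / (n - k)


if __name__ == "__main__":
    print(pk([5, 5], [5, 5]))  # 0.0   (perfect match)
    print(pk([5, 5], [10]))    # 0.25  (boundary missed entirely)
    print(pk([5, 5], [4, 6]))  # 0.25  (boundary placed one unit early)
```

These toy inputs use k = 2. On them, a boundary placed one unit too early scores the same 0.25 as a boundary that is missed outright. This is one small way to see the near-miss issue raised in the abstract.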

Citations

On Evaluation Methodologies for Text Segmentation Algorithms
TLDR
Because producing a reference segmentation is a difficult task, this paper describes a new evaluation metric that relies on the stability of segmentations under text transformations; the two proposed metrics are shown to be better indicators of text segmentation accuracy than existing measures.
Evaluating Text Segmentation using Boundary Edit Distance
This work proposes a new segmentation evaluation metric named boundary similarity (B), an adaptation of an inter-coder agreement coefficient, and a confusion matrix for segmentation, all of which are based…
Unbiased discourse segmentation evaluation
TLDR
This paper shows that the performance measures Pk and WindowDiff are biased in favor of segmentations with fewer or adjacent segment boundaries, and proposes a novel unbiased measure, k-κ, which provides a single score that accounts for chance agreement.
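Both measures named in this summary compare windows of k units. The Python sketch below illustrates WindowDiff roughly as it is usually defined. A window of k units slides over the text, and the sketch flags every window in which the reference and the hypothesis contain a different number of boundaries. Several choices are assumptions of this sketch: segmentations are represented as lists of segment lengths, the function names (`segment_ids`, `window_diff`) are hypothetical, and the default k is half the mean reference segment length, rounded down.

```python
from typing import List, Optional, Sequence


def segment_ids(lengths: Sequence[int]) -> List[int]:
    """Expand segment lengths such as [3, 2] into per-unit segment ids [0, 0, 0, 1, 1]."""
    if any(length <= 0 for length in lengths):
        raise ValueError("segment lengths must be positive")
    ids: List[int] = []
    for seg_index, length in enumerate(lengths):
        ids.extend([seg_index] * length)
    return ids


def window_diff(reference: Sequence[int], hypothesis: Sequence[int],
                k: Optional[int] = None) -> float:
    """Share of windows (i, i + k) in which reference and hypothesis place a
    different number of boundaries."""
    ref, hyp = segment_ids(reference), segment_ids(hypothesis)
    if len(ref) != len(hyp):
        raise ValueError("both segmentations must cover the same number of units")
    if k is None:
        # Assumed default: half the mean reference segment length, at least 1.
        k = max(1, sum(reference) // (2 * len(reference)))
    n = len(ref)
    if not 0 < k < n:
        raise ValueError("k must lie between 1 and the number of units minus 1")
    # Ids grow by exactly one at each boundary, so ids[i + k] - ids[i] counts
    # the boundaries that fall between unit i and unit i + k.
    errors = sum(
        (ref[i + k] - ref[i]) != (hyp[i + k] - hyp[i]) for i in range(n - k)
    )
    return errors / (n - k)


if __name__ == "__main__":
    print(window_diff([5, 5], [10]))       # 0.25 (boundary missed)
    print(window_diff([5, 5], [5, 1, 4]))  # 0.25 (extra boundary next to the true one)
```

Because the sketch compares boundary counts rather than a same-segment yes/no, the extra boundary in `[5, 1, 4]` is flagged in two windows. A Pk-style same-segment test with the same k = 2 flags only one probe on this pair, giving 0.125.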
An Improved Model of Dotplotting for Text Segmentation
TLDR
Comparative experimental results on a synthetic corpus and a real corpus show that the MMD model reduces the error rate of the original Dotplotting method by more than 20 percent and outperforms other existing methods derived from Dotplotting.
An Analysis of Quantitative Aspects in the Evaluation of Thematic Segmentation Algorithms
TLDR
It is shown that evaluation on synthetic data is potentially misleading and fails to give an accurate picture of performance on real data; a critical review of existing evaluation metrics in the literature and an improved evaluation metric are also provided.
Topical Segmentation: a Study of Human Performance and a New Measure of Quality.
In a large-scale study of how people find topical shifts in written text, 27 annotators were asked to mark topically continuous segments in 20 chapters of a novel. We analyze the resulting corpus for…
Getting More from Segmentation Evaluation
TLDR
A new segmentation evaluation measure, WinPR, is introduced, which resolves some of the limitations of WindowDiff and produces more intuitive measures, such as precision, recall, and F-measure.
Evaluating Text Segmentation
TLDR
It is argued that a single segmentation of a text cannot constitute a “true” segmentation, and that the adapted inter-coder agreement statistics proposed herein should be used to determine the reproducibility and reliability of a coding scheme and a set of manual codings.
Intended boundaries detection in topic change tracking for text segmentation
TLDR
Presents a topical text segmentation method based on intended boundary detection and compares it to C99, a well-known default boundary detection method; the comparison shows that algorithms that score closely under automatic evaluation can differ considerably under manual evaluation.

References

Showing 1-10 of 48 references
Statistical Models for Text Segmentation
TLDR
Assessment of the approach on quantitative and qualitative grounds demonstrates its effectiveness in two very different domains, Wall Street Journal news articles and television broadcast news story transcripts, using a new probabilistically motivated error metric.
Text Segmentation Using Exponential Models
TLDR
This work enlists both short-range and long-range language models to help it sniff out likely sites of topic changes in text, and proposes a new probabilistically motivated error metric for use by the natural language processing and information retrieval communities.
Discourse segmentation in aid of document summarization
  • B. Boguraev, Mary S. Neff
  • Computer Science
  • Proceedings of the 33rd Annual Hawaii International Conference on System Sciences
  • 2000
TLDR
Evaluated against the corpus used in developing the baseline summarizer, summaries derived either by segmentation analysis alone or by a mix of strategies combining salience calculation and topic shift detection are shown to be of comparable and, under certain conditions, even better quality.
Selecting Text Spans for Document Summaries: Heuristics and Metrics
TLDR
Presents an analysis of news-article summaries generated by sentence extraction, using a large corpus of extraction-based summaries to characterize the underlying difficulty of summarization at different compression levels for the articles in this corpus.
A cluster-based approach to tracking, detection and segmentation of broadcast news
TLDR
The tracking, detection and segmentation modules provide a sound framework for future extension and experimentation and the effect of reducing the training size of relevant stories is examined.
Optimal Multi-Paragraph Text Segmentation by Dynamic Programming
TLDR
A fragmentation method based on dynamic programming is proposed that is theoretically sound and guaranteed to provide an optimal splitting on the basis of a similarity curve, a preferred fragment length, and a defined cost function.
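The dynamic-programming idea in this entry can be illustrated generically. Suppose the cost of a split is the sum of per-segment costs. Then the cheapest split of the first e units is the cheapest split of some shorter prefix plus one final segment. The Python sketch below does not reproduce the paper's method. Its cost function is a hypothetical stand-in for the similarity curve, preferred fragment length, and cost function mentioned in the summary: squared deviation from a preferred length, plus a penalty for each low-similarity gap kept inside a segment. All names in it are hypothetical as well.

```python
import math
from typing import Callable, List, Sequence, Tuple


def optimal_split(n: int, segment_cost: Callable[[int, int], float]) -> Tuple[float, List[int]]:
    """Split units 0..n-1 into contiguous segments [start, end) minimizing the summed
    segment_cost; returns (total cost, sorted boundary positions)."""
    best = [0.0] + [math.inf] * n  # best[e]: cheapest split of units 0..e-1
    back = [0] * (n + 1)           # back[e]: start of the last segment in that split
    for end in range(1, n + 1):
        for start in range(end):
            candidate = best[start] + segment_cost(start, end)
            if candidate < best[end]:
                best[end], back[end] = candidate, start
    boundaries: List[int] = []
    pos = back[n]
    while pos > 0:                 # walk back through the chosen segment starts
        boundaries.append(pos)
        pos = back[pos]
    return best[n], sorted(boundaries)


def make_cost(gap_similarity: Sequence[float], preferred_length: int,
              weight: float = 1.0) -> Callable[[int, int], float]:
    """Hypothetical cost: squared deviation from preferred_length plus weight times
    (1 - similarity) for every gap kept inside the segment."""
    def cost(start: int, end: int) -> float:
        length_term = (end - start - preferred_length) ** 2
        # Gap g lies between units g and g + 1, so [start, end) contains gaps start..end-2.
        inner_gaps = gap_similarity[start:end - 1]
        return length_term + weight * sum(1.0 - s for s in inner_gaps)
    return cost


if __name__ == "__main__":
    sims = [0.9, 0.9, 0.1, 0.9, 0.9]  # 6 units, hence 5 gaps; gap 2 is a similarity dip
    total, cuts = optimal_split(6, make_cost(sims, preferred_length=3))
    print(cuts)  # [3]
```

The double loop makes O(n²) calls to the cost function. Because the cost is additive over segments, the result is optimal for whatever cost is supplied.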
Passage retrieval revisited
TLDR
This paper compares the authors' scheme of arbitrary passage retrieval to several other document retrieval and passage retrieval methods and shows experimentally that, compared to these methods, ranking via fixed-length passages is robust and effective.
Text tiling: A quantitative approach to discourse segmentation
TLDR
TextTiling, a method for partitioning full-length text documents into coherent multiparagraph units, is presented; the resulting tiles are found to correspond well to human judgements of the major subtopic boundaries of science magazine articles.
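TextTiling assigns each gap between adjacent text blocks a lexical similarity score. It then ranks candidate boundaries by a depth score: how far the gap's score lies below the nearest peak on its left plus how far it lies below the nearest peak on its right. The Python sketch below covers only that depth computation and assumes the per-gap similarity scores are already available. Tokenization, smoothing, and the boundary cutoff are left out, and the function name `depth_scores` is hypothetical.

```python
from typing import List, Sequence


def depth_scores(gap_scores: Sequence[float]) -> List[float]:
    """For each gap, climb to the nearest peak on each side and add up how far
    the gap's similarity score sits below both peaks."""
    depths: List[float] = []
    n = len(gap_scores)
    for i, score in enumerate(gap_scores):
        left_peak = score
        j = i - 1
        while j >= 0 and gap_scores[j] >= left_peak:  # keep climbing leftwards
            left_peak = gap_scores[j]
            j -= 1
        right_peak = score
        j = i + 1
        while j < n and gap_scores[j] >= right_peak:  # keep climbing rightwards
            right_peak = gap_scores[j]
            j += 1
        depths.append((left_peak - score) + (right_peak - score))
    return depths


if __name__ == "__main__":
    print(depth_scores([8, 5, 2, 6, 9]))  # [0, 3, 13, 3, 0]
```

In this example the valley at gap 2 gets by far the largest depth score, so it is the strongest boundary candidate.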
Linear Segmentation and Segment Significance
TLDR
A new method for discovering a segmental discourse structure of a document while categorizing each segment's function and importance is presented, using a zero-sum weighting scheme.
TextTiling: A Quantitative Approach to Discourse
TLDR
TextTiling, a method for partitioning full-length text documents into coherent multiparagraph units, has been found to correspond well to human judgements of the major subtopic boundaries of science magazine articles.