Judging whether a document changes in subject

Abstract

This paper describes a method for determining whether a document's text concerns a single subject or changes subject partway through. The algorithm divides the document into five equal parts and measures the text similarity of each part with every other part. Documents that drift in subject are shown to have a higher standard…
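The splitting and pairwise comparison steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name its similarity measure, so the sketch assumes bag-of-words cosine similarity. It also assumes that "equal parts" means equal word counts. The function names are hypothetical. Because the abstract is cut off before it names the statistic it compares, the sketch stops at the ten pairwise scores and does not compute that statistic.

```python
import math
import re
from collections import Counter
from itertools import combinations


def split_into_parts(text: str, n_parts: int = 5) -> list[list[str]]:
    """Split the document's word tokens into n_parts contiguous, near-equal chunks.

    Assumption: "equal parts" is interpreted as equal word counts; any remainder
    words are spread one each over the first chunks.
    """
    words = re.findall(r"[a-z']+", text.lower())
    size, extra = divmod(len(words), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(words[start:end])
        start = end
    return parts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words count vectors (0.0 if either is empty).

    Assumption: the paper's similarity measure is not stated in the abstract;
    cosine similarity is used here only as a common stand-in.
    """
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_section_similarities(text: str, n_parts: int = 5) -> dict[tuple[int, int], float]:
    """Compare every section with every other section; keys are (i, j) with i < j."""
    vectors = [Counter(part) for part in split_into_parts(text, n_parts)]
    return {
        (i, j): cosine_similarity(vectors[i], vectors[j])
        for i, j in combinations(range(n_parts), 2)
    }


if __name__ == "__main__":
    # A document whose vocabulary switches halfway through.
    doc = "cat dog pet " * 10 + "stock bond market " * 10
    for pair, score in sorted(pairwise_section_similarities(doc).items()):
        print(pair, round(score, 3))
```

With five parts there are ten section pairs. In the drifting example above, the first and last sections share no words, so their score is 0.0. Sections drawn from the same half score close to 1.0.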


4 Figures and Tables