Certainty Identification in Texts: Categorization Model and Manual Tagging Results
This paper focuses on the development, implementation, and evolution of a discourse model used to computationally instantiate a discourse structure in individual texts. The model was developed for a Text Structuring module that recognizes discourse-level structure within a large-scale information retrieval system, DR-LINK (Liddy & Myaeng, 1993). The Text Structurer produces an enriched representation of each document by computationally decomposing it into smaller, conceptually labelled components. Delineating the discourse-level organization of each document’s contents in this way facilitates retrieval of those documents whose discourse semantics are responsive to the user’s query.
The notion of text-type models derives from research in discourse linguistics, which has shown that writers who repeatedly produce texts of a particular type are influenced by the schema of that text-type. When writing, they consider not only the specific content they wish to convey but also the usual structure for that type of text, given the purpose it is intended to serve (Jones, 1983). As a result, texts of a particular type evidence the schema that exists in the minds of those who produce them. These schemas can be delineated, and as such they provide models of their respective text-types that are of use in automatically structuring texts. A text schema explicates a discernible, predictable structure: the global schematic structure that is filled with different meaning in each particular example of that text-type (van Dijk, 1980). Among the text-types for which schemas or models have been developed are folk-tales (Propp, 1958), newspaper articles (van Dijk, 1980), arguments (Cohen, 1987), historical journal articles (Tibbo, 1989), editorials (Alvarado, 1990), empirical abstracts (Liddy, 1991), and theoretical abstracts (Francis & Liddy, 1991).