Reader-Specific Text Document Classification

  • Heikki Hyy
  • Published 1998

Abstract

An approach to automatic modeling of text documents is presented, where 'semantic features' based on contextual dependencies are extracted from the textual data. The model structure has two levels: first, context categories are constructed using sentences in the documents as elementary contextual units, and, second, document categories are constructed using the lower-level document analysis results as input data. Models on both of these levels are based on a feature extraction scheme, where the features can be interpreted as coordinate axes in a linear high-dimensional space. The models are adaptive, being updated according to the kinds of documents that have been read, so that the user-specific 'profile' helps to find relevant documents that match the user's personal model. An implementation of this approach is presented, technical details are discussed, and some results obtained with the program are reviewed.
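The two-level structure described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not say how the feature axes are extracted, so the axes here are fixed by hand, and the whitespace tokenizer, the averaging of sentence coordinates, the exponential profile update, and every function name are assumptions made purely for illustration.

```python
import math
from collections import Counter


def bag_of_words(text, vocab):
    """Count vocabulary words in a text (naive whitespace tokenizer, an assumption)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def normalize(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def project(v, axes):
    """Coordinates of v along each feature axis (plain dot products)."""
    return [sum(a * b for a, b in zip(v, axis)) for axis in axes]


def document_features(doc, vocab, context_axes, document_axes):
    """Level 1: sentences -> context coordinates. Level 2: their mean -> document coordinates."""
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    coords = [project(normalize(bag_of_words(s, vocab)), context_axes)
              for s in sentences]
    k = len(context_axes)
    if not coords:
        return [0.0] * len(document_axes)
    mean = [sum(c[i] for c in coords) / len(coords) for i in range(k)]
    return project(mean, document_axes)


def update_profile(profile, doc_vec, rate=0.2):
    """Hypothetical adaptive step: move the reader profile toward a document that was read."""
    return [(1 - rate) * p + rate * d for p, d in zip(profile, doc_vec)]


def relevance(profile, doc_vec):
    """Cosine similarity between the reader profile and a document."""
    dot = sum(p * d for p, d in zip(profile, doc_vec))
    norm = math.sqrt(sum(p * p for p in profile)) * math.sqrt(sum(d * d for d in doc_vec))
    return dot / norm if norm > 0 else 0.0


# Hand-picked axes stand in for learned features (an assumption, not from the paper).
vocab = ["neural", "network", "stock", "market"]
context_axes = [normalize([1, 1, 0, 0]), normalize([0, 0, 1, 1])]
document_axes = [[1.0, 0.0], [0.0, 1.0]]

read_doc = "Neural network models. The network learns."
other_doc = "Stock market news. The market fell."

profile = update_profile([0.0, 0.0],
                         document_features(read_doc, vocab, context_axes, document_axes))
for doc in (read_doc, other_doc):
    vec = document_features(doc, vocab, context_axes, document_axes)
    print(doc, relevance(profile, vec))
```

In this toy setting the profile drifts toward the feature axis of the document that was read, so a later document on the same topic scores higher than an unrelated one. A real system would presumably derive both sets of axes from the data rather than fix them by hand.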

Cite this paper

@inproceedings{Hyy1998ReaderS, title={Reader-Specific Text Document Classification}, author={Heikki Hyy}, year={1998} }