Using Compression For Source Based Classification Of Text

Nitin Thaper and Shafi Goldwasser
This thesis addresses the problem of source-based text classification. In a nutshell, the problem involves classifying documents according to “where they came from” rather than the usual “what they contain”. From a machine learning perspective, it is a learning problem that falls into two categories: supervised and unsupervised learning. In the former case, the classifier is presented with known examples of documents and their sources during the training…
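The excerpt above does not describe the thesis's actual algorithm, so the following is only a minimal sketch of one generic way compression can be used in the supervised setting: a document is assigned to the source whose training text helps compress it the most. The use of `zlib`, the function names `compressed_size` and `classify`, and the "extra bytes" score are all assumptions made for illustration, not details taken from the thesis.

```python
import zlib
from typing import Mapping


def compressed_size(text: str) -> int:
    """Return the number of bytes zlib needs to store the text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def classify(document: str, training: Mapping[str, str]) -> str:
    """Pick the source whose training corpus best 'explains' the document.

    Hypothetical scoring rule (an assumption, not the thesis's method):
    the score of a source is the number of extra compressed bytes needed
    when the document is appended to that source's training corpus.
    A lower score means the document shares more structure with the source.
    """
    def extra_bytes(source: str) -> int:
        corpus = training[source]
        return compressed_size(corpus + document) - compressed_size(corpus)

    return min(training, key=extra_bytes)
```

A call such as `classify(new_document, {"source_a": text_a, "source_b": text_b})` returns the label of the best-matching source. Note that zlib only looks back over a 32 KB window, so for larger training corpora a different compressor or a per-source model would be needed.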