Hierarchical Document Clustering Based on SAT Model

A word in a document can be viewed as an item, and a group of words as an itemset. Previous association-based clustering work treats a whole document as a single transaction when mining frequent itemsets and association rules, yet the basic semantic unit in a document is actually the sentence. Words co-occurring in the same sentence are usually associated in one way or another, and are more meaningful than the same group of words spanning several sentences in the document…
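The difference between the two views of a transaction can be illustrated with a small sketch. This is not the paper's algorithm: the function names (`sentence_transactions`, `document_transaction`, `frequent_pairs`), the regex-based sentence splitting, and the sample text are all assumptions made for illustration, and only itemsets of size two are counted.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_transactions(text):
    """Treat each sentence as one transaction (a set of lowercase words).

    Sentence splitting on ., ! and ? is a deliberate simplification.
    """
    sentences = re.split(r"[.!?]+", text)
    return [set(re.findall(r"[a-z]+", s.lower())) for s in sentences if s.strip()]


def document_transaction(text):
    """Treat the whole document as a single transaction."""
    return [set(re.findall(r"[a-z]+", text.lower()))]


def frequent_pairs(transactions, min_support):
    """Count word pairs (2-itemsets) and keep those meeting min_support."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical sample document, three sentences.
doc = (
    "Data mining finds patterns. "
    "Clustering groups documents. "
    "Data mining supports clustering."
)

by_sentence = frequent_pairs(sentence_transactions(doc), min_support=1)
by_document = frequent_pairs(document_transaction(doc), min_support=1)

# Pairs that only co-occur across sentence boundaries.
cross_sentence_only = set(by_document) - set(by_sentence)
```

In this sample, the pair `("documents", "patterns")` appears under the document-level view even though the two words never share a sentence, while the sentence-level view keeps only pairs such as `("data", "mining")` that genuinely co-occur within a sentence.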

