Text Classification Using Compression-Based Dissimilarity Measures

Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections…

