Compression and machine learning: a new perspective on feature space vectors

@inproceedings{Sculley2006CompressionAM,
  title={Compression and machine learning: a new perspective on feature space vectors},
  author={D. Sculley and Carla E. Brodley},
  booktitle={Data Compression Conference (DCC'06)},
  year={2006},
  pages={332--341}
}
The use of compression algorithms in machine learning tasks such as clustering and classification has appeared in a variety of fields, sometimes with the promise of reducing problems of explicit feature selection. The theoretical justification for such methods has been founded on an upper bound on Kolmogorov complexity and an idealized information space. An alternate view shows compression algorithms implicitly map strings into implicit feature space vectors, and compression-based similarity…
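
To make the idea of a compression-based similarity measure concrete, here is a minimal sketch of the normalized compression distance (NCD), a measure commonly associated with the "Clustering by compression" line of work. It is an illustration only and is not taken from the paper: the choice of zlib as the compressor and the helper names `compressed_size` and `ncd` are assumptions made for this example.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input.

    The compressed length is a computable upper-bound stand-in for
    Kolmogorov complexity (zlib is an assumed choice of compressor).
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size and xy is the concatenation.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 20
    other = bytes(range(256)) * 4
    # Strings that share structure should score lower than unrelated ones.
    print("ncd(text, text): ", ncd(text, text))
    print("ncd(text, other):", ncd(text, other))
```

With a real compressor the distance of a string to itself is small but not exactly zero, because the compressor is only an approximation of the idealized information measure.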

Citations

Publications citing this paper.

Authorship Verification based on Compression-Models


Email Spam Filtering



References

Publications referenced by this paper.

Clustering by compression

  • IEEE International Symposium on Information Theory, 2003. Proceedings.
  • 2003

The similarity metric

  • IEEE Transactions on Information Theory
  • 2001

Text categorization using compression models

  • Proceedings DCC 2000. Data Compression Conference
  • 2000

Spam Filtering Using Compression Models


Shared information and program plagiarism detection

  • IEEE Transactions on Information Theory
  • 2004

Language trees and zipping.
