Spam Detection using Clustering, Random Forests, and Active Learning

Authors: Dave DeBarr, G. Mason, and Harry Wechsler
This paper describes work in progress. Our research focuses on the efficient construction of effective spam detection models. Clustering messages allows a representative sample of messages to be labeled efficiently. A Random Forest classifier is then learned from that sample, and active learning refines the classification model. Results are illustrated for the 2007 TREC Public Spam Corpus. The area under the Receiver Operating Characteristic (ROC) curve is competitive…
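The two labeling steps in the abstract can be sketched in a few lines of standard-library Python. This is an illustration only, not the authors' implementation: the abstract does not say how cluster representatives or active-learning queries are chosen, so the medoid rule, the token-set Jaccard distance, and uncertainty sampling (querying the messages scored closest to 0.5) are all assumptions, as are the function names `jaccard_distance`, `medoid`, and `most_uncertain`. The spam scores are assumed to come from an already trained classifier such as a Random Forest, which is not built here.

```python
from typing import Callable, Dict, List, Sequence


def jaccard_distance(a: str, b: str) -> float:
    """Distance between two messages based on their sets of lowercase tokens.
    (Assumed distance measure; the abstract does not specify one.)"""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # two empty messages are treated as identical
    return 1.0 - len(tokens_a & tokens_b) / len(union)


def medoid(cluster: Sequence[str],
           distance: Callable[[str, str], float] = jaccard_distance) -> str:
    """Return the cluster member with the smallest total distance to the
    other members; labeling it stands in for labeling the whole cluster."""
    return min(cluster, key=lambda m: sum(distance(m, other) for other in cluster))


def most_uncertain(spam_scores: Dict[str, float], k: int) -> List[str]:
    """Uncertainty sampling (assumed query strategy): pick the k unlabeled
    message ids whose predicted spam probability is closest to 0.5."""
    return sorted(spam_scores, key=lambda mid: abs(spam_scores[mid] - 0.5))[:k]


# Hypothetical cluster of near-duplicate messages.
cluster = [
    "cheap pills online now",
    "cheap pills online",
    "buy cheap pills online today",
]
representative = medoid(cluster)

# Hypothetical scores from a trained classifier for four unlabeled messages.
scores = {"m1": 0.95, "m2": 0.52, "m3": 0.10, "m4": 0.41}
to_label_next = most_uncertain(scores, k=2)
```

In this sketch one human label per cluster seeds the training set, and each later round asks for labels only on the messages the current model is least sure about, which is where the efficiency described in the abstract would come from.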