Spam Detection using Clustering, Random Forests, and Active Learning

@inproceedings{DeBarr2009SpamDU,
  title={Spam Detection using Clustering, Random Forests, and Active Learning},
  author={Dave DeBarr and G. Mason and Harry Wechsler},
  year={2009}
}
This paper describes work in progress. Our research focuses on the efficient construction of effective models for spam detection. Clustering the messages allows a representative sample to be labeled efficiently. A Random Forest trained on that sample serves as the classifier, and active learning is used to refine the classification model. Results are illustrated for the 2007 TREC Public Spam Corpus. The area under the Receiver Operating Characteristic (ROC) curve is competitive…
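The abstract's three-stage idea (cluster, label representatives, refine with active learning) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's k-means in place of a medoid-based method because scikit-learn's core has no Partitioning Around Medoids, it assumes uncertainty sampling as the query strategy (the abstract does not name one), and it uses synthetic data as a stand-in for the TREC corpus. The helper names `cluster_representatives` and `active_learning` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import pairwise_distances_argmin_min


def cluster_representatives(X, n_clusters, seed=0):
    """Return indices of the samples closest to each k-means center.

    Assumption: k-means stands in for a medoid-based clustering step.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    idx, _ = pairwise_distances_argmin_min(km.cluster_centers_, X)
    return np.unique(idx)


def active_learning(X, oracle, n_clusters=20, rounds=5, batch=10, seed=0):
    """Seed with cluster representatives, then query uncertain samples.

    `oracle(indices)` is a hypothetical callback returning the labels
    for the given row indices (a human labeler in practice).
    """
    labeled = cluster_representatives(X, n_clusters, seed).tolist()
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    for _ in range(rounds):
        model.fit(X[labeled], oracle(labeled))
        pool = np.setdiff1d(np.arange(len(X)), labeled)
        if len(pool) == 0:
            break
        # Uncertainty sampling (assumed strategy): query the samples
        # whose most likely class has the lowest predicted probability.
        uncertainty = 1.0 - model.predict_proba(X[pool]).max(axis=1)
        query = pool[np.argsort(-uncertainty)[:batch]]
        labeled.extend(query.tolist())
    model.fit(X[labeled], oracle(labeled))
    return model, labeled


if __name__ == "__main__":
    # Synthetic stand-in data; real use would supply message features.
    X, y = make_classification(n_samples=400, n_features=20,
                               n_informative=5, random_state=0)
    model, labeled = active_learning(X, lambda idx: y[idx])
    print(f"labeled {len(labeled)} of {len(X)} samples")
```

Only the rows in `labeled` ever reach the oracle, which is where the labeling savings described in the abstract would come from.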


References

G. V. Cormack. "TREC 2007 Spam Track Overview". NIST Special Publication 500-274, 2007.

L. Kaufman, P. J. Rousseeuw. "Partitioning Around Medoids". In Finding Groups in Data, Wiley-Interscience, 2005.
