# A Bayesian Approach to Filtering Junk E-Mail

@inproceedings{Sahami1998ABA, title={A Bayesian Approach to Filtering Junk E-Mail}, author={Mehran Sahami and Susan T. Dumais and David Heckerman and Eric Horvitz}, booktitle={AAAI Conference on Artificial Intelligence}, year={1998} }

In addressing the growing problem of junk E-mail on the Internet, we examine methods for the automated construction of filters to eliminate such unwanted messages from a user’s mail stream. … Finally, we show the efficacy of such filters in a real world usage scenario, arguing that this technology is mature enough for deployment.

## 1,630 Citations

### IMPROVING SPAM EMAIL FILTERING EFFICIENCY USING BAYESIAN BACKWARD APPROACH PROJECT

- Computer Science
- 2020

A new approach based on Bayesian classification that can automatically classify e-mail messages as spam or legitimate is explored and its performance for various datasets is studied.

### Filtering Junk Mail with a Maximum Entropy Model

- Computer Science
- 2003

This work presents a hybrid approach that uses a Maximum Entropy Model, shows how to apply it to a junk mail filtering task, and gives an extensive experimental comparison with a Naive Bayes classifier, a widely used classifier in e-mail filtering, showing that the approach performs comparably to or better than the Naive Bayes method.

### Using Naïve Bayes Method to Classify Text-Based Email

- Computer Science, Materials Science
- 2018 9th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)
- 2018

Results are presented from a number of experiments and show that a filtering system, BETSY, could become a useful and valuable part of any e-mail client.

### Automatic junk e-mail filtering based on latent content

- Computer Science
- 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)
- 2003

The underlying framework is latent semantic analysis; experiments show that it is competitive with the state of the art in e-mail classification and potentially advantageous in real-world applications with high junk-to-legitimate ratios.

### A Neural Network Classifier for Junk E-Mail

- Computer Science
- Document Analysis Systems
- 2004

This preliminary study tests an alternative approach using a neural network (NN) classifier on a corpus of e-mail messages from one user, and it appears that commercial spam detectors are now beginning to use descriptive features as proposed here.

### Learning to Filter Unsolicited Commercial E-Mail

- Computer Science
- 2006

The architecture of a fully implemented learning-based anti-spam filter is described, and an analysis of its behavior in real use over a period of seven months is presented.

### Ways to Evade Spam Filters and Machine Learning as a Potential Solution

- Computer Science
- 2006 International Symposium on Communications and Information Technologies
- 2006

A critical analysis of the ways spammers evade spam filters is presented, and the Bayesian noise reduction (BNR) technique is explored, which attempts to solve this problem by identifying and eliminating 'out of context' data (injected by spammers or otherwise) to provide a cleaner classification.

### A MATHEMATICAL APPROACH FOR FILTERING JUNK E-MAIL USING RELEVANCE ANALYSIS

- Computer Science
- 2016

This paper presents a mathematical approach to restricting spam e-mail through the subject and content relevancy of each message, and the results of this approach are used to classify an e-mail as spam.

### An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages

- Computer Science
- SIGIR '00
- 2000

This work introduces appropriate cost-sensitive measures, and investigates at the same time the effect of attribute-set size, training-corpus size, lemmatization, and stop lists, issues that have not been explored in previous experiments.

### Spam / Junk E-Mail Filter Technique

- Computer Science
- 2016

The preliminary study tests an alternative approach that uses a neural network (NN) classifier to overcome drawbacks of the Naïve Bayesian approach, with a feature set built from descriptive characteristics of words and messages, similar to those that users would rely on to identify spam.

## References

### Learning Rules that Classify E-Mail

- Computer Science
- 1996

Two methods for learning text classifiers are compared on classification problems that might arise in filtering and filing personal e-mail messages: a "traditional IR" method based on TF-IDF…

### Learning Limited Dependence Bayesian Classifiers

- Computer Science
- KDD
- 1996

A framework for characterizing Bayesian classification methods is presented and a general induction algorithm is presented that allows for traversal of this spectrum depending on the available computational power for carrying out induction and its application in a number of domains with different properties.

### Improving Text Classification by Shrinkage in a Hierarchy of Classes

- Computer Science
- ICML
- 1998

This paper shows that the accuracy of a naive Bayes text classifier can be improved by taking advantage of a hierarchy of classes, and adopts an established statistical technique called shrinkage that smooths parameter estimates of a data-sparse child with its parent in order to obtain more robust parameter estimates.

### Toward Optimal Feature Selection

- Computer Science
- ICML
- 1996

An efficient algorithm for feature selection which computes an approximation to the optimal feature selection criterion is given, showing that the algorithm effectively handles datasets with a very large number of features.

### A comparison of two learning algorithms for text categorization

- Computer Science
- 1994

It is shown that both algorithms achieve reasonable performance and allow controlled tradeoffs between false positives and false negatives, and the stepwise feature selection in the decision tree algorithm is particularly effective in dealing with the large feature sets common in text categorization.

### Machine learning

- Computer Science
- CSUR
- 1996

Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

### Probabilistic reasoning in intelligent systems - networks of plausible inference

- Computer Science
- Morgan Kaufmann series in representation and reasoning
- 1989

The author provides a coherent explication of probability as a language for reasoning with partial belief and offers a unifying perspective on other AI approaches to uncertainty, such as the Dempster-Shafer formalism, truth maintenance systems, and nonmonotonic logic.

### Text Categorization with Support Vector Machines: Learning with Many Relevant Features

- Computer Science
- ECML
- 1998

This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are…

### Elements of Information Theory

- Computer Science
- 1991

The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.

### The Nature of Statistical Learning Theory

- Computer Science
- Statistics for Engineering and Information Science
- 2000

Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing…