Behavior Profiling of Email

S. Stolfo, Shlomo Hershkop, Ke Wang, Olivier Nimeskern, Chia-Wei Hu
This paper describes the forensic and intelligence analysis capabilities of the Email Mining Toolkit (EMT) under development at the Columbia Intrusion Detection (IDS) Lab. EMT provides the means of loading, parsing and analyzing email logs, including content, in a wide range of formats. Many tools and techniques have been available from the fields of Information Retrieval (IR) and Natural Language Processing (NLP) for analyzing documents of various sorts, including emails. EMT, however, extends… 

Behavior-based modeling and its application to Email analysis

Simulation shows that virus propagations are detectable because viruses may emit emails at rates that differ from normal human behavior, and because the email is directed to groups of recipients in ways that violate the users' typical communications with their social groups.
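The two signals named in this summary, an unusual sending rate and an unusual recipient grouping, can be illustrated with a minimal sketch. This is not the paper's model: the z-score threshold, the per-window counts, and the function names `rate_is_anomalous` and `violates_groups` are all assumptions made for illustration.

```python
from statistics import mean, stdev


def rate_is_anomalous(history, current, threshold=3.0):
    """Flag `current` (emails sent in one time window) when it lies more than
    `threshold` sample standard deviations above the user's historical mean.

    `history` is a list of per-window send counts (hypothetical input format).
    """
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current > mu
    return (current - mu) / sigma > threshold


def violates_groups(recipients, groups):
    """Return True when no known group of correspondents contains all
    recipients of the message, i.e. the message mixes social groups."""
    recipients = set(recipients)
    return not any(recipients <= set(group) for group in groups)
```

A message would be suspicious under this sketch if either check fires, for example a burst of 40 messages from a user who normally sends 1 to 4 per window, or a single message addressed to members of two groups the user never mixes.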

Email mining toolkit supporting law enforcement forensic analyses

This project focuses on providing support to detectives and analysts in law enforcement to develop powerful means of analyzing emails acquired under due process as evidence in various investigations.

Behavior-based email analysis with application to spam detection

The Email Mining Toolkit is a data mining toolkit designed to analyze offline email corpora, including the entire set of email sent and received by an individual user, revealing much information about individual users as well as the behavior of groups of users in an organization.

Data Mining Approaches for Intrusion Detection in Email System Internet-Based

The main idea is to use data mining techniques to discover consistent and useful patterns in email system behavior that can be used to recognize anomalies and known intrusions.

Combining email models for false positive reduction

A new method for comparing multiple and combined classifiers is introduced, together with an account of how it differs from past work. It analyzes the relative gain and maximum possible accuracy achievable for certain combinations of classifiers in order to automatically choose the best combination.

Data Mining in Personal Email Management

  • G. Soni
  • Computer Science, Business
  • 2012
This survey reviews research on how machine learning and data mining techniques, such as classification and clustering, can contribute to solving the problem by constructing intelligent techniques that automate email management tasks.

Email Mining: A Review

Various techniques and approaches used by researchers for email mining and the subsequent classification of email messages into the above categories are presented.

Email mining: tasks, common techniques, and tools

This paper organizes a survey on five major email mining tasks, namely spam detection, email categorization, contact analysis, email network property analysis and email visualization, and systematically reviews the commonly used techniques.

E-mail Traffic Analysis Using Visualisation and Decision Trees

This work focuses on traffic analysis of e-mail communications, by investigating different Artificial Intelligence (A.I.) or machine learning techniques to determine whether they are capable of assisting an analyst in searching for suspicious e-mail accounts and monitoring those accounts for “unusual” or “abnormal” communication behaviour.

Mining E-Mail Content for a Small Enterprise

This paper describes a web-based approach to parse and mine email logs from a POP3 server for content information that can be associated with diseases with improved visualizations.



MET: an experimental system for Malicious Email Tracking

MET is a database of statistics about the trajectory of email attachments in and out of a network system, and the culling together of these statistics across networks to present a global view of the spread of the malicious software.

Mining Audit Data to Build Intrusion Detection Models

A data mining framework for constructing intrusion detection models to mine system audit data for consistent and useful patterns of program and user behavior, and use the set of relevant system features presented in the patterns to compute classifiers that can recognize anomalies and known intrusions.

MEF: Malicious Email Filter - A UNIX Mail Filter That Detects Malicious Windows Executables

A freely distributed malicious binary filter incorporated into Procmail that can detect malicious Windows attachments by integrating with a UNIX mail server and allows for the efficient propagation of detection models from a central server.

Learning Patterns from Unix Process Execution Traces for Intrusion Detection

Preliminary experiments extend the work pioneered by Forrest on learning the (normal and abnormal) patterns of Unix processes, which can be used to identify misuses of and intrusions in Unix systems. The results indicate that machine learning can play an important role by generalizing stored sequence information to perhaps provide broader intrusion detection services.

A Geometric Framework for Unsupervised Anomaly Detection

A new geometric framework is presented for unsupervised anomaly detection, that is, for algorithms designed to process unlabeled data and detect anomalies in sparse regions of the feature space.

Gauging Similarity with n-Grams: Language-Independent Categorization of Text

A language-independent means of gauging topical similarity in unrestricted text by combining information derived from n-grams with a simple vector-space technique that makes sorting, categorization, and retrieval feasible in a large multilingual collection of documents.
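As a rough illustration of combining n-gram counts with a vector-space measure, the sketch below builds character trigram profiles and compares them by cosine similarity. It is a simplification and not necessarily the method of the paper summarized above: the normalization steps, the default `n = 3`, and the function names are assumptions.

```python
from collections import Counter
import math


def ngram_profile(text, n=3):
    """Count overlapping character n-grams in lowercased text with
    runs of whitespace collapsed to single spaces."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Because the profiles are built from raw characters rather than words, no tokenizer, stemmer, or stop-word list is needed, which is what makes this kind of comparison language-independent.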

Estimating Continuous Distributions in Bayesian Classifiers

This paper abandons the normality assumption and instead uses nonparametric kernel density estimation; the results suggest that kernel estimation is a useful tool for learning Bayesian models.
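The idea of replacing a single fitted normal distribution with a kernel estimate can be sketched as a small naive Bayes classifier. This is a minimal illustration under stated assumptions, not the paper's implementation: the fixed `bandwidth` parameter, the density floor, and the `training` layout (class label mapped to a list of feature vectors) are all choices made here for clarity.

```python
import math


def gaussian_kernel_density(x, samples, bandwidth):
    """Average of Gaussian bumps of width `bandwidth`, one per training sample."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return norm * total / len(samples)


def classify(x, training, bandwidth=1.0):
    """Return the class maximizing log prior + sum of per-feature log densities.

    `training` maps class label -> list of feature vectors (lists of floats).
    """
    total = sum(len(rows) for rows in training.values())
    best_label, best_score = None, -math.inf
    for label, rows in training.items():
        score = math.log(len(rows) / total)
        for j, value in enumerate(x):
            column = [row[j] for row in rows]
            density = gaussian_kernel_density(value, column, bandwidth)
            # Floor the density so a far-away point does not produce log(0).
            score += math.log(max(density, 1e-300))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With a kernel estimate, a feature whose values cluster in two separate ranges within one class is modeled as two bumps, whereas a single normal fit would place most of its mass between them.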

Algorithm 457: finding all cliques of an undirected graph

Two backtracking algorithms are presented, using a branch-and-bound technique [4] to cut off branches that cannot lead to a clique; one version generates cliques in a rather unpredictable order in an attempt to minimize the number of branches to be traversed.
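Algorithm 457 is commonly known as the Bron-Kerbosch algorithm. The sketch below is a recursive, set-based rendering of the pivoting variant, written for readability rather than as a transcription of the published procedure. The pivot rule used here (the vertex with the most neighbors among the remaining candidates) is a common choice and is an assumption, as is the `adj` input format.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph.

    `adj` maps each vertex to the set of its neighbors (no self-loops).
    """
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield set(clique)  # nothing can extend it: the clique is maximal
            return
        # Branch only on vertices NOT adjacent to the pivot; any maximal
        # clique must contain the pivot or one of its non-neighbors.
        pivot = max(candidates | excluded, key=lambda v: len(adj[v] & candidates))
        for v in list(candidates - adj[pivot]):
            yield from expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}  # v is fully explored
            excluded = excluded | {v}      # never report cliques containing v again

    yield from expand(set(), set(adj), set())
```

For a triangle on vertices 1, 2, 3 with an extra edge from 3 to 4, the generator produces the two maximal cliques {1, 2, 3} and {3, 4}, in an order that depends on set iteration.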

Introduction to Mathematical Statistics

1. Probability and Distributions. 2. Multivariate Distributions. 3. Some Special Distributions. 4. Some Elementary Statistical Inferences. 5. Consistency and Limiting Distributions. 6. Maximum