Behavior Profiling of Email

S. Stolfo, S. Hershkop, Ke Wang, Olivier Nimeskern, Chia-Wei Hu
This paper describes the forensic and intelligence analysis capabilities of the Email Mining Toolkit (EMT) under development at the Columbia Intrusion Detection (IDS) Lab. EMT provides the means of loading, parsing and analyzing email logs, including content, in a wide range of formats. Many tools and techniques have been available from the fields of Information Retrieval (IR) and Natural Language Processing (NLP) for analyzing documents of various sorts, including emails. EMT, however, extends…
Behavior-based modeling and its application to Email analysis
It is shown by way of simulation that virus propagations are detectable, since viruses may emit emails at rates different from what human behavior suggests is normal, and may direct email to groups of recipients in ways that violate the users' typical communications with their social groups.
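As a minimal sketch of the rate-based signal described above, assume a user's normal behavior is summarized only by the mean and standard deviation of emails sent per hour (a simplification, not EMT's actual model). A burst can then be flagged with a z-score threshold. The helper name `is_rate_anomalous`, the threshold of 3.0, and the sample counts are hypothetical.

```python
from statistics import mean, stdev

def is_rate_anomalous(history, current, threshold=3.0):
    """Flag `current` (emails sent in the latest hour) if it lies more than
    `threshold` standard deviations from the user's historical hourly counts.
    Hypothetical helper for illustration; requires at least two history points."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly regular history: any deviation at all is unusual.
        return current != mu
    return abs(current - mu) / sigma > threshold

# Hypothetical hourly send counts for one user.
normal_hours = [2, 3, 1, 4, 2, 3, 2, 1, 3, 2]
print(is_rate_anomalous(normal_hours, 3))   # False
print(is_rate_anomalous(normal_hours, 40))  # True
```

A per-user baseline is used here because the abstract frames "normal" relative to an individual's behavior; the recipient-group signal it also mentions is not modeled in this sketch.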
Email mining toolkit supporting law enforcement forensic analyses
This project focuses on providing support to detectives and analysts in law enforcement to develop powerful means of analyzing emails acquired under due process as evidence in various investigations.
Behavior-based email analysis with application to spam detection
The Email Mining Toolkit is a data mining toolkit designed to analyze offline email corpora, including the entire set of email sent and received by an individual user, revealing much information about individual users as well as the behavior of groups of users in an organization.
E-mail Behavior Profiling based on Attachment Type and Language
Protection of confidential information from insider threat is crucial for any organization. In particular, compromise of information via email is relatively easy and can go undetected. We have…
Data Mining Approaches for Intrusion Detection in Email System Internet-Based
As the Internet grows at a phenomenal rate, email systems have become a widely used electronic form of communication. Every day, a large number of people exchange messages in this fast and inexpensive…
Combining email models for false positive reduction
A new method for comparing multiple and combined classifiers is introduced, and its differences from past work are shown. The method analyzes the relative gain and maximum possible accuracy that can be achieved for certain combinations of classifiers in order to automatically choose the best combination.
Data Mining in Personal Email Management
E-mail is still a popular mode of Internet communication and contains a large percentage of everyday information. As a result, email overload has grown over the past years, becoming a problem for personal…
Email Mining: A Review
E-mail is one of the most widely used means of written communication over the Internet, and its traffic has increased exponentially with the advent of the World Wide Web. The increase in email traffic…
Email mining: tasks, common techniques, and tools
This paper surveys five major email mining tasks, namely spam detection, email categorization, contact analysis, email network property analysis and email visualization, and systematically reviews the commonly used techniques.
E-mail Traffic Analysis Using Visualisation and Decision Trees
This work focuses on traffic analysis of e-mail communications, investigating whether different Artificial Intelligence (A.I.) or machine learning techniques can assist an analyst in searching for suspicious e-mail accounts and monitoring those accounts for “unusual” or “abnormal” communication behaviour.


MET: an experimental system for Malicious Email Tracking
MET maintains a database of statistics about the trajectory of email attachments into and out of a network system, and combines these statistics across networks to present a global view of the spread of malicious software.
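MET's actual schema and cross-network aggregation are not described here, so the following is only a toy, single-network sketch of the idea: attachments are keyed by a SHA-256 digest of their bytes, and the tracker records the distinct senders and recipients seen for each one. The class name `AttachmentTracker` and the `min_senders=3` threshold are hypothetical.

```python
import hashlib
from collections import defaultdict

class AttachmentTracker:
    """Toy in-memory stand-in for a per-attachment statistics table.
    Hypothetical: keys attachments by SHA-256 digest and records which
    accounts sent or received each one."""

    def __init__(self):
        self.senders = defaultdict(set)
        self.recipients = defaultdict(set)

    def record(self, payload: bytes, sender: str, recipients):
        # Identical attachment bytes map to the same digest, so repeated
        # forwards of one file accumulate under a single key.
        digest = hashlib.sha256(payload).hexdigest()
        self.senders[digest].add(sender)
        self.recipients[digest].update(recipients)
        return digest

    def spread(self, digest):
        # Number of distinct accounts the attachment has touched.
        return len(self.senders[digest] | self.recipients[digest])

    def suspicious(self, min_senders=3):
        # An attachment re-sent by many distinct senders behaves like
        # self-propagating code; the threshold is an arbitrary assumption.
        return [d for d, s in self.senders.items() if len(s) >= min_senders]

tracker = AttachmentTracker()
d = tracker.record(b"payload", "alice", ["bob", "carol"])
tracker.record(b"payload", "bob", ["dave"])
tracker.record(b"payload", "carol", ["erin"])
print(tracker.spread(d))            # 5
print(tracker.suspicious() == [d])  # True
```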
Mining Audit Data to Build Intrusion Detection Models
A data mining framework for constructing intrusion detection models: system audit data are mined for consistent and useful patterns of program and user behavior, and the relevant system features presented in the patterns are used to compute classifiers that can recognize anomalies and known intrusions.
MEF: Malicious Email Filter - A UNIX Mail Filter That Detects Malicious Windows Executables
A freely distributed malicious binary filter incorporated into Procmail that can detect malicious Windows attachments by integrating with a UNIX mail server and allows for the efficient propagation of detection models from a central server.
Learning Patterns from Unix Process Execution Traces for Intrusion Detection
Preliminary experiments extending the work pioneered by Forrest on learning the (normal and abnormal) patterns of Unix processes, which can be used to identify misuses of and intrusions in Unix systems, indicate that machine learning can play an important role by generalizing stored sequence information to perhaps provide broader intrusion detection services.
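The sequence-based baseline attributed to Forrest above can be sketched, under simplifying assumptions, as a lookup table of fixed-length windows of system calls observed in normal traces; a new trace is scored by the fraction of its windows never seen in training. The window length `k=3`, the function names, and the call names are illustrative assumptions, and the machine-learning generalization the paper studies is not shown.

```python
def windows(trace, k):
    # All contiguous length-k subsequences (sliding window) of a call trace.
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]

def build_normal_db(traces, k=3):
    # Store every window that appears in any normal trace.
    db = set()
    for t in traces:
        db.update(windows(t, k))
    return db

def mismatch_rate(trace, db, k=3):
    # Fraction of windows in `trace` that are absent from the normal database.
    w = windows(trace, k)
    if not w:
        return 0.0
    misses = sum(1 for s in w if s not in db)
    return misses / len(w)

normal = [["open", "read", "mmap", "close"], ["open", "read", "read", "close"]]
db = build_normal_db(normal)
print(mismatch_rate(["open", "read", "mmap", "close"], db))     # 0.0
print(mismatch_rate(["open", "exec", "socket", "close"], db))   # 1.0
```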
A Geometric Framework for Unsupervised Anomaly Detection
A new geometric framework for unsupervised anomaly detection is presented; unsupervised anomaly detection algorithms are designed to process unlabeled data and detect anomalies as points lying in sparse regions of the feature space.
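One simple way to score "sparse regions" without labels, assumed here for illustration rather than taken as the paper's exact algorithm, is a k-nearest-neighbor score: each point's score is the sum of distances to its k nearest neighbors, so isolated points score high. The function name `knn_scores`, the choice `k=2`, and the Euclidean metric are assumptions; the sketch uses brute-force distances.

```python
import math

def knn_scores(points, k=2):
    """Score each point by the sum of Euclidean distances to its k nearest
    neighbors; points in sparse regions of the feature space score high."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]))
    return scores

data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_scores(data)
outlier = max(range(len(data)), key=scores.__getitem__)
print(data[outlier])  # (5.0, 5.0)
```

The brute-force loop is quadratic in the number of points; it keeps the geometry visible, but large audit datasets would need an index or approximate neighbor search.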
Gauging Similarity with n-Grams: Language-Independent Categorization of Text
A language-independent means of gauging topical similarity in unrestricted text by combining information derived from n-grams with a simple vector-space technique that makes sorting, categorization, and retrieval feasible in a large multilingual collection of documents.
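A minimal sketch of the n-gram plus vector-space idea: represent each text as a vector of character trigram counts and compare vectors by cosine similarity. Because only raw characters are used, no language-specific tokenizer is needed. The trigram length, lowercasing, and function names are assumptions, and refinements from the original paper are not reproduced.

```python
import math
from collections import Counter

def ngram_vector(text, n=3):
    # Character n-gram counts; works for any language or script.
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    # Cosine of the angle between two sparse count vectors.
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

v1 = ngram_vector("intrusion detection for email")
v2 = ngram_vector("email intrusion detection system")
v3 = ngram_vector("gradient descent in neural networks")
print(cosine(v1, v2) > cosine(v1, v3))  # True
```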
Estimating Continuous Distributions in Bayesian Classifiers
This paper abandons the normality assumption and instead uses statistical methods for nonparametric density estimation, namely kernel estimation; the results suggest that kernel estimation is a useful tool for learning Bayesian models.
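To make the contrast with a single fitted normal concrete, here is a minimal naive Bayes sketch in which each class-conditional feature density is an average of Gaussian kernels centered on the training values. The fixed bandwidth `h=1.0`, the class name `KernelNaiveBayes`, and the toy data are assumptions; the paper's own bandwidth choice is not reproduced.

```python
import math
from collections import defaultdict

def gaussian(x, mu, h):
    # Normal kernel with center mu and bandwidth h.
    return math.exp(-0.5 * ((x - mu) / h) ** 2) / (h * math.sqrt(2 * math.pi))

class KernelNaiveBayes:
    """Naive Bayes over continuous features where each class-conditional
    density is a kernel density estimate rather than one fitted normal.
    The fixed bandwidth `h` is a hypothetical choice."""

    def __init__(self, h=1.0):
        self.h = h

    def fit(self, X, y):
        self.samples = defaultdict(list)
        for row, label in zip(X, y):
            self.samples[label].append(row)
        self.priors = {c: len(rows) / len(X) for c, rows in self.samples.items()}
        return self

    def _log_density(self, value, column):
        # Average of kernels centered on each training value of this feature;
        # the tiny constant guards against log(0) far from all samples.
        dens = sum(gaussian(value, v, self.h) for v in column) / len(column)
        return math.log(dens + 1e-300)

    def predict(self, row):
        best, best_score = None, -math.inf
        for c, rows in self.samples.items():
            score = math.log(self.priors[c])
            for j, value in enumerate(row):
                score += self._log_density(value, [r[j] for r in rows])
            if score > best_score:
                best, best_score = c, score
        return best

# Class "a" is bimodal (near 0 and 10); a single normal would center it at 5.
X = [[0.0], [0.5], [10.0], [9.5], [5.0], [5.5], [4.5]]
y = ["a", "a", "a", "a", "b", "b", "b"]
model = KernelNaiveBayes().fit(X, y)
print(model.predict([5.0]), model.predict([0.2]), model.predict([9.8]))  # b a a
```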
Algorithm 457: finding all cliques of an undirected graph
Description. Introduction. A maximal complete subgraph (clique) is a complete subgraph that is not contained in any other complete subgraph. A recent paper [1] describes a number of techniques to find…
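Algorithm 457 is the Bron–Kerbosch procedure. The sketch below shows its basic recursive form without pivoting, whereas the published algorithm adds refinements to reduce branching. R is the clique being grown, P holds the candidates that can still extend it, and X holds vertices already explored, so each maximal clique is reported exactly once. The function names are illustrative, and the graph is built from an edge list, so isolated vertices are not represented.

```python
def bron_kerbosch(R, P, X, adj, out):
    """Append every maximal clique that extends R using vertices from P,
    skipping cliques already reported via vertices in X."""
    if not P and not X:
        out.append(R)  # R cannot be extended: it is a maximal clique
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P = P - {v}  # v has been fully explored as an extension
        X = X | {v}

def maximal_cliques(edges):
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return out

# Triangle 1-2-3 plus a pendant edge 3-4.
cliques = maximal_cliques([(1, 2), (2, 3), (1, 3), (3, 4)])
print(sorted(sorted(c) for c in cliques))  # [[1, 2, 3], [3, 4]]
```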