Anomalous Payload-Based Network Intrusion Detection
TLDR
Presents PAYL, a payload-based anomaly detector for intrusion detection, and demonstrates the surprising effectiveness of the method on the 1999 DARPA IDS dataset and on a live dataset the authors collected on the Columbia CS department network.
The merge/purge problem for large databases
TLDR
This paper details the sorted neighborhood method, which some use to solve the merge/purge problem, and presents experimental results demonstrating that this approach may work well in practice but at great expense. It also shows a means of improving the accuracy of the results with a multi-pass approach.
Data mining methods for detection of new malicious executables
TLDR
This work presents a data mining framework that detects new, previously unseen malicious executables accurately and automatically, and more than doubles the current detection rates for new malicious executables.
A data mining framework for building intrusion detection models
TLDR
A data mining framework for adaptively building Intrusion Detection (ID) models is described. It uses auditing programs to extract an extensive set of features that describe each network connection or host session, and applies data mining programs to learn rules that accurately capture the behavior of intrusions and normal activities.
A framework for constructing features and models for intrusion detection systems
TLDR
Presents a novel framework, MADAM ID (Mining Audit Data for Automated Models for Intrusion Detection), which uses data mining algorithms to compute activity patterns from system audit data and extracts predictive features from the patterns.
Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem
TLDR
This paper develops a system for accomplishing the data cleansing task, demonstrates its use for cleansing lists of names of potential customers in a direct-marketing application, and reports on a successful implementation for a real-world database that conclusively validates results previously achieved for statistically generated data.
AdaCost: Misclassification Cost-Sensitive Boosting
TLDR
It is formally shown that AdaCost reduces the upper bound of the cumulative misclassification cost of the training set, with a significant reduction in cumulative misclassification cost over AdaBoost without consuming additional computing power.
Data Mining Approaches for Intrusion Detection
TLDR
An agent-based architecture for intrusion detection systems where the learning agents continuously compute and provide the updated (detection) models to the detection agents is proposed.
On the feasibility of online malware detection with performance counters
TLDR
This paper examines the feasibility of building a malware detector in hardware using existing performance counters and finds that data from performance counters can be used to identify malware and that the detection techniques are robust to minor variations in malware programs.
Anagram: A Content Anomaly Detector Resistant to Mimicry Attack
TLDR
Presents Anagram, a content anomaly detector that models a mixture of high-order n-grams (n > 1) and is designed to detect anomalous and “suspicious” network packet payloads. The paper demonstrates that Anagram can identify anomalous traffic with high accuracy and low false positive rates.
...
...