A Generative Model for Self/Non-self Discrimination in Strings

  • Matti Pöllä
A statistical model is presented as an alternative to negative selection in anomaly detection of discrete data. We extend the use of probabilistic generative models from fixed-length binary strings to variable-length strings over a finite symbol alphabet, using a mixture model of multinomial distributions for the frequency of adjacent symbols in a sliding window over a string. Robust and localized change analysis of text corpora is considered as an application area.
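A minimal sketch of the idea, not the author's implementation: each sliding window is reduced to counts of adjacent symbol pairs, a K-component mixture of multinomials is fitted to the self windows with EM, and a window is scored by its mixture log-likelihood per pair, so low scores suggest non-self. The window width, K, the Laplace smoothing constant `alpha`, the shared `UNK` bucket for unseen pairs and the per-pair scoring are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

UNK = "<unk>"  # hypothetical bucket for symbol pairs never seen in the self data

def pair_counts(text, width):
    """Adjacent-symbol (bigram) counts for every sliding window of `width` symbols."""
    windows = []
    for start in range(len(text) - width + 1):
        w = text[start:start + width]
        windows.append(Counter(w[i:i + 2] for i in range(width - 1)))
    return windows

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def fit_mixture(windows, k=3, iters=50, alpha=0.1, seed=0):
    """EM for a K-component mixture of multinomials over symbol-pair counts."""
    vocab = sorted({p for c in windows for p in c}) + [UNK]
    rng = random.Random(seed)
    # Random soft assignments break the symmetry between components.
    resp = []
    for _ in windows:
        r = [rng.random() + 1e-3 for _ in range(k)]
        resp.append([x / sum(r) for x in r])
    n = len(windows)
    for _ in range(iters):
        # M-step: mixing weights (floored away from zero) and smoothed pair probabilities.
        weights = [(sum(r[j] for r in resp) + 1e-6) / (n + k * 1e-6) for j in range(k)]
        theta = []
        for j in range(k):
            mass = dict.fromkeys(vocab, alpha)
            for r, c in zip(resp, windows):
                for p, cnt in c.items():
                    mass[p] += r[j] * cnt
            z = sum(mass.values())
            theta.append({p: v / z for p, v in mass.items()})
        # E-step: posterior responsibility of each component for each window.
        resp = []
        for c in windows:
            logp = [math.log(weights[j]) + sum(cnt * math.log(theta[j][p]) for p, cnt in c.items())
                    for j in range(k)]
            lse = logsumexp(logp)
            resp.append([math.exp(lp - lse) for lp in logp])
    return weights, theta

def score(window, weights, theta):
    """Mixture log-likelihood per symbol pair (multinomial coefficient omitted); low means non-self."""
    logp = []
    for j, w in enumerate(weights):
        lp = math.log(w)
        for p, cnt in window.items():
            lp += cnt * math.log(theta[j].get(p, theta[j][UNK]))
        logp.append(lp)
    return logsumexp(logp) / sum(window.values())

self_text = "the cat sat on the mat " * 20
weights, theta = fit_mixture(pair_counts(self_text, width=8))
normal = pair_counts("the cat sat", 8)[0]  # window "the cat "
odd = pair_counts("qxzjvkwq", 8)[0]        # only pairs unseen in self
print(score(normal, weights, theta) > score(odd, weights, theta))  # True
```

Because every pair seen in training receives more probability than the `UNK` bucket in each component, a window built from familiar pairs always outscores one built only from unseen pairs; a threshold on `score` would turn this into a self/non-self decision.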

On using an ensemble approach of AIS and SVM for text classification

A hybrid system for text classification based on an ensemble of both AIS and SVM approaches is presented, resulting in a classification that improves upon all baseline contributors of the ensemble committee.

A Hybrid AIS-SVM Ensemble Approach for Text Classification

An ensemble-based structure that combines Support Vector Machines and Artificial Immune Systems is put forward. It uses a heterogeneous ensemble to improve overall performance, with the confidence of each system's classification serving as the differentiating factor.
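One way confidence could act as the differentiating factor is sketched below, assuming each member returns a `(label, confidence)` pair. Both combination rules (let the most confident member decide, or sum confidences per label) are generic illustrations rather than the rule used in the cited work, and the toy members `svm`, `ais` and `svm2` are hypothetical stand-ins with fixed outputs.

```python
from collections import defaultdict
from typing import Callable, Sequence, Tuple

Prediction = Tuple[str, float]        # (label, confidence in [0, 1])
Member = Callable[[str], Prediction]  # one trained classifier, e.g. an SVM or an AIS

def most_confident(members: Sequence[Member], doc: str) -> Prediction:
    """Let the member that is most confident about `doc` decide."""
    return max((m(doc) for m in members), key=lambda pred: pred[1])

def confidence_vote(members: Sequence[Member], doc: str) -> str:
    """Alternative: sum confidences per label and return the best-supported label."""
    support = defaultdict(float)
    for m in members:
        label, conf = m(doc)
        support[label] += conf
    return max(support, key=support.get)

# Hypothetical stand-ins for trained classifiers with fixed outputs.
def svm(doc): return ("sports", 0.6)
def ais(doc): return ("politics", 0.9)
def svm2(doc): return ("sports", 0.5)

print(most_confident([svm, ais, svm2], "match report"))   # ('politics', 0.9)
print(confidence_vote([svm, ais, svm2], "match report"))  # sports
```

The two rules can disagree: a single very confident member wins under `most_confident`, while several moderately confident members can outvote it under `confidence_vote`.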

Negative Selection of Written Language Using Character Multiset Statistics

Theoretical analysis of ergodic Markov chains is used to outline the properties of the presented anomaly detection algorithm and the probability of successful detection, and simulations are used to evaluate the detection sensitivity and the resolution of the analysis on both generated artificial data and real-world language data, including the English Wikipedia.

Discriminating self from non-self with finite mixtures of multivariate Bernoulli distributions

  • T. Stibor
  • GECCO '08
  • 2008
This work proposes to model self as a discrete probability distribution specified by finite mixtures of multivariate Bernoulli distributions. From this model it obtains information about non-self and is hence able to discriminate self from non-self probabilistically.

An Empirical Study of Self/Non-self Discrimination in Binary Data with a Kernel Estimator

This work proposes to measure distances in binary data by means of probabilities modeled with a kernel estimator, and shows that such a probabilistic model is particularly well suited to the self/non-self discrimination problem.
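A sketch of a kernel estimator on binary strings, assuming the Aitchison-Aitken kernel, a standard choice for binary data in which the Hamming distance takes the place of a Euclidean one: each self string y contributes lam^(d - D) * (1 - lam)^D to the probability of x, where D is their Hamming distance. The smoothing parameter `lam`, the `is_self` helper and its threshold are hypothetical choices for illustration.

```python
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def kernel_density(x, self_set, lam=0.9):
    """Aitchison-Aitken kernel estimate of P(x) from the binary self strings (0.5 <= lam <= 1)."""
    d = len(x)
    total = 0.0
    for y in self_set:
        dist = hamming(x, y)
        total += lam ** (d - dist) * (1 - lam) ** dist
    return total / len(self_set)

def is_self(x, self_set, threshold, lam=0.9):
    """Hypothetical decision rule: call x self when its estimated probability reaches `threshold`."""
    return kernel_density(x, self_set, lam) >= threshold

self_set = ["0000", "0001", "0011"]
all_strings = ["".join(bits) for bits in product("01", repeat=4)]
print(round(sum(kernel_density(s, self_set) for s in all_strings), 9))  # 1.0
print(is_self("0000", self_set, 0.05), is_self("1100", self_set, 0.05))  # True False
```

Each kernel sums to one over all 2^d strings, so the estimate is a proper distribution; values of `lam` closer to 1 concentrate probability mass on strings near the self set.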

Self-nonself discrimination in a computer

A method for change detection based on the generation of T cells in the immune system is described. An analysis of the method reveals the computational costs of the system, and preliminary experiments illustrate how the method might be applied to the problem of computer viruses.
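The censoring and monitoring phases of negative selection can be sketched as follows, assuming the r-contiguous-bits matching rule common in this literature. Random candidate detectors that match any self string are discarded (censoring), and a monitored string is flagged as changed if any surviving detector matches it. The string length, `r`, the detector count and the helper names are illustrative assumptions.

```python
import random

def r_contiguous_match(a, b, r):
    """True if strings a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def generate_detectors(self_set, length, r, n_detectors, seed=0, max_tries=100_000):
    """Censoring: keep random candidates that match no self string."""
    rng = random.Random(seed)
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        cand = "".join(rng.choice("01") for _ in range(length))
        if not any(r_contiguous_match(cand, s, r) for s in self_set):
            detectors.append(cand)
    return detectors

def is_nonself(x, detectors, r):
    """Monitoring: flag x as changed/non-self if any detector matches it."""
    return any(r_contiguous_match(x, d, r) for d in detectors)

self_set = ["00000000", "00001111"]
detectors = generate_detectors(self_set, length=8, r=4, n_detectors=10)
print(any(is_nonself(s, detectors, 4) for s in self_set))  # False: self is never flagged
```

The cost that such an analysis is concerned with shows up in the censoring loop: when most random candidates match self, many tries are needed per surviving detector.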

Latent Dirichlet Allocation

Application of Multinomial Mixture Model to Text Classification

It is shown that the accuracy of the Bayes document classifier can be improved by the proposed model in comparison with Bayes classifiers based on the multivariate Bernoulli model, the multinomial model, and the multivariate mixture model.

An Investigation of R-Chunk Detector Generation on Higher Alphabets

An algorithm is proposed for generating all possible r-chunk detectors that do not cover any element of the self set S, and it is shown that higher alphabets negatively affect the number of generatable detectors.
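For small instances the set of all generatable r-chunk detectors can be enumerated by brute force, which is not necessarily the algorithm proposed in the paper. Assuming the usual definition, an r-chunk detector is a pair (position i, chunk of length r), and it covers a string s exactly when s[i:i+r] equals the chunk. At each position every chunk over the alphabet that does not occur in any self string at that position is a valid detector.

```python
from itertools import product

def all_r_chunk_detectors(self_set, alphabet, r):
    """Every (position, chunk) pair that covers no string of the self set."""
    length = len(self_set[0])  # assumes all self strings share one length
    detectors = []
    for i in range(length - r + 1):
        seen = {s[i:i + r] for s in self_set}
        for symbols in product(alphabet, repeat=r):
            chunk = "".join(symbols)
            if chunk not in seen:
                detectors.append((i, chunk))
    return detectors

def covered(x, detectors, r):
    """True if some detector's chunk occurs in x at the detector's position."""
    return any(x[i:i + r] == chunk for i, chunk in detectors)

self_set = ["0101", "0110"]
dets = all_r_chunk_detectors(self_set, "01", r=2)
print(len(dets))                              # 7
print(covered("0000", dets, 2))               # True
print(any(covered(s, dets, 2) for s in self_set))  # False
```

At each position the count is |alphabet|^r minus the number of distinct self chunks there, which is why exhaustive enumeration only stays practical for small alphabets and small r.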

N-gram-based text categorization

An N-gram-based approach to text categorization that is tolerant of textual errors is described. It worked very well for language classification and reasonably well for classifying articles from a number of different computer-oriented newsgroups according to subject.
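The profile-and-rank scheme behind this approach can be sketched as follows: words are padded with blanks, character n-grams are counted and ranked by frequency, and a document is assigned to the category whose ranked profile has the smallest "out-of-place" distance, with n-grams missing from a category receiving a maximum penalty. The n-gram range (1 to 3 here), the profile length `top`, the penalty value and the tiny training strings are assumptions for illustration.

```python
from collections import Counter

def ngram_profile(text, max_n=3, top=300):
    """Rank character n-grams (n = 1..max_n) of blank-padded words by frequency."""
    counts = Counter()
    for word in text.lower().split():
        padded = f" {word} "
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:top]
    return {g: rank for rank, g in enumerate(ranked)}

def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences; an n-gram missing from the category gets the maximum penalty."""
    max_penalty = len(cat_profile)
    return sum(abs(rank - cat_profile[g]) if g in cat_profile else max_penalty
               for g, rank in doc_profile.items())

def classify(text, category_profiles):
    """Pick the category whose profile is closest to the document's profile."""
    doc = ngram_profile(text)
    return min(category_profiles, key=lambda c: out_of_place(doc, category_profiles[c]))

cats = {
    "en": ngram_profile("the quick brown fox jumps over the lazy dog and the cat"),
    "de": ngram_profile("der schnelle braune fuchs springt über den faulen hund und die katze"),
}
print(classify("the dog", cats))  # en
```

Because the measure compares rankings of short character sequences rather than whole words, a misspelled word still shares most of its n-grams with the correct spelling, which is the source of the tolerance to textual errors.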

Is negative selection appropriate for anomaly detection?

Investigations reveal that, when applied to anomaly detection, the real-valued negative selection and self-detector classification techniques require both positive and negative examples to achieve high classification accuracy, whereas one-class SVMs only require examples from a single class.

Anomaly Detection Using Real-Valued Negative Selection

A real-valued representation for the negative selection algorithm and its application to anomaly detection are presented. The approach uses only normal samples to generate abnormal samples, which are then used as input to a classification algorithm.
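The generate-then-classify pipeline can be sketched as follows, assuming data scaled to the unit hypercube and a simple distance-based rule: random points farther than `radius` from every normal sample are kept as synthetic abnormal samples, and both sets then train an ordinary classifier. The rejection-sampling generator and the 1-nearest-neighbour classifier are hypothetical stand-ins, not the generation procedure or classifier used in the cited work.

```python
import math
import random

def generate_abnormal(self_samples, n, radius, dim, seed=0, max_tries=100_000):
    """Sample points in [0,1]^dim that lie farther than `radius` from every self sample."""
    rng = random.Random(seed)
    out = []
    for _ in range(max_tries):
        if len(out) == n:
            break
        p = [rng.random() for _ in range(dim)]
        if all(math.dist(p, s) > radius for s in self_samples):
            out.append(p)
    return out

def nearest_neighbour_label(x, labelled):
    """1-NN over (point, label) pairs; a stand-in for the downstream classifier."""
    return min(labelled, key=lambda pl: math.dist(x, pl[0]))[1]

self_samples = [[0.1, 0.1], [0.15, 0.2], [0.2, 0.1]]
abnormal = generate_abnormal(self_samples, n=20, radius=0.2, dim=2)
labelled = [(s, "normal") for s in self_samples] + [(a, "abnormal") for a in abnormal]
print(nearest_neighbour_label([0.12, 0.12], labelled))  # normal
```

The radius controls how closely the synthetic abnormal region hugs the normal data: a small radius sharpens the boundary but risks labelling unseen normal variation as abnormal.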

An immunological approach to change detection: theoretical results

  • P. D’haeseleer
  • Proceedings 9th IEEE Computer Security Foundations Workshop
  • 1996
The principle of holes (undetectable nonself strings) is illustrated, together with a proof of their existence for a large class of matching rules, and a lower bound on the size of the detector set is derived on information-theoretic grounds.