Analyzing and Storing Network Intrusion Detection Data using Bayesian Coresets: A Preliminary Study in Offline and Streaming Settings

Fabio Massimo Zennaro
In this paper we offer a preliminary study of the application of Bayesian coresets to network security data. Network intrusion detection is a field that could take advantage of Bayesian machine learning in modelling uncertainty and managing streaming data; however, the large size of the data sets often hinders the use of Bayesian learning methods based on MCMC. Limiting the amount of useful data is a central problem in a field like network traffic analysis, where large amounts of redundant data…
3 Citations

A hybrid machine learning model for intrusion detection in VANET

A new machine learning model is proposed to improve the performance of IDSs, combining Random Forest with a posterior detection stage based on coresets to improve detection accuracy and increase detection efficiency.

Towards faster big data analytics for anti‐jamming applications in vehicular ad‐hoc network

A new vehicular data prioritization model based on coresets is proposed to accelerate Big Data analytics in VANETs; it can significantly increase the efficiency of clustering for jamming detection while preserving, and even improving, clustering quality.

References

Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization

A reliable dataset is produced that contains benign traffic and seven common attack network flows, meets real-world criteria, and is publicly available; the performance of a comprehensive set of network traffic features and machine learning algorithms is evaluated to indicate the best feature sets for detecting each attack category.

Coresets for Scalable Bayesian Logistic Regression

This paper develops an efficient coreset construction algorithm for Bayesian logistic regression models that provides theoretical guarantees on the size and approximation quality of the coreset, both for fixed, known datasets and in expectation for a wide class of data-generative models.
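The guarantees above rely on carefully constructed sensitivity bounds. As a much cruder stand-in, the core idea can be sketched by importance subsampling with weights chosen so the subsampled log-likelihood stays unbiased; the norm-based sampling probabilities, defensive uniform mixture, and toy data below are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

def logistic_loglik(theta, X, y, w=None):
    # Per-point logistic regression log-likelihood, labels y in {-1, +1}
    z = y * (X @ theta)
    ll = -np.log1p(np.exp(-z))
    return ll.sum() if w is None else (w * ll).sum()

def coreset_subsample(X, M, rng):
    # Crude importance subsampling: probabilities proportional to feature
    # norms (an illustrative proxy for the paper's sensitivity bounds),
    # mixed with a uniform component to keep the weights bounded.
    N = X.shape[0]
    s = np.linalg.norm(X, axis=1) + 1e-12
    p = 0.5 / N + 0.5 * s / s.sum()
    idx = rng.choice(N, size=M, replace=True, p=p)
    w = 1.0 / (M * p[idx])          # unbiased importance weights
    return idx, w

rng = np.random.default_rng(0)
N, d = 5000, 3
X = rng.normal(size=(N, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = np.where(rng.random(N) < 1.0 / (1.0 + np.exp(-X @ theta_true)), 1, -1)

idx, w = coreset_subsample(X, M=1000, rng=rng)
full = logistic_loglik(theta_true, X, y)
core = logistic_loglik(theta_true, X[idx], y[idx], w)
```

The weighted coreset log-likelihood `core` approximates the full-data log-likelihood `full` while touching only a fifth of the data, which is the property that makes MCMC on the coreset feasible.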

On the Security of Machine Learning in Malware C&C Detection

This work first systematizes approaches in the field of C&C detection and then, using existing models from the literature, systematizes attacks against the ML components used in these approaches, in order to analyze the evasion resilience of these detection techniques.

Network intrusion detection

This chapter discusses the TCP/IP Internet Model, Back-to-Basics DNS Theory, and an Overview of Running Snort Rules, aiming to clarify the role of Snort in the security model.

Automated Scalable Bayesian Inference via Hilbert Coresets

This work takes advantage of data redundancy to shrink the dataset itself as a preprocessing step, providing fully-automated, scalable Bayesian inference with theoretical guarantees, and develops Hilbert coresets, i.e., Bayesian coresets constructed under a norm induced by an inner-product on the log-likelihood function space.
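The finite-projection idea behind Hilbert coresets can be sketched by representing each data point's log-likelihood through its values at a few random parameter draws and greedily building a sparse weighted approximation of their sum. The greedy matching-pursuit selection rule and the Gaussian toy model below are illustrative simplifications, not the paper's Frank-Wolfe construction.

```python
import numpy as np

def hilbert_coreset_sketch(V, k):
    # V[n] holds the n-th point's log-likelihood at S random parameter
    # draws, i.e. a finite-dimensional projection of the function.
    target = V.sum(axis=0)                    # projection of the full log-likelihood
    norms = np.linalg.norm(V, axis=1) + 1e-12
    chosen, residual, w = [], target.copy(), None
    for _ in range(k):
        scores = (V @ residual) / norms       # alignment with the residual
        scores[chosen] = -np.inf              # never pick a point twice
        chosen.append(int(np.argmax(scores)))
        # refit the coreset weights on the chosen points by least squares
        w, *_ = np.linalg.lstsq(V[chosen].T, target, rcond=None)
        residual = target - V[chosen].T @ w
    return np.array(chosen), w

# Toy example: Gaussian location model, log-likelihood terms -(x - theta)^2 / 2
rng = np.random.default_rng(1)
N, S = 200, 20
thetas = rng.normal(size=S)
x = rng.normal(size=N)
V = -(x[:, None] - thetas[None, :]) ** 2 / 2.0
idx, w = hilbert_coreset_sketch(V, k=10)
err = np.linalg.norm(V.sum(axis=0) - V[idx].T @ w)
```

In this toy model every log-likelihood vector lies in a three-dimensional subspace, so a ten-point weighted coreset reproduces the full log-likelihood essentially exactly; on real data the approximation error instead shrinks gradually with the coreset size.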

WASP: Scalable Bayes via barycenters of subset posteriors

The Wasserstein posterior (WASP) has an atomic form, facilitating straightforward estimation of posterior summaries of functionals of interest; theoretical justification is provided in terms of posterior consistency and algorithmic efficiency.
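In one dimension the Wasserstein barycenter takes a particularly simple form: averaging the subset posteriors' quantile functions, i.e. sorting equally sized sample sets and averaging them pointwise. The sketch below uses that fact; the Gaussian subset posteriors are an illustrative assumption, not output of the paper's algorithm.

```python
import numpy as np

def wasp_1d(subset_samples):
    # 1-D Wasserstein barycenter of atomic subset posteriors:
    # sort each subset's samples (empirical quantiles), average pointwise.
    S = np.sort(np.asarray(subset_samples), axis=1)  # shape (k, n)
    return S.mean(axis=0)                            # atoms of the barycenter

rng = np.random.default_rng(2)
# Stand-ins for three subset posteriors with slightly different centres
subs = [rng.normal(loc=m, scale=1.0, size=2000) for m in (-0.2, 0.0, 0.3)]
bary = wasp_1d(subs)
```

Because the barycenter is itself an atomic distribution, posterior summaries such as means and credible intervals can be read directly off `bary`, which is the "straightforward estimation" the summary refers to.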

Deep Learning

Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.

Asymptotically Exact, Embarrassingly Parallel MCMC

This paper presents a parallel Markov chain Monte Carlo (MCMC) algorithm in which subsets of data are processed independently, with very little communication, proves that it generates asymptotically exact samples, and empirically demonstrates its ability to parallelize burn-in and sampling in several models.
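The combination step can be sketched under strong simplifying assumptions: a conjugate Gaussian model with a flat prior, where each shard's subposterior is available in closed form and stands in for MCMC samples. The product of Gaussian subposterior densities is again Gaussian, with precisions summed; the model, shard count, and variable names below are illustrative, not the paper's general nonparametric combination.

```python
import numpy as np

def combine_gaussian_subposteriors(means, variances):
    # Parametric combination of subposterior densities: a product of
    # Gaussians is Gaussian with summed precisions and precision-weighted mean.
    lam = 1.0 / np.asarray(variances)
    var = 1.0 / lam.sum()
    mu = var * (lam * np.asarray(means)).sum()
    return mu, var

# Toy model: x_i ~ N(theta, 1) with a flat prior, split across 4 machines.
rng = np.random.default_rng(3)
theta_true = 1.5
x = rng.normal(theta_true, 1.0, size=4000)
shards = np.split(x, 4)
# With a flat prior, each subposterior is exactly N(shard mean, 1/len(shard)),
# so no prior fractionation is needed in this toy case.
means = [s.mean() for s in shards]
variances = [1.0 / len(s) for s in shards]
mu, var = combine_gaussian_subposteriors(means, variances)
```

In this conjugate case the combined posterior coincides exactly with the full-data posterior, which is the behaviour the asymptotic exactness result generalizes.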

Bayesian Coreset Construction via Greedy Iterative Geodesic Ascent

GIGA is developed, a novel algorithm for Bayesian coreset construction that scales the coreset log-likelihood optimally and reduces posterior approximation error by orders of magnitude compared with previous coreset constructions.
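The "optimal scaling" the summary refers to can be shown in miniature: for any fixed approximation `a` of the full log-likelihood vector `ell`, the scale minimizing the error norm is alpha = (ell . a) / ||a||^2, and applying it never increases the error. Everything below is an illustrative toy, not the GIGA algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(4)
ell = rng.normal(size=50)            # stand-in for the full log-likelihood vector
a = ell + 0.5 * rng.normal(size=50)  # a noisy coreset approximation ...
a *= 3.0                             # ... that is also badly mis-scaled

# Closed-form optimal scaling: argmin over alpha of ||ell - alpha * a||
alpha = (ell @ a) / (a @ a)
err_raw = np.linalg.norm(ell - a)
err_opt = np.linalg.norm(ell - alpha * a)
```

Because the residual after optimal scaling is an orthogonal projection, `err_opt` is guaranteed to be no larger than either the unscaled error or the trivial alpha = 0 error; exploiting this scale-invariance during the greedy construction is part of what lets GIGA improve on earlier coreset schemes.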