Krimp: mining itemsets that compress
The Krimp algorithm is introduced, which reduces the number of frequent item sets by up to seven orders of magnitude, and the heuristic choices made in its design are evaluated.
Fast and reliable anomaly detection in categorical data
- L. Akoglu, Hanghang Tong, Jilles Vreeken, C. Faloutsos
- Computer Science
- International Conference on Information and…
- 29 October 2012
This work introduces COMPREX, a new approach for identifying anomalies using pattern-based compression, which finds a collection of dictionaries that describe the norm of a database succinctly, and subsequently flags those points dissimilar to the norm as anomalies.
Spotting Culprits in Epidemics: How Many and Which Ones?
- B. Prakash, Jilles Vreeken, C. Faloutsos
- Computer Science
- IEEE 12th International Conference on Data Mining
- 10 December 2012
This work proposes using the Minimum Description Length principle to identify the best set of seed nodes and virus propagation ripple, namely the one that describes the infected graph most succinctly, and gives an efficient method called NETSLEUTH for the Susceptible-Infected virus propagation model.
VOG: Summarizing and Understanding Large Graphs
The main ideas are to construct a "vocabulary" of subgraph types that often occur in real graphs and, from a set of subgraphs, to find the most succinct description of a graph in terms of this vocabulary.
The Odd One Out: Identifying and Characterising Anomalies
This paper gives a technique through which, given only a few negative examples, the decision landscape and optimal boundary can be predicted, making the approach parameter-free.
Item Sets that Compress
Four heuristic algorithms are introduced for frequent item set mining using the MDL principle: the best set of frequent item sets is that set that compresses the database best.
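The principle stated above (the best set of item sets is the one that compresses the database best) can be illustrated with a small two-part description length. The following is a minimal sketch under stated assumptions, not the algorithms from the paper: the names `cover` and `total_size`, the greedy in-order cover, and the way the code table's own cost is charged are all simplifications chosen for illustration.

```python
from collections import Counter
from math import log2


def cover(transaction, code_table):
    """Greedily cover a transaction with non-overlapping item sets, in table order.

    Assumption: the caller supplies the order of `code_table`.
    """
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    if remaining:
        raise ValueError(f"items not covered by code table: {remaining}")
    return used


def total_size(database, code_table):
    """Two-part size in bits: L(code table) + L(database | code table)."""
    # How often each item set is used when covering the whole database.
    usage = Counter()
    for transaction in database:
        usage.update(cover(transaction, code_table))
    total_usage = sum(usage.values())
    # Frequently used item sets get short codes.
    code_len = {s: -log2(u / total_usage) for s, u in usage.items()}

    # Singleton-only baseline codes, used to spell out the item sets in the table.
    item_counts = Counter(i for t in database for i in t)
    n_items = sum(item_counts.values())
    base_len = {i: -log2(c / n_items) for i, c in item_counts.items()}

    data_bits = sum(u * code_len[s] for s, u in usage.items())
    table_bits = sum(code_len[s] + sum(base_len[i] for i in s) for s in usage)
    return table_bits + data_bits


# Hypothetical toy database: "a" and "b" almost always occur together.
db = [frozenset("ab")] * 4 + [frozenset("abc"), frozenset("c")]
singletons = [frozenset("a"), frozenset("b"), frozenset("c")]
with_pattern = [frozenset("ab")] + singletons

# Adding the item set {a, b} yields the smaller total size on this data.
print(total_size(db, singletons) > total_size(db, with_pattern))
```

Under this score, a candidate item set is worth keeping only if the bits it saves when encoding the data outweigh the bits needed to store it in the code table.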
Is exploratory search different? A comparison of information search behavior for exploratory and lookup tasks
- Kumaripaba Athukorala, D. Glowacka, G. Jacucci, Antti Oulasvirta, Jilles Vreeken
- Computer Science, Business
- J. Assoc. Inf. Sci. Technol.
- 1 November 2016
This article investigates how to separate the 2 types of tasks in an IR system using easily measurable behaviors, and shows that IR systems can distinguish the 2 search categories in the course of a search session.
Spiking neural networks, an introduction
- Jilles Vreeken
- Computer Science, Biology
Two models of spiking neurons that employ pulse coding are presented. They are more powerful than their non-spiking predecessors because they can encode temporal information in their signals, but they therefore also need different, biologically more plausible rules for synaptic plasticity.
The long and the short of it: summarising event sequences with serial episodes
This paper formalises how to encode sequential data using sets of serial episodes, and uses the encoded length as a quality score to identify the set of sequential patterns that summarises the data best.
CMI: An Information-Theoretic Contrast Measure for Enhancing Subspace Cluster and Outlier Detection
A novel contrast score is proposed that quantifies mutual correlations in subspaces by considering their cumulative distributions, without having to discretize the data.