• Publications
Krimp: mining itemsets that compress
TLDR
The Krimp algorithm is introduced, which shows a dramatic reduction, up to seven orders of magnitude, in the number of frequent item sets, and the heuristic choices made in the design of the algorithm are evaluated.
Fast and reliable anomaly detection in categorical data
TLDR
This work introduces COMPREX, a new approach for identifying anomalies using pattern-based compression, which finds a collection of dictionaries that describe the norm of a database succinctly, and subsequently flags those points dissimilar to the norm as anomalies.
The Odd One Out: Identifying and Characterising Anomalies
TLDR
This paper gives a technique through which, given only a few negative examples, the decision landscape and optimal boundary can be predicted—making the approach parameter-free.
The long and the short of it: summarising event sequences with serial episodes
TLDR
This paper formalises how to encode sequential data using sets of serial episodes, and uses the encoded length as a quality score to identify the set of sequential patterns that summarises the data best.
Spotting Culprits in Epidemics: How Many and Which Ones?
TLDR
The Minimum Description Length principle is used to identify the best set of seed nodes and virus propagation ripple as the one that most succinctly describes the infected graph, and an efficient method called NETSLEUTH is given for the Susceptible-Infected virus propagation model.
VOG: Summarizing and Understanding Large Graphs
TLDR
The main ideas are to construct a "vocabulary" of subgraph types that often occur in real graphs, and from a set of subgraphs, find the most succinct description of a graph in terms of this vocabulary.
Spiking neural networks, an introduction
TLDR
Two models of spiking neurons that employ pulse coding are presented, which are more powerful than their non-spiking predecessors as they can encode temporal information in their signals, but therefore also need different, biologically more plausible rules for synaptic plasticity.
Item Sets that Compress
TLDR
Four heuristic algorithms are introduced for frequent item set mining using the MDL principle: the best set of frequent item sets is that set that compresses the database best.
Mining Connection Pathways for Marked Nodes in Large Graphs
TLDR
It is proved that solving the graph partitioning problem is NP-hard, and DOT2DOT is introduced, an efficient algorithm for partitioning marked nodes by finding simple pathways between nodes.
CMI: An Information-Theoretic Contrast Measure for Enhancing Subspace Cluster and Outlier Detection
TLDR
A novel contrast score is proposed that quantifies mutual correlations in subspaces by considering their cumulative distributions, without having to discretize the data.