Corpus ID: 237532270

Discovering Useful Compact Sets of Sequential Rules in a Long Sequence

@article{Bourrand2021DiscoveringUC,
  title={Discovering Useful Compact Sets of Sequential Rules in a Long Sequence},
  author={Erwan Bourrand and Luis Gal{\'a}rraga and Esther Galbrun and {\'E}lisa Fromont and Alexandre Termier},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.07519}
}
We are interested in understanding the underlying generation process for long sequences of symbolic events. To do so, we propose COSSU, an algorithm to mine small and meaningful sets of sequential rules. The rules are selected using an MDL-inspired criterion that favors compactness and relies on a novel rule-based encoding scheme for sequences. Our evaluation shows that COSSU can successfully retrieve relevant sets of closed sequential rules from a long sequence. Such rules constitute an…
