Discovering outstanding subgroup lists for numeric targets using MDL

@article{Proena2020DiscoveringOS,
  title={Discovering outstanding subgroup lists for numeric targets using MDL},
  author={Hugo Manuel Proença and Peter Gr{\"u}nwald and Thomas B{\"a}ck and Matthijs van Leeuwen},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.09186}
}
The task of subgroup discovery (SD) is to find interpretable descriptions of subsets of a dataset that stand out with respect to a target attribute. To address the problem of mining large numbers of redundant subgroups, subgroup set discovery (SSD) has been proposed. State-of-the-art SSD methods have limitations, however: they typically rely heavily on heuristics and/or user-chosen hyperparameters. We propose a dispersion-aware problem formulation for subgroup set discovery that is…
Robust subgroup discovery: Discovering subgroup lists using MDL
TLDR
RSD, a greedy heuristic, is proposed that finds good subgroup lists and guarantees that the most significant subgroup found according to the MDL criterion is added in each iteration. This MDL-based selection is shown to be equivalent to a Bayesian one-sample proportions, multinomial, or t-test between the subgroup and dataset marginal target distributions, plus a multiple hypothesis testing penalty.
DISGROU: an algorithm for discontinuous subgroup discovery
In this paper, we focus on the problem of the search for subgroups in numerical data. This approach aims to identify the subsets of objects, called subgroups, which exhibit interesting
Robust subgroup discovery
TLDR
SSD++, a greedy heuristic, is proposed that finds good subgroup lists and guarantees that the most significant subgroup found according to the MDL criterion is added in each iteration. This MDL-based selection is shown to be equivalent to a Bayesian one-sample proportions, multinomial, or t-test between the subgroup and dataset marginal target distributions, plus a multiple hypothesis testing penalty.
The Minimum Description Length Principle for Pattern Mining: A Survey
TLDR
The MDL principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim of obtaining compact, high-quality sets of patterns; methods for mining various types of data and patterns are reviewed.

References

FSSD - A Fast and Efficient Algorithm for Subgroup Set Discovery
TLDR
This work proposes FSSD, an efficient and parameter-free algorithm based on a greedy scheme, which uses several optimization strategies to provide a high-quality pattern set in a short amount of time.
Diverse subgroup set discovery
TLDR
This work considers three degrees of redundancy, and proposes corresponding heuristic selection strategies in order to eliminate redundancy in subgroup set discovery, and incorporates these (generic) subgroup selection methods in a beam search to improve the balance between exploration and exploitation.
On subgroup discovery in numerical domains
TLDR
A new subgroup discovery algorithm is presented that prunes large parts of the search space by exploiting bounds between related numerical subgroup descriptions; the same algorithm can also be applied to ordinal attributes.
For real: a thorough look at numeric attributes in subgroup discovery
TLDR
This paper presents a generic framework that can be instantiated in various ways in order to create different strategies for dealing with numeric data, and describes an experimental comparison of a considerable range of numeric strategies in SD where these strategies are organised according to four central dimensions.
Subgroup Discovery with CN2-SD
TLDR
A subgroup discovery algorithm, CN2-SD, is developed by modifying parts of the CN2 classification rule learner: its covering algorithm, search heuristic, probabilistic classification of instances, and evaluation measures. It shows a substantial reduction in the number of induced rules, increased rule coverage and rule significance, and slight improvements in the area under the ROC curve.
Subjectively Interesting Subgroup Discovery on Real-Valued Targets
TLDR
This work introduces a method to find subgroups in the data that are maximally informative (in the information-theoretic sense) with respect to one or more real-valued target attributes.
Identifying consistent statements about numerical data with dispersion-corrected subgroup discovery
TLDR
The optimistic estimator framework for optimal subgroup discovery is extended to a new class of objective functions: it is shown how tight estimators can be computed efficiently for all functions that are determined by subgroup size, the subgroup median value, and a dispersion measure around the median.
Anytime discovery of a diverse set of patterns with Monte Carlo tree search
TLDR
This work formally defines pattern mining as a game and solves it with Monte Carlo tree search (MCTS), an exhaustive search guided by random simulations that can be stopped early (limited budget) by virtue of its best-first search property.
Association Discovery in Two-View Data
TLDR
The empirical evaluation on real-world data demonstrates that only modest numbers of associations are needed to characterize the two-view structure present in the data, while the obtained translation rules are easily interpretable and provide insight into the data.