Active Trace Clustering for Improved Process Discovery
Process mining refers to the extraction of process models from event logs. Real-life processes are often loosely structured and flexible. Traditional process mining algorithms have difficulty with such unstructured processes and generate “spaghetti-like” process models that are hard to comprehend. One way to overcome this is to cluster process instances so that each resulting cluster corresponds to a coherent set of process instances that can be adequately represented by a single process model. In this paper, we present multiple feature sets based on conserved patterns and show that they perform better than contemporary approaches. We evaluate the goodness of the formed clusters using established fitness and comprehensibility metrics defined in the context of process mining. The proposed approach generates clusters such that the process models mined from the clustered traces show a high degree of fitness and comprehensibility. Further, the proposed feature sets can be discovered in linear time, which makes the approach amenable to real-time analysis of large data sets.
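To make the idea of pattern-based trace clustering concrete, the following is a minimal sketch under stated assumptions, not the feature sets or clustering algorithm of this paper. It assumes that a "conserved pattern" can be approximated by a contiguous activity pair (a 2-gram) that occurs in at least two traces, and it uses a naive single-linkage agglomerative procedure over Euclidean distances. The function names, the toy log, and the thresholds are all hypothetical, and this simplified extraction does not reflect the linear-time discovery claimed above.

```python
import math
from collections import Counter


def kgrams(trace, k):
    """Return all contiguous length-k activity subsequences of a trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def conserved_features(log, k=2, min_traces=2):
    """Keep k-grams seen in at least `min_traces` traces.

    Assumption: this is a simplified stand-in for conserved patterns.
    """
    support = Counter()
    for trace in log:
        support.update(set(kgrams(trace, k)))  # count each trace once
    return sorted(g for g, n in support.items() if n >= min_traces)


def feature_vector(trace, features, k=2):
    """Count how often each selected k-gram occurs in the trace."""
    counts = Counter(kgrams(trace, k))
    return [counts[f] for f in features]


def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def agglomerative(vectors, n_clusters):
    """Naive single-linkage clustering; returns lists of trace indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(distance(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the closest pair
    return clusters


# Hypothetical toy event log: each trace is a sequence of activity names.
log = [
    ["a", "b", "c", "d"],
    ["a", "b", "c", "b", "c", "d"],
    ["a", "e", "f", "d"],
    ["a", "e", "f", "e", "f", "d"],
]
features = conserved_features(log)
vectors = [feature_vector(t, features) for t in log]
print(agglomerative(vectors, 2))
```

On this toy log the traces that share the b-c behaviour end up in one cluster and those that share the e-f behaviour in the other, so each cluster could then be handed to a discovery algorithm separately.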