• Corpus ID: 7282994

Handling Big(ger) Logs: Connecting ProM 6 to Apache Hadoop

@inproceedings{Hernndez2015HandlingBL,
  title={Handling Big(ger) Logs: Connecting ProM 6 to Apache Hadoop},
  author={Sergio Hern{\'a}ndez and Sebastiaan J. van Zelst and Joaqu{\'i}n Ezpeleta and Wil M.P. van der Aalst},
  booktitle={BPM},
  year={2015}
}
Within process mining the main goal is to support the analysis, improvement and apprehension of business processes. Numerous process mining techniques have been developed with that purpose. The majority of these techniques use conventional computation models and do not apply novel scalable and distributed techniques. In this paper we present an integrative framework connecting the process mining framework ProM with the distributed computing environment Apache Hadoop. The integration allows for… 

Scalable and distributed architecture based on Apache Spark Streaming and PROM6 for processing RoRo terminals logs
  • M. Mhand, A. Boulmakoul, Hassan Badir
  • Computer Science
    Proceedings of the New Challenges in Data Sciences: Acts of the Second Conference of the Moroccan Classification Society
  • 2019
TLDR
This work designs a scalable and distributed architecture for real-time monitoring of the operational business processes of a RoRo port terminal, permitting process mining techniques to be applied to event logs of several hundred gigabytes.
Assessing Process Discovery Scalability in Data Intensive Environments
TLDR
This paper assesses the scalability of applying process discovery techniques in data intensive environments, and proposes ways to compute the internal data abstractions used by the discovery techniques within the MapReduce framework.
Increasing Scalability of Process Mining using Event Dataframes: How Data Structure Matters
TLDR
This paper proposes the use of mainstream columnar storage and dataframes to increase the scalability of process mining, and presents some algorithms on such structures together with an analysis of their complexity.
Distributed Compliance Monitoring of Business Processes over MapReduce Architectures
TLDR
A previously implemented framework for compliance verification is adopted and it is shown how it can be efficiently distributed on a set of computing nodes to support scalable run-time monitoring when dealing with large volumes of event logs.
Accelerating Process Mining using Relational Databases
TLDR
This research aims to address the scalability problem in terms of memory use and time consumption by using relational databases as the framework both to store event data and to perform process mining analysis.
Applying MapReduce to conformance checking
TLDR
It is shown that conformance checking can be distributed using MapReduce and can benefit from it, and it is demonstrated that computation time scales linearly with the growth of event log size.
DB-XES: Enabling Process Discovery in the Large
TLDR
This paper proposes a new technique based on relational database technology as a solution for scalable process discovery, and introduces DB-XES as a database schema which resembles the standard XES structure, and shows how this greatly improves on the memory requirements of the state-of-the-art process discovery techniques.
Business Process Analytics and Big Data Systems: A Roadmap to Bridge the Gap
TLDR
It is advocated that a good understanding of the business process and Big Data worlds can play an effective role in improving the efficiency and the quality of various data-intensive business operations using a wide spectrum of emerging Big Data systems.
Predictive Monitoring of Business Processes: A Survey
TLDR
The different types of computational predictive methods, such as statistical techniques or machine learning approaches, and certain aspects such as the type of predicted values and quality evaluation metrics, have been considered for the categorization of these methods.

References

Scalable Process Discovery Using Map-Reduce
  • Joerg Evermann
  • Computer Science
    IEEE Transactions on Services Computing
  • 2016
TLDR
This paper presents Map-Reduce implementations of two well-known process mining algorithms to take advantage of the scalability of the Map-Reduce approach and presents the design of a series of mappers and reducers to compute the log-based ordering relations from distributed event logs.
Data Streams in ProM 6: A Single-node Architecture
TLDR
An overview is given of the newly created extension that lays a foundation for integrating streaming environments with ProM, and a case study is presented in which a real-life online data stream has been incorporated in a basic ProM-based analysis.
Decomposing Petri nets for process mining: A generic approach
TLDR
The decomposition approach is generic and can be combined with different existing process discovery and conformance checking techniques to split computationally challenging process mining problems into many smaller problems that can be analyzed easily and whose results can be combined into solutions for the original problems.
Flexible Heuristics Miner (FHM)
  • A. Weijters, J. Ribeiro
  • Computer Science
    2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)
  • 2011
TLDR
A new process representation language is presented in combination with an accompanying process mining algorithm that results in easy to understand process models even in the case of non-trivial constructs, low structured domains and the presence of noise.
The ProM Framework: A New Era in Process Mining Tool Support
TLDR
The ProM framework is introduced, along with an overview of the plug-ins that have been developed; the framework is flexible with respect to the input and output format, and is also open enough to allow for the easy reuse of code during the implementation of new process mining ideas.
Workflow mining: discovering process models from event logs
TLDR
A new algorithm is presented to extract a process model from a so-called "workflow log" containing information about the workflow process as it is actually being executed and represent it in terms of a Petri net.
Online Process Discovery to Detect Concept Drifts in LTL-Based Declarative Process Models
TLDR
This paper presents a novel framework for the discovery of LTL-based declarative process models from streaming event data in settings where it is impossible to store all events over an extended period or where processes evolve while being analyzed.
ProM 4.0: Comprehensive Support for Real Process Analysis
TLDR
The functionality of ProM 4.0 is described, which makes ProM a versatile tool for process analysis that is not restricted to model analysis but also includes log-based analysis.
Discovering Block-Structured Process Models from Event Logs - A Constructive Approach
TLDR
This work provides an extensible framework to discover from any given log a set of block-structured process models that are sound and fit the observed behaviour, and gives sufficient conditions on the log for which the algorithm returns a model that is language-equivalent to the process model underlying the log, including unseen behaviour.
Process Mining - Discovery, Conformance and Enhancement of Business Processes
TLDR
This book provides real-world techniques for monitoring and analyzing processes in real time and is a powerful new tool destined to play a key role in business process management.