Definition and Instantiation of an Integrated Data Mining Process Tin2004-05873

Abstract

In practice, CRISP-DM is the most commonly used data mining process in both industry and academia. CRISP-DM has one major weakness: it is at once a process model, methodology and lifecycle. Therefore it lacks definition and detail. Also, the data mining process is completely decoupled from the software engineering process, even though its results have a definite impact on this process. This methodological deficiency is one of the main reasons why many data mining projects are not completed or why, if they are, they fail to meet customer expectations and are not used.

This project aims to mitigate the above problems. To do this, it sets out to: (i) define and integrate a data mining process with a software process, examining what tasks they have in common, their respective inputs and outputs and CRISP-DM’s weaknesses, (ii) develop a process instance tailored to CRISP-DM and unify this instance with the Unified Process (RUP) and (iii) validate and transfer the technology to real cases.

1 Project objectives

The specific project objectives are as follows:

1. Define and Integrate a Data Mining Process with a Software Process. Based on an established software process, like IEEE Std. 1074 or ISO 12207, and CRISP-DM, the aim is to create an integrated process by carefully examining the tasks they have in common, their respective inputs and outputs and CRISP-DM’s weaknesses. This study will include ideas or tasks from other processes enacted in related fields like customer relationship management (CRM) or knowledge engineering.

2. Develop a Process Instance. This objective aims to tailor the above integrated process to a particular software development paradigm, i.e. object orientation based on the Unified Process (RUP). The key goal is to incorporate and extend RUP techniques across the entire integrated process, stressing the software-DM relationship, the connection between software development tasks and pure DM tasks via inputs/outputs,

* Email: fsegovia@fi.upm.es

Cite this paper

@inproceedings{Prez2007DefinitionAI,
  title={Definition and Instantiation of an Integrated Data Mining Process Tin2004-05873},
  author={Javier Segovia P{\'e}rez},
  year={2007}
}