The CRISP-DM methodology is described in terms of a hierarchical process model, consisting of sets of tasks described at four levels of abstraction (from general to specific): phase, generic task, specialized task, and process instance (see figure 1).
At the top level, the data mining process is organized into a number of phases; each phase consists of several second-level generic tasks.
This second level is called generic because it is intended to be general enough to cover all possible data mining situations. The generic tasks are intended to be as complete and stable as possible. Complete means covering both the whole process of data mining and all possible data mining applications. Stable means that the model should remain valid for developments that are not yet foreseen, such as new modeling techniques.
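The third level, the specialized task level, is the place to describe how actions in the generic tasks should be carried out in certain specific situations. For example, at the second level there might be a generic task called clean data. The third level describes how this task varies from one situation to another, such as cleaning numeric values versus cleaning categorical values, or whether the problem type is clustering or predictive modeling.
The fourth level, the process instance, is a record of the actions, decisions, and results of an actual data mining engagement. It represents what happened in one particular project rather than what happens in general.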
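The first three levels of the hierarchy can be pictured as nested data structures. The following is a minimal sketch in Python, not part of CRISP-DM itself: the class names, the `for_context` helper, and the task descriptions are hypothetical and chosen only for illustration. The clean data task and its numeric and categorical variants come from the example above, and the phase name is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpecializedTask:
    """Third level: how a generic task is carried out in one situation."""
    context: str      # the situation this specialization applies to
    description: str  # hypothetical, illustrative wording only


@dataclass
class GenericTask:
    """Second level: a task general enough for any data mining project."""
    name: str
    specializations: List[SpecializedTask] = field(default_factory=list)

    def for_context(self, context: str) -> List[SpecializedTask]:
        # Return the specializations that match the given situation.
        return [s for s in self.specializations if s.context == context]


@dataclass
class Phase:
    """Top level: a phase groups several generic tasks."""
    name: str
    generic_tasks: List[GenericTask] = field(default_factory=list)


# Illustrative instance built from the clean data example in the text.
clean_data = GenericTask(
    name="clean data",
    specializations=[
        SpecializedTask("numeric values", "placeholder: treat out-of-range numbers"),
        SpecializedTask("categorical values", "placeholder: merge inconsistent labels"),
    ],
)

# The phase name is an assumption made for this sketch.
phase = Phase(name="data preparation", generic_tasks=[clean_data])
```

Calling `clean_data.for_context("numeric values")` returns only the specialization for numeric values, which mirrors how the third level narrows a generic task to a specific situation.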
The description of phases and tasks as discrete steps performed in a specific order represents an idealized sequence of events. In practice, many of the tasks can be performed in a different order, and it will often be necessary to repeatedly backtrack to previous tasks and repeat certain actions.
Our process model does not attempt to capture all of these possible routes through the data mining process because this would require an overly complex process model.
Reference model and user guide:
Horizontally, the CRISP-DM methodology distinguishes between the reference model and the user guide. The reference model presents a quick overview of phases, tasks, and their outputs, and describes what to do in a data mining project. The user guide gives more detailed tips and hints for each phase and each task within a phase, and explains how to carry out a data mining project.
This website, in alignment with the original CRISP-DM document, covers both the reference model and the user guide at the generic level.