This site is sponsored and hosted by Smart Vision Europe Ltd.


Mapping Generic Models to Specialised Tasks

Mapping generic models to specialized models

Data mining context

The data mining context drives mapping between the generic and the specialized level in CRISP-DM. Currently, we distinguish between four different dimensions of data mining contexts:

  • The application domain is the specific area in which the data mining project takes place
  • The data mining problem type describes the specific class(es) of objective(s) that the data mining project deals with (see Appendix 2 for more detail)
  • The technical aspect covers specific issues in data mining that describe different (technical) challenges that usually occur during data mining
  • The tool and technique dimension specifies which data mining tool(s) and/or techniques are applied during the data mining project

Table 1 below summarizes these dimensions of data mining contexts and shows specific examples for each dimension.

Table 1: Dimensions of data mining contexts

A specific data mining context is a concrete value for one or more of these dimensions. For example, a data mining project dealing with a classification problem in churn prediction constitutes one specific context. The more context dimensions have fixed values, the more concrete the data mining context becomes.
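As a rough illustration of this idea, a context can be sketched as a record with one optional field per dimension, where unset fields are dimensions that have not been fixed. The class name `DataMiningContext`, its field names, and the `concreteness` measure are assumptions made for this sketch; CRISP-DM itself defines no such data structure. The "missing values" entry is only an illustrative technical aspect.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass(frozen=True)
class DataMiningContext:
    """Hypothetical record of the four context dimensions (None = not fixed)."""

    application_domain: Optional[str] = None
    problem_type: Optional[str] = None
    technical_aspect: Optional[str] = None
    tool_and_technique: Optional[str] = None

    def fixed_dimensions(self) -> Dict[str, str]:
        """Return only the dimensions that have a concrete value."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }

    def concreteness(self) -> int:
        """Count the fixed dimensions: more fixed values, more concrete context."""
        return len(self.fixed_dimensions())


# The example from the text: a classification problem in churn prediction.
churn = DataMiningContext(
    application_domain="churn prediction",
    problem_type="classification",
)

# Fixing a further dimension yields a more concrete context (illustrative value).
churn_with_gaps = DataMiningContext(
    application_domain="churn prediction",
    problem_type="classification",
    technical_aspect="missing values",
)
```

Here `churn` fixes two of the four dimensions and `churn_with_gaps` fixes three, so the second context is the more concrete one in the sense described above.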

Mappings with contexts:

This methodology distinguishes between two different types of mapping between generic and specialized levels in CRISP-DM.

Mapping for the present: If we apply the generic process model to a single data mining project, mapping generic tasks and their descriptions to that project as required, we speak of a single mapping, probably for one use only.

Mapping for the future: If we systematically specialize the generic process model according to a pre-defined context (or, similarly, systematically analyze and consolidate the experiences of a single project into a specialized process model for future use in comparable contexts), we speak of explicitly writing up a specialized process model in terms of CRISP-DM.

Which type of mapping is appropriate for your own purposes depends on your specific data mining context and the needs of your organization.

How to map
The basic strategy for mapping the generic process model to the specialized level is the same for both types of mappings:

  • Analyze your specific context
  • Remove any details not applicable to your context
  • Add any details specific to your context
  • Specialize (or instantiate) generic contents according to concrete characteristics of your context
  • Possibly rename generic contents to provide more explicit meanings in your context for the sake of clarity
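The steps above can be sketched in code, under the assumption that a process model is represented as a simple mapping from task names to descriptions. The function `map_to_context`, its parameters, and the task descriptions are hypothetical and serve only to show the order of operations. The first step, analyzing the context, is a human activity and is represented here only by the arguments the caller chooses.

```python
from typing import Dict, Iterable, Optional


def map_to_context(
    generic_tasks: Dict[str, str],
    remove: Iterable[str] = (),
    add: Optional[Dict[str, str]] = None,
    specialize: Optional[Dict[str, str]] = None,
    rename: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return a specialized copy of generic_tasks; the input is not modified."""
    to_remove = set(remove)

    # Remove any details not applicable to the context.
    tasks = {
        name: text for name, text in generic_tasks.items() if name not in to_remove
    }

    # Add any details specific to the context.
    tasks.update(add or {})

    # Specialize (instantiate) generic contents that are still present.
    for name, text in (specialize or {}).items():
        if name in tasks:
            tasks[name] = text

    # Possibly rename generic contents for clarity.
    renames = rename or {}
    return {renames.get(name, name): text for name, text in tasks.items()}


# Illustrative input: three generic task names with made-up descriptions.
generic = {
    "Select data": "Decide which data to use for analysis",
    "Clean data": "Raise data quality to the required level",
    "Integrate data": "Combine data from multiple sources",
}

# Illustrative mapping for a churn prediction project with a single data source.
specialized = map_to_context(
    generic,
    remove=["Integrate data"],
    add={"Handle missing values": "Impute or drop incomplete customer records"},
    specialize={"Clean data": "Clean customer records for churn prediction"},
    rename={"Select data": "Select customer data"},
)
```

The same function covers both types of mapping: for a single project the result is used once, while for a mapping intended for future use the result would be written up and kept as the specialized process model.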

Description of parts

Contents
The CRISP-DM process model (in this instance, this website) is organized into four parts:

  • This introduction to the CRISP-DM methodology, which provides some general guidelines for mapping the generic process model to specialized process models
  • The CRISP-DM reference model, which describes the phases, generic tasks, and outputs
  • The CRISP-DM user guide, which goes beyond the pure description of phases, generic tasks, and outputs, and contains more detailed advice on how to perform data mining projects
  • The appendices, which include a glossary of important terminology and a characterization of data mining problem types

Purpose

Users and readers of this document should be aware of the following instructions:

  • If you are reading the CRISP-DM process model for the first time, begin with the introduction in order to understand the CRISP-DM methodology, all of its concepts, and how different concepts relate to each other. In further readings, you might skip the introduction and only return to it if necessary for clarification.
  • If you need fast access to an overview of the CRISP-DM process model, refer to the CRISP-DM reference model, either to begin a data mining project quickly or to get an introduction to the CRISP-DM user guide.
  • If you need detailed advice on performing your data mining project, the CRISP-DM user guide is the most valuable part of this information. Note: if you have not yet read the introduction and the reference model, go back and read these first two parts.
  • Finally, the appendices are useful as additional background information on CRISP-DM and data mining. Use them to look up various terms if you are not yet an expert in the field.