SQL/MX Data Mining Guide

HP NonStop SQL/MX Data Mining Guide, 523737-001
2 Preparing the Data
Section 1, Introduction, identifies and defines a business opportunity, the first step in
the knowledge discovery process supported by SQL/MX. This section describes Steps
2 through 5 of that process.
1. Identify and define a business opportunity.
2. Preprocess and load the data for the business opportunity.
The first preparation step is to address problems in the source data by
preprocessing it in various ways (for example, verifying and mapping the data).
Then load the data into your database system.
See Loading the Data on page 2-2.
3. Profile and understand the relevant data.
Generate a variety of statistics, such as column unique entry counts, value ranges,
number of missing values, mean, variance, and so on.
See Profiling the Data on page 2-2.
4. Define events relevant to the business opportunity being explored.
Events are used to align related data in a single set of columns for mining.
Example events are life changes, such as getting married or switching jobs, or
customer actions, such as opening an account or requesting a credit limit increase.
See Defining Events on page 2-6.
5. Derive attributes.
For example, customer age can be derived from birth date. Account summary
statistics, such as maximum and minimum balances, can be derived from monthly
status information.
See Deriving Attributes on page 2-9.
6. Create the data mining view.
7. Mine the data and build models.
8. Deploy models.
9. Monitor model performance.
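
As a rough illustration of Steps 3 and 5, the sketch below profiles one column and derives a few attributes with plain SQL. It is not SQL/MX code: it uses SQLite through Python's standard sqlite3 module as a stand-in, and SQL/MX syntax and built-in functions may differ. The tables (customers, monthly_status), their columns, the sample rows, and the function names are all hypothetical and are not taken from this guide.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, birth_date TEXT);
        CREATE TABLE monthly_status (cust_id INTEGER, month TEXT, balance REAL);
        INSERT INTO customers VALUES
            (1, '1970-06-15'), (2, '1985-01-02'), (3, NULL);
        INSERT INTO monthly_status VALUES
            (1, '2000-01', 100.0), (1, '2000-02', 300.0),
            (2, '2000-01', 50.0),  (2, '2000-02', NULL);
    """)
    return conn


def profile_balance(conn):
    """Step 3 (profiling): unique entry count, value range, number of
    missing values, mean, and population variance of the balance column."""
    row = conn.execute("""
        SELECT COUNT(DISTINCT balance),
               MIN(balance),
               MAX(balance),
               SUM(CASE WHEN balance IS NULL THEN 1 ELSE 0 END),
               AVG(balance),
               AVG(balance * balance) - AVG(balance) * AVG(balance)
        FROM monthly_status
    """).fetchone()
    keys = ("unique_count", "min", "max", "missing", "mean", "variance")
    return dict(zip(keys, row))


def derive_attributes(conn, as_of):
    """Step 5 (deriving attributes): age in whole years as of a reference
    date, plus minimum and maximum balance per customer."""
    return conn.execute("""
        SELECT c.cust_id,
               CAST(strftime('%Y', :as_of) AS INTEGER)
                 - CAST(strftime('%Y', c.birth_date) AS INTEGER)
                 - (strftime('%m-%d', :as_of) < strftime('%m-%d', c.birth_date))
                 AS age,
               MIN(m.balance) AS min_balance,
               MAX(m.balance) AS max_balance
        FROM customers c
        LEFT JOIN monthly_status m ON m.cust_id = c.cust_id
        GROUP BY c.cust_id, c.birth_date
        ORDER BY c.cust_id
    """, {"as_of": as_of}).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(profile_balance(conn))
    for row in derive_attributes(conn, "2000-12-31"):
        print(row)
```

The aggregate functions skip NULL values, so missing values are counted separately with a CASE expression. A customer with no birth date or no monthly rows gets NULL for the corresponding derived attributes, which keeps such gaps visible for later profiling.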