SQL/MX Data Mining Guide
Introduction
HP NonStop SQL/MX Data Mining Guide—523737-001
Creating the Mining View
The final data preparation step is to transform the data set into a mining view, a form in
which all attributes about the main mining entity appear in a single record. The mining
entity used in this manual is a credit card account. The data mining challenge is to
determine predictors for when a customer will close a credit card account.
Transforming the data set to a single record for each mining entity often involves a
pivot operation, in which attributes in multiple rows are collapsed and put into a single
row. In the credit card example, the set of history records associated with
each account is collapsed to a single record and then appended to the corresponding
customer record.
For further information, see Section 3, Creating the Data Mining View.
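The pivot described above can be sketched in a few lines of Python. This is a minimal illustration, not the method the guide develops in Section 3: the field names (account, months_before_leaving, balance, bal_3, and so on) and the layout of the history rows are assumptions made for the example. The values are taken from the sample mining view in this section.

```python
# Hypothetical input rows; field names are illustrative, not the guide's schema.
customers = [
    {"account": 1234567, "marital_status": "Single", "income": 65000},
    {"account": 5200000, "marital_status": "Married", "income": 32000},
]

# Assumed history layout: one row per account per month before leaving.
history = [
    {"account": 1234567, "months_before_leaving": 3, "balance": 3000.00},
    {"account": 1234567, "months_before_leaving": 2, "balance": 2870.00},
    {"account": 1234567, "months_before_leaving": 1, "balance": 1200.00},
]


def build_mining_view(customers, history, lags=(3, 2, 1)):
    """Collapse history rows to one record per account and append it
    to the matching customer record."""
    # Pivot step: account -> {lag: balance}
    balances = {}
    for row in history:
        per_account = balances.setdefault(row["account"], {})
        per_account[row["months_before_leaving"]] = row["balance"]

    view = []
    for cust in customers:
        record = dict(cust)  # start from the demographic attributes
        per_account = balances.get(cust["account"], {})
        for lag in lags:
            # Accounts with no history get None, like the dashes in the sample table
            record[f"bal_{lag}"] = per_account.get(lag)
        view.append(record)
    return view


mining_view = build_mining_view(customers, history)
for record in mining_view:
    print(record)
```

Each account ends up as exactly one record, whether or not it has any history rows, which is the property the mining view needs.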
The resulting table looks similar to this:
Mining View
This table contains demographic information from the Customers table, such as marital
status and income, and also pivoted columns from the Account History table, such as
balances prior to leaving. You use this example data set in the data mining step, the next
step in the knowledge discovery process.
Mining the Data
In the data mining step, core knowledge discovery techniques are applied to gain
insight, learn patterns, or verify hypotheses. The main tasks performed in this step are
either predictive or descriptive in nature. Predictive tasks involve trying to determine
what will happen in the future, based upon historical data. Descriptive tasks involve
finding patterns describing the data.
The task used in this customer scenario is predictive: to build a model to predict
attrition of credit card customers based on historical information, such as
demographics and account activity.
The most common predictive tasks are:
• Classification: Classify a case (or record) into one of several predefined classes.
• Regression: Map a case (or record) into a numerical prediction value.
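The difference between the two tasks is mainly in what comes back: a label from a fixed set, or a number. The sketch below illustrates this in Python with hand-written stand-ins rather than models learned from data. The function names, the 50 percent drop threshold, and the choice of "next balance" as the regression target are all assumptions made for the example. The input values are the three balances for account 1234567 in the sample mining view.

```python
# Toy illustration only: hand-written stand-ins, not models learned from data.
CLASSES = ("Yes", "No")  # predefined classes: will the customer leave?


def classify_attrition(balances, drop_threshold=0.5):
    """Classification: map a case to one of the predefined classes.

    Hypothetical rule: predict "Yes" when the balance fell by more than
    drop_threshold between the oldest and the most recent month.
    """
    oldest, latest = balances[0], balances[-1]
    if oldest <= 0:
        return "No"  # no starting balance to compare against
    drop = (oldest - latest) / oldest
    return "Yes" if drop > drop_threshold else "No"


def predict_next_balance(balances):
    """Regression: map a case to a numeric value.

    Fits a least-squares line through the balances (x = 0, 1, 2, ...)
    and extrapolates one step ahead. Needs at least two balances.
    """
    n = len(balances)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(balances) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, balances))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


# Bal-3, Bal-2, Bal-1 for account 1234567 in the sample mining view
case = [3000.00, 2870.00, 1200.00]
label = classify_attrition(case)       # a class label from CLASSES
estimate = predict_next_balance(case)  # a number
print(label, round(estimate, 2))
```

In practice both functions would be replaced by models built from the historical records in the mining view; only the shape of the output is meant to carry over.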
Account  Mar Status  Income  Bal-3    Bal-2    Bal-1    Date Left  Left
1234567  Single      65,000  3000.00  2870.00  1200.00  12/99      Yes
2500000  Divorced    32,000     0.00     0.00     0.00  11/99      Yes
4098124  Divorced    44,000  4817.94  4596.10  4347.63  10/98      Yes
5200000  Married     32,000  –        –        –        –          No