SQL/MX Data Mining Guide
Introduction
HP NonStop SQL/MX Data Mining Guide—523737-001
Preparing the Data
three months before a customer leaves, because these attributes are predictors of
attrition.
For customers who do leave, the months leading up to the departure occur at various
points in time. For customers who do not leave, any three consecutive months in which
the account is open are chosen.
The information about these months should be aligned for all accounts in a single set
of columns, one for each of the three months. Most mining algorithms require a single
logical attribute, such as the balance one month before leaving, to be stored in one
column in all records, rather than in different columns in different records.
For example, consider this data in a table that contains monthly account balances for
each month in the three-year history period:
The balances prior to the event (of the customer leaving) are in different date columns
for these accounts, and therefore algorithms that build predictive models are not able
to consider this information.
A table organization that allows this information to be considered:
In this table, columns Bal-1 through Bal-3 contain account balances one through three
months prior to a customer leaving. Consequently, this information is aligned within a
single set of columns and can be considered during model creation.
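The alignment described above can be sketched in generic SQL. This is a minimal illustration, not SQL/MX-specific syntax: it runs the query through Python's built-in sqlite3 module, and the table names (monthly_balance, account_event), the column names, and the month_no encoding (year * 12 + month) are assumptions made for the example. It also assumes the balances are stored one row per account per month rather than one column per month.

```python
import sqlite3

def month_no(year, month):
    """Encode a calendar month as an integer so month offsets are plain subtraction."""
    return year * 12 + month

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: one row per account per month.
    CREATE TABLE monthly_balance (account INTEGER, month_no INTEGER, balance REAL);
    -- Hypothetical event table: the month in which the customer left.
    CREATE TABLE account_event (account INTEGER, left_month_no INTEGER);
""")

# Balances for two of the accounts in the example data (08/03 through 11/03).
balances = {
    1234567: [7800.00, 3000.00, 2870.00, 1200.00],
    2500000: [0.00, 0.00, 0.00, 0.00],
}
for account, values in balances.items():
    for offset, balance in enumerate(values):
        conn.execute(
            "INSERT INTO monthly_balance VALUES (?, ?, ?)",
            (account, month_no(2003, 8 + offset), balance),
        )

# Account 1234567 left in 12/03; account 2500000 left in 11/03.
conn.executemany(
    "INSERT INTO account_event VALUES (?, ?)",
    [(1234567, month_no(2003, 12)), (2500000, month_no(2003, 11))],
)

# Pick out the balance one, two, and three months before each account's own
# event month, so the same logical attribute lands in the same column.
aligned = conn.execute("""
    SELECT e.account,
           MAX(CASE WHEN e.left_month_no - b.month_no = 3 THEN b.balance END) AS bal_3,
           MAX(CASE WHEN e.left_month_no - b.month_no = 2 THEN b.balance END) AS bal_2,
           MAX(CASE WHEN e.left_month_no - b.month_no = 1 THEN b.balance END) AS bal_1
    FROM account_event e
    JOIN monthly_balance b ON b.account = e.account
    GROUP BY e.account
    ORDER BY e.account
""").fetchall()

for row in aligned:
    print(row)
```

Because each balance is selected by its offset from the account's own event month, accounts that left in different calendar months still produce values in the same three columns.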
For further information, see Defining Events on page 2-6.
Deriving Attributes
The next task is to derive attributes that are not relative to events. For example,
customer age can be derived from birth date. Part of the challenge of effective data
mining is identifying a set of derived attributes that capture key indicators relevant to
the business opportunity being explored.
For further information, see Deriving Attributes on page 2-9.
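As a small illustration of a derived attribute, the sketch below computes customer age in whole years from a birth date. It uses generic SQL run through Python's built-in sqlite3 module rather than SQL/MX syntax. The customer table, its columns, the birth dates, and the as-of date are all hypothetical values chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customer table; birth dates are illustrative only.
conn.execute("CREATE TABLE customer (account INTEGER, birth_date TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1234567, "1960-05-17"), (2500000, "1985-12-31")],
)

AS_OF = "2003-12-01"  # assumed reference date for the age calculation

# Age = difference in calendar years, minus one if the birthday has not yet
# occurred in the as-of year (the comparison evaluates to 0 or 1 in SQLite).
ages = conn.execute("""
    SELECT account,
           CAST(strftime('%Y', :as_of) AS INTEGER)
         - CAST(strftime('%Y', birth_date) AS INTEGER)
         - (strftime('%m-%d', :as_of) < strftime('%m-%d', birth_date)) AS age
    FROM customer
    ORDER BY account
""", {"as_of": AS_OF}).fetchall()

for account, age in ages:
    print(account, age)
```

Unlike the event-relative balances, this attribute depends only on the customer record and a fixed reference date, so no alignment step is involved.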
Monthly account balances stored by calendar month:

Account  ...  Bal 08/03  Bal 09/03  Bal 10/03  Bal 11/03  ...  Left
1234567       7800.00    3000.00    2870.00    1200.00         Yes (closed)
2500000       0.00       0.00       0.00       0.00            Yes (0 bal)

Account  ...  Bal 07/02  Bal 08/02  Bal 09/02  Bal 10/02  ...  Left
4098124       4817.94    4596.10    4347.63    4069.34         Yes (closed)

Account balances aligned relative to the month the customer left:

Account  ...  Bal-3      Bal-2      Bal-1      Date Left  ...  Left
1234567       3000.00    2870.00    1200.00    12/03           Yes (closed)
2500000       0.00       0.00       0.00       11/03           Yes (0 bal)
4098124       4817.94    4596.10    4347.63    10/03           Yes (closed)