SQL/MX Data Mining Guide

Preparing the Data
HP NonStop SQL/MX Data Mining Guide, 523737-001
Randomly sample source data
Improve computing efficiency for a profile by using a selected sampling percentage
Reduce both the I/O costs and the CPU costs associated with computing a profile
See the SAMPLE Clause of SELECT in the SQL/MX Reference Manual.
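In SQL/MX, sampling is requested declaratively with the SAMPLE clause and the engine does the work. Purely to illustrate what percentage-based random sampling means, the Python sketch below keeps each row independently with probability percent/100 (Bernoulli sampling). The function name `sample_percent` and the fixed seed are assumptions made for this illustration; this is not a description of how SQL/MX implements SAMPLE.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sample_percent(rows: Iterable[T], percent: float, seed: int = 0) -> List[T]:
    """Hypothetical helper: keep each row independently with probability percent / 100."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    rng = random.Random(seed)   # a fixed seed makes the sample reproducible
    p = percent / 100.0
    # rng.random() is in [0.0, 1.0), so percent=100 keeps every row and percent=0 keeps none.
    return [row for row in rows if rng.random() < p]


sample = sample_percent(range(100_000), 10)
print(len(sample))  # close to 10,000, but not exactly
```

Because each row is kept independently, a 10 percent sample of 100,000 rows contains roughly, not exactly, 10,000 rows.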
Defining Events
Events are used to align related data in a single set of columns for mining. Examples
of events include life changes, such as getting married or switching jobs, and customer
actions, such as opening an account or requesting a credit limit increase.
The critical event to be defined for the business opportunity described in this manual is
the month in which the customer left, either by closing the account or by maintaining a
zero balance for three months. The problem is to align the data so that this event can
be derived as an attribute of the mining view.
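The event definition above can be turned into a small rule over each customer's monthly history. The sketch below assumes a hypothetical `HistoryRow` record carrying a consecutive month index, a balance, and a closed flag. It also makes one assumption the text does not settle: in the zero-balance case, the departure month is taken to be the first month of the three-month zero-balance run. A gap in the monthly rows restarts the run.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class HistoryRow:
    """Hypothetical Account History row: one per customer per month."""
    customer_id: int
    month: int        # consecutive month index, e.g. year * 12 + (month - 1)
    balance: float
    closed: bool      # True if the account was closed in this month


def departure_month(rows: List[HistoryRow], zero_run: int = 3) -> Optional[int]:
    """Return the month one customer left, or None if no departure event occurs."""
    run_start: Optional[int] = None
    run_length = 0
    prev_month: Optional[int] = None
    for row in sorted(rows, key=lambda r: r.month):
        if row.closed:
            return row.month                          # closing the account is the event
        contiguous = prev_month is not None and row.month == prev_month + 1
        if row.balance == 0:
            if run_length == 0 or not contiguous:
                run_start, run_length = row.month, 0  # start a new zero-balance run
            run_length += 1
            if run_length == zero_run:
                return run_start                      # assumption: event = first month of the run
        else:
            run_length = 0                            # a nonzero balance breaks the run
        prev_month = row.month
    return None


def departure_months(history: Iterable[HistoryRow]) -> Dict[int, Optional[int]]:
    """Group the monthly rows by customer and derive one event month per customer."""
    by_customer: Dict[int, List[HistoryRow]] = defaultdict(list)
    for row in history:
        by_customer[row.customer_id].append(row)
    return {cid: departure_month(rows) for cid, rows in by_customer.items()}
```

The resulting event month per customer is the attribute that the later alignment steps measure time against.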
Aligning the Data
Most mining algorithms and tools require that the input data be arranged so that all the
information pertaining to a given entity is contained in a single record. However, in
typical raw mining data, observations about a given entity are often spread across
separate rows and tables.
For example, the Account History table contains one record per customer per month,
summarizing the account status for that customer. The related Customers table
contains static information in the form of one row per customer. In this example, the
monthly account status information must be reduced to a single row for each customer,
which is then paired with the static customer information to form the mining view.
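Reducing the monthly Account History rows to one row per customer and pairing the result with the Customers table amounts to a GROUP BY plus a join. The sketch below uses Python's built-in `sqlite3` module only so that it runs without an SQL/MX system; the table and column names (`customers`, `account_history`, `age`, `balance`) and the chosen summaries are hypothetical. A LEFT JOIN keeps customers that have no history rows.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema; SQLite stands in for SQL/MX so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    age         INTEGER            -- static attribute, one row per customer
);
CREATE TABLE account_history (
    customer_id INTEGER,
    month       TEXT,              -- YYYY-MM text; one row per customer per month
    balance     REAL
);
"""

MINING_VIEW_QUERY = """
SELECT c.customer_id,
       c.age,
       COUNT(h.month)  AS months_of_history,
       AVG(h.balance)  AS avg_balance,
       MAX(h.balance)  AS max_balance
FROM customers AS c
LEFT JOIN account_history AS h
       ON h.customer_id = c.customer_id
GROUP BY c.customer_id, c.age
ORDER BY c.customer_id
"""


def build_mining_view(conn: sqlite3.Connection) -> List[Tuple]:
    """Collapse the monthly history to one row per customer and pair it with static data."""
    return conn.execute(MINING_VIEW_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 34), (2, 51)])
conn.executemany(
    "INSERT INTO account_history VALUES (?, ?, ?)",
    [(1, "1998-01", 100.0), (1, "1998-02", 200.0), (1, "1998-03", 300.0)],
)
for row in build_mining_view(conn):
    print(row)
# (1, 34, 3, 200.0, 300.0)
# (2, 51, 0, None, None)
```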
Two methods exist for mapping time-dependent data into the mining view. One method is
to take a value from a particular month and include that value in the mining view. For
example, the checking account balance for January 1998 can be included in the mining
view for each customer because the balance is a single value.
Alternatively, a value can be aggregated over a time period to compute a single value
for the mining view. An example is the average checking account balance for January
1998 through June 1998.
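Both methods can be expressed with conditional aggregation: a CASE expression keeps only the rows for the chosen month or period, and the aggregate ignores the resulting NULLs. As before, SQLite stands in for SQL/MX so the sketch runs, and the `account_history` table, its `checking_balance` column, and the 'YYYY-MM' month encoding are assumptions.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical table; months are stored as 'YYYY-MM' text so that string
# comparison (and therefore BETWEEN) follows calendar order.
ABSOLUTE_QUERY = """
SELECT customer_id,
       -- Method 1: the value from one particular month
       MAX(CASE WHEN month = ? THEN checking_balance END)             AS balance_in_month,
       -- Method 2: an aggregate over a fixed calendar period
       AVG(CASE WHEN month BETWEEN ? AND ? THEN checking_balance END) AS avg_balance_in_period
FROM account_history
GROUP BY customer_id
ORDER BY customer_id
"""


def absolute_features(conn: sqlite3.Connection, month: str, start: str, end: str) -> List[Tuple]:
    """One row per customer: the balance in `month` and the average from `start` to `end`."""
    return conn.execute(ABSOLUTE_QUERY, (month, start, end)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account_history (customer_id INTEGER, month TEXT, checking_balance REAL)")
rows = [(1, f"1998-{m:02d}", 100.0 * m) for m in range(1, 8)]   # Jan..Jul 1998
rows += [(2, "1997-12", 10.0), (2, "1998-03", 50.0)]
conn.executemany("INSERT INTO account_history VALUES (?, ?, ?)", rows)
print(absolute_features(conn, "1998-01", "1998-01", "1998-06"))
# [(1, 100.0, 350.0), (2, None, 50.0)]
```

MAX is used for the single-month value only to collapse the group; with one row per customer per month it simply returns that month's balance.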
Absolute and relative methods exist for aligning time-dependent data in the mining
view. Specifying an event relative to each customer is often more meaningful than
specifying an absolute event, such as a given year and month.
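The difference between absolute and relative alignment shows up in which calendar months are read. In the sketch below, each customer's window is measured backward from that customer's own event month, so two customers who leave in different months are compared on equivalent relative periods. The helper names `month_index` and `relative_features` are hypothetical, and "six months prior" is assumed to mean the six months before the event month, not including it.

```python
from statistics import mean
from typing import Dict, Optional, Tuple


def month_index(year: int, month: int) -> int:
    """Map (year, month) to consecutive integers so 'n months earlier' is subtraction."""
    return year * 12 + (month - 1)


def relative_features(
    balances: Dict[int, float], event_month: int, window: int = 6
) -> Tuple[Optional[float], Optional[float]]:
    """Features aligned on one customer's own event month.

    Returns (balance one month before the event,
             average balance over the `window` months before the event).
    The event month itself is excluded (an assumption).
    """
    one_prior = balances.get(event_month - 1)
    prior = [balances[m] for m in range(event_month - window, event_month) if m in balances]
    return one_prior, (mean(prior) if prior else None)


# Two customers who close their accounts in different calendar months.
a = {month_index(1998, m): 100.0 * m for m in range(1, 7)}           # Jan..Jun 1998
b = {month_index(1997, 9) + i: 10.0 * (i + 1) for i in range(6)}     # Sep 1997..Feb 1998
print(relative_features(a, event_month=month_index(1998, 7)))  # (600.0, 350.0)
print(relative_features(b, event_month=month_index(1998, 3)))  # (60.0, 35.0)
```

Both customers get the same two features, even though the window for `a` covers January through June 1998 and the window for `b` covers September 1997 through February 1998.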
The account balance one month prior to closing an account or the average account
balance for six months prior to closing an account are both examples of relative
events. In this type of relative time specification, the actual months selected depend on
an event that is different for each customer. Aligning the data by using relative events