SQL/MX Data Mining Guide
Introduction
HP NonStop SQL/MX Data Mining Guide—523737-001
Defining the Business Opportunity
refined later in the knowledge discovery process as more information becomes
available.
This manual uses this opportunity scenario to describe the knowledge discovery
process and how to implement it. The data set used to illustrate techniques and
SQL/MX features consists of two tables: one containing customer information and the
other containing account history information. This data set is presented in
Appendixes A through C of this manual.
A subset of this data set is shown in these tables:
Customers Table
Account History Table
The first table, the Customers table, contains one row for each credit card account and
consists of customer demographic information such as marital status, income, and so
on. For a large financial institution, a customers table such as this one might contain
approximately 10 million rows and 100 columns.
The second table, the Account History table, contains monthly status records, one for
each account for each month the account was open over a given time period, and
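consists of about 200 columns. For this example, suppose the time period is three
years (36 months). The history table would then contain about 360 million rows
(10 million accounts times 36 monthly records), assuming 10 million customers whose
accounts were open for the entire period.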
Given these parameters, the size of the first table is about 5 GB (10 million rows, 500
bytes in each row), and the size of the second table is about 360 GB (360 million rows,
1000 bytes in each row).
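The row counts and sizes above follow from simple arithmetic. As a minimal sketch in Python, using only the figures stated in the text (10 million accounts, 36 monthly records per account, 500 and 1000 bytes per row), the estimate can be reproduced as shown here. The function name and the decimal definition of a gigabyte (10^9 bytes) are assumptions made for illustration and are not taken from the manual:

```python
GB = 10**9  # decimal gigabyte; an assumption, the text does not define the unit


def table_size_gb(rows: int, bytes_per_row: int) -> float:
    """Return the approximate raw size of a table in gigabytes."""
    return rows * bytes_per_row / GB


accounts = 10_000_000            # one Customers row per credit card account
months = 3 * 12                  # three-year period, one record per month
history_rows = accounts * months # assumes every account is open all 36 months

customers_gb = table_size_gb(accounts, 500)
history_gb = table_size_gb(history_rows, 1000)

print(f"Account History rows:  {history_rows:,}")
print(f"Customers table:       {customers_gb:.0f} GB")
print(f"Account History table: {history_gb:.0f} GB")
```

These figures are raw data estimates only. They do not account for indexes or storage overhead, and accounts that were open for only part of the period would contribute fewer history rows.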
For the example business opportunity, the Status and Balance fields of the Account
History table are used to determine if a customer will close their account. If the Status
changes from Open to Closed or if the Balance is zero for three consecutive months,

Customers Table

Account  Name          Marital Status  Home  Income  ...
1234567  Jones, Mary   Single          Own   65,000
2500000  Abbas, Ali    Divorced        Rent  32,000
4098124  Kano, Tomoko  Divorced        Own   44,000
2400000  Lund, Erika   Widow           Own   28,000

Account History Table

Account  Month  Status  Limit   Balance  Payment  Fin. Chrg  ...
1234567  01/03  Open    10,000  1232.50  1232.50  0.00
2500000  07/02  Open    5,000   566.00   32.00    8.00
4098124  10/00  Open    6,000   3200.00  3200.00  0.00
1234567  02/03  Open    10,000  3000.00  3000.00  0.00
2500000  08/02  Open    5,000   600.00   40.00    9.23