SQL/MX Data Mining Guide

Introduction
HP NonStop SQL/MX Data Mining Guide523737-001
1-8
Preparing the Data
You obtain the cardinality of an attribute, which is the count of the number of unique
values for the attribute, by using a COUNT DISTINCT query. For example:
SELECT COUNT(DISTINCT Age)
FROM Customers;
or
SELECT COUNT(DISTINCT Number_Children)
FROM Customers;
Instead of having to submit a query for each attribute, you can obtain counts for
multiple attributes of a table by using the TRANSPOSE clause. For example:
SET NAMETYPE ANSI;
SET SCHEMA dmcat.whse;
SELECT ColumnIndex, COUNT(DISTINCT ColumnValue)
FROM Customers
TRANSPOSE Age, Number_Children AS ColumnValue
KEY BY ColumnIndex
GROUP BY ColumnIndex;
COLUMNINDEX (EXPR)
----------- --------------------
1 17
2 4
--- 2 row(s) selected.
The first row of the result table of the TRANSPOSE clause contains the distinct count
for the column Age, and the second row contains the distinct count for the column
Number_Children. You can treat the Age values as categories, consisting of age
ranges. Similarly, if Number_Children is greater than five, you can put the count into
the category for the Number_Children equal to five.
The number of attributes in a TRANSPOSE clause is unlimited.
For further information, see Profiling the Data on page 2-2.
Defining Events
In the scenario considered in this manual, the relevant event is the account holder
leaving. This event occurs at different points in time for customers that leave and not at
all for customers that stay.
This event must be defined so that account status and activity in the months leading up
to a customer leaving can be located and aligned in columns. For example, suppose
you create three derived attributes that describe the account balance for each of the
Note. The data types of attributes to be transformed into a single column must be
compatible. The data type of the result column is the union compatible data type of the
attributes.