SQL/MX Data Mining Guide

Introduction

HP NonStop SQL/MX Data Mining Guide—523737-001

1-8

Preparing the Data

You obtain the cardinality of an attribute, which is the count of the number of unique

values for the attribute, by using a COUNT DISTINCT query. For example:

SELECT COUNT(DISTINCT Age)

FROM Customers;

SELECT COUNT(DISTINCT Number_Children)

FROM Customers;

Instead of having to submit a query for each attribute, you can obtain counts for

multiple attributes of a table by using the TRANSPOSE clause. For example:

SET NAMETYPE ANSI;

SET SCHEMA dmcat.whse;

SELECT ColumnIndex, COUNT(DISTINCT ColumnValue)

FROM Customers

TRANSPOSE Age, Number_Children AS ColumnValue

KEY BY ColumnIndex

GROUP BY ColumnIndex;

COLUMNINDEX (EXPR)

----------- --------------------

1 17

2 4

--- 2 row(s) selected.

The first row of the result table of the TRANSPOSE clause contains the distinct count

for the column Age, and the second row contains the distinct count for the column

Number_Children. You can treat the Age values as categories, consisting of age

ranges. Similarly, if Number_Children is greater than five, you can put the count into

the category for the Number_Children equal to five.

The number of attributes in a TRANSPOSE clause is unlimited.

For further information, see Profiling the Data on page 2-2.

Defining Events

In the scenario considered in this manual, the relevant event is the account holder

leaving. This event occurs at different points in time for customers that leave and not at

all for customers that stay.

This event must be defined so that account status and activity in the months leading up

to a customer leaving can be located and aligned in columns. For example, suppose

you create three derived attributes that describe the account balance for each of the

Note. The data types of attributes to be transformed into a single column must be

compatible. The data type of the result column is the union compatible data type of the

attributes.