SQL/MX Data Mining Guide

Preparing the Data
HP NonStop SQL/MX Data Mining Guide523737-001
2-3
Cardinalities and Metrics
Cardinalities and Metrics
For any attribute, one approach to profiling is to run a separate query for each attribute.
As an example, consider the following queries, which profile the discrete attribute
Marital Status from the Customers table and the continuous attribute Balance from the
Account History table.
Example of Discrete Attribute
This query finds the number of discrete values of the Marital Status column of the
Customers table:
SELECT marital_status, COUNT(*)
FROM customers
GROUP BY marital_status;
Example of Continuous Attribute
This query computes statistical information about the continuous attribute Balance in
the Account History table:
SELECT MIN(balance), MAX(balance),
AVG(balance), VARIANCE(balance)
FROM acct_history;
Transposition
Other than the computation of a few metrics, both of the previous queries require a
complete scan of the data. In this way, a table with N attributes requires N queries,
resulting in the same number of complete scans. For a wide mining table, this
procedure can result in thousands of queries and scans of the data.
Using transposition, SQL/MX can perform the above profiling operations by using a
total of only two queries, regardless of the number of attributes to be profiled. Through
the TRANSPOSE clause of the SELECT statement, different columns of a source table
can be treated as a single output column, enabling similar computations to be
performed on all such source columns.
TRANSPOSE takes each row in the source table and converts each expression listed
in the transpose set to an individual output row. Used in this way, TRANSPOSE can
compute frequency counts for all discrete attributes in a table in a single query.
See the TRANSPOSE Clause entry in the SQL/MX Reference Manual for more
information.
Example of Computing Counts for Character Discrete Attributes
This query computes the frequency counts for the discrete attributes Gender, Marital
Status, and Home, which are all type character:
SET NAMETYPE ANSI;
SET SCHEMA mining.whse;