SQL/MX Data Mining Guide
Preparing the Data
HP NonStop SQL/MX Data Mining Guide—523737-001
2-5
Quick Profiling
NUMBER_CHILDREN ? 2 10
NUMBER_CHILDREN ? 3 3
--- 12 row(s) selected.
Because this query produces counts for four different attributes, use the ATTR column 
to distinguish from which attribute the values are drawn. The C1 column contains the 
values for the character attributes, and the C2 column contains the values for the 
numeric attribute.
Example of Computing Statistics for Continuous Attributes
Similarly, a single query using TRANSPOSE can compute the necessary statistics for 
all continuous attributes. The next query computes the minimum, maximum, mean, and 
variance for the continuous attributes Customer Credit Limit and Balance, which are 
both numeric:
SELECT attr, MIN(c1), MAX(c1), AVG(c1), VARIANCE(c1)
FROM acct_history 
TRANSPOSE (1,cust_limit), (2,balance) AS (attr, c1)
GROUP BY attr 
ORDER BY attr;
Sample results are: 
Sample results are: 
By using TRANSPOSE to compute attribute profiles, you gain performance and 
scalability advantages. Performance is improved because the data set is scanned only 
once. In addition, the number of queries is reduced to two: one for discrete attributes 
and one for continuous attributes. Scalability is enhanced because the amount of data 
accessed grows linearly with the number of attributes actually profiled.
Quick Profiling
The profiling step is highly iterative, because many different data sources are 
inspected and evaluated for possible analysis. Getting a quick impression of an 
attribute before proceeding to a more detailed profile is often necessary. For example, 
by quickly estimating cardinality, you can determine whether to treat a column as 
discrete or continuous. You can make a determination accurately without a scan of 
every single data element.
Use the SQL/MX sampling feature to: 
ATTR MIN(C1) MAX(C1) AVG(C1) VARIANCE(C1)
1 5000.00 40000.00 18225.81 2.01E+008
2  .00 32000.00  2539.12 1.46E+007
ATTR MIN(C1) MAX(C1) AVG(C1) VARIANCE(C1)
1 5000.00 40000.00  20139.86  2.35E+008
2  .00 32000.00  2444.17  1.584E+007










