SQL/MX Data Mining Guide
Mining the Data
HP NonStop SQL/MX Data Mining Guide—523737-001
4-2
Building the Model
Building the Model
Typically, after the mining view is computed and inserted into a single database table, 
the data is extracted and loaded into a mining tool for the model building step. 
Regardless of the type of analysis to be performed, the mining data can be stored and 
retrieved by using the SQL/MX approach. This subsection describes how a decision 
tree can be used for data analysis.
Building Decision Trees
Decision trees are built by recursively partitioning the data in an increasingly selective 
manner, based on the attributes that most strongly determine the outcome. This 
classification is determined by computing the best splits at each node in the tree. The 
key operation of data is computing the frequency of various combinations of attributes 
for a given subset of the data. This result is called a cross table.
The first step in building a decision tree is to generate cross tables for all the attributes 
compared to the goal attribute. Building a decision tree can require the computation of 
tens of thousands of cross tables. The computation of each cross table requires 
scanning the data, applying specified predicates, grouping, and computing counts.
Computing Cross Tables
The first set of cross tables needed for building a decision tree consists of each 
independent variable (a potential predictor) paired with the dependent variable (the 
goal). In the same way that the profiling queries are combined by using TRANSPOSE, 
these separate cross-table queries can be combined into a single query for each node 
in the decision tree.
Computing Cross Tables to Determine the Initial Branch
This query computes the cross tables for Gender, Marital Status, and 
Number_Children, with Cust_Left as the dependent variable (the goal):
SET SCHEMA mining.whse;
SELECT Independent_Variable, IV1, IV2, cust_left, COUNT(*)
FROM miningview
TRANSPOSE ('GENDER', gender, NULL),
 ('MARITAL STATUS', marital_status, NULL),
 ('NUMBER_CHILDREN', NULL, number_children)
 AS (Independent_Variable, IV1, IV2) 
GROUP BY Independent_Variable, IV1, IV2, cust_left 
ORDER BY Independent_Variable, IV1, IV2, cust_left ;
INDEPENDENT_VARIABLE IV1 IV2 CUST_LEFT (EXPR) 
-------------------- -------- ------ --------- -------- 
GENDER F ? N 2 
GENDER F ? Y 4 
GENDER M ? N 2 










