SQL/MX Data Mining Guide
Mining the Data
HP NonStop SQL/MX Data Mining Guide—523737-001
4-4
Building Decision Trees
Typically, the best discriminator of the goal is determined by a statistical analysis of the
cross tables. The exact nature of this analysis varies from tool to tool.
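The guide leaves the choice of statistic to the mining tool. The sketch below uses weighted Gini impurity, one common choice. That choice is an assumption here, not a method SQL/MX prescribes. The sketch scores the Marital Status split with the counts from Figure 4-1. A lower weighted impurity means the branches are more homogeneous with respect to the goal. The function names are hypothetical.

```python
# Cross table for Marital Status vs. the goal (Cust_Left), taken from Figure 4-1.
# Each branch maps to (count of Cust_Left = N, count of Cust_Left = Y).
MARITAL_STATUS = {
    "Single": (0, 3),
    "Married": (2, 1),
    "Widow": (1, 1),
    "Divorced": (1, 5),
}


def gini(counts):
    """Gini impurity of one node: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(cross_table):
    """Impurity of each branch, weighted by the branch's share of all rows."""
    total = sum(sum(counts) for counts in cross_table.values())
    return sum(sum(counts) / total * gini(counts) for counts in cross_table.values())


def gini_gain(cross_table):
    """Impurity of the unsplit node minus the weighted impurity after the split."""
    # Add up the N counts and the Y counts across branches to get the parent node.
    parent = tuple(sum(col) for col in zip(*cross_table.values()))
    return gini(parent) - weighted_gini(cross_table)


if __name__ == "__main__":
    print(f"weighted Gini after split: {weighted_gini(MARITAL_STATUS):.4f}")
    print(f"Gini gain of the split:    {gini_gain(MARITAL_STATUS):.4f}")
```

For these counts, the weighted impurity works out to 2/7 (about 0.2857) and the gain to 6/49 (about 0.1224). A tool using this measure would compute the gain for every candidate attribute and split on the largest. Entropy-based information gain or a chi-square test are other measures a tool might use instead.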
Initial Decision Tree
Figure 4-1 shows the initial decision tree for the business opportunity. Marital Status is
chosen as the best predictor of the goal with four initial branches—Divorced, Single,
Married, and Widow.
The model is built to characterize the customers that have left—that is, the model will
find the rows where Cust_Left is Y.
The results for Divorced and Single are the most promising for further development of
the decision tree. For Divorced, 5 of the 6 records have Cust_Left equal to Y, and for
Single, all 3 records have Cust_Left equal to Y. In both cases, the cross table shows
the most homogeneous split with respect to the goal.
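The sketch below makes "most homogeneous" concrete. For each Marital Status branch in Figure 4-1, it computes the share of rows with Cust_Left equal to Y. It then sorts the branches by that share, breaking ties by the Y count. Ranking on these two keys is an illustrative assumption, not a rule stated by the guide. The function name is hypothetical.

```python
# (count of Cust_Left = N, count of Cust_Left = Y) per branch, from Figure 4-1.
BRANCHES = {
    "Single": (0, 3),
    "Married": (2, 1),
    "Widow": (1, 1),
    "Divorced": (1, 5),
}


def rank_branches(branches):
    """Return (branch, share_of_Y, count_of_Y) tuples, most Y-homogeneous first."""
    rows = []
    for name, (n_count, y_count) in branches.items():
        share_y = y_count / (n_count + y_count)
        rows.append((name, share_y, y_count))
    # Sort by share of Y (descending), breaking ties by the number of Y rows.
    return sorted(rows, key=lambda r: (r[1], r[2]), reverse=True)


if __name__ == "__main__":
    for name, share_y, y_count in rank_branches(BRANCHES):
        print(f"{name:<9} Y share {share_y:.2f} ({y_count} rows with Cust_Left = Y)")
```

With these counts, Single (share 1.00) ranks first and Divorced (share about 0.83) ranks second. These are the two branches the text selects.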
Initial Branches of the Decision Tree
The two initial branches that seem most promising are defined by two conditions:
marital_status = 'Divorced'
marital_status = 'Single'
Computing Cross Tables When Marital Status Is Equal to Divorced
This query generates a cross table of each attribute other than Marital Status against
the goal, restricted to rows where Marital Status is equal to Divorced:
SELECT Independent_Variable, IV1, IV2, cust_left, COUNT(*)
FROM miningview
WHERE marital_status = 'Divorced'
TRANSPOSE ('GENDER', gender, NULL),
          ('NUMBER_CHILDREN', NULL, number_children)
          AS (Independent_Variable, IV1, IV2)
GROUP BY Independent_Variable, IV1, IV2, cust_left
ORDER BY Independent_Variable, IV1, IV2, cust_left;
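TRANSPOSE is an SQL/MX extension. For each row that passes the WHERE clause, it emits one output row per transpose item. As a result, every attribute's value lands in the same (Independent_Variable, IV1, IV2) columns, and a single GROUP BY yields all the cross tables at once. Engines without TRANSPOSE can approximate it with UNION ALL, one branch per transpose item. The sketch below does this in SQLite from Python. The table layout follows the miningview columns the query uses. The sample rows and the `cross_tables` helper are hypothetical. This is not how SQL/MX itself executes the query.

```python
import sqlite3

# Hypothetical sample rows: (gender, marital_status, number_children, cust_left).
# They are made up for illustration; only the column layout follows miningview.
SAMPLE_ROWS = [
    ("F", "Divorced", 2, "N"),
    ("F", "Divorced", 0, "Y"),
    ("M", "Divorced", 1, "Y"),
    ("M", "Divorced", 1, "Y"),
    ("F", "Divorced", 3, "Y"),
    ("M", "Divorced", 0, "Y"),
    ("F", "Married", 2, "N"),  # excluded by the WHERE condition
]

# Standard-SQL stand-in for TRANSPOSE: each UNION ALL branch plays the role of one
# transpose item, emitting (attribute name, character value, numeric value, goal).
CROSS_TABLE_QUERY = """
SELECT independent_variable, iv1, iv2, cust_left, COUNT(*)
FROM (
    SELECT 'GENDER' AS independent_variable, gender AS iv1,
           NULL AS iv2, cust_left
    FROM miningview
    WHERE marital_status = 'Divorced'
    UNION ALL
    SELECT 'NUMBER_CHILDREN', NULL, number_children, cust_left
    FROM miningview
    WHERE marital_status = 'Divorced'
)
GROUP BY independent_variable, iv1, iv2, cust_left
ORDER BY independent_variable, iv1, iv2, cust_left
"""


def cross_tables(rows):
    """Load rows into an in-memory SQLite table and return the cross-table counts."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE miningview (gender TEXT, marital_status TEXT,"
            " number_children INTEGER, cust_left TEXT)"
        )
        conn.executemany("INSERT INTO miningview VALUES (?, ?, ?, ?)", rows)
        return conn.execute(CROSS_TABLE_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in cross_tables(SAMPLE_ROWS):
        print(row)
```

Each result row has the same shape as the SQL/MX output. Python's None stands in for the NULL that SQL/MX displays as ?. With these made-up rows, the first result is ('GENDER', 'F', None, 'N', 1).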
INDEPENDENT_VARIABLE  IV1  IV2     CUST_LEFT  (EXPR)
--------------------  ---  ------  ---------  --------
GENDER                F    ?       N          1
Figure 4-1. Initial Branches of Decision Tree
Marital Status
  Single     No 0   Yes 3
  Married    No 2   Yes 1
  Widow      No 1   Yes 1
  Divorced   No 1   Yes 5