SQL/MX Data Mining Guide
Mining the Data
HP NonStop SQL/MX Data Mining Guide—523737-001
4-3
Building Decision Trees
GENDER           M         ?  Y  6
MARITAL STATUS   Divorced  ?  N  1
MARITAL STATUS   Divorced  ?  Y  5
MARITAL STATUS   Married   ?  N  2
MARITAL STATUS   Married   ?  Y  1
MARITAL STATUS   Single    ?  Y  3
MARITAL STATUS   Widow     ?  N  1
MARITAL STATUS   Widow     ?  Y  1
NUMBER_CHILDREN  ?         0  N  2
NUMBER_CHILDREN  ?         0  Y  5
NUMBER_CHILDREN  ?         1  Y  2
NUMBER_CHILDREN  ?         2  N  1
NUMBER_CHILDREN  ?         2  Y  3
NUMBER_CHILDREN  ?         3  N  1
--- 17 row(s) selected.
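A listing like the one above is easier to analyze once it is loaded into a program. The following Python sketch parses the rows shown on this page into counts keyed by attribute, value, and Cust_Left. It is a minimal illustration that assumes the columns are attribute name, character value, numeric value, Cust_Left, and count, that `?` is how the query tool displays the unused (null) value column, and that only the 14 rows reproduced here are available (the first rows of the 17-row result appear on the preceding page). The function name `parse_cross_table` is hypothetical.

```python
from collections import defaultdict

# Rows of the cross-table result as printed on this page. The first rows
# of the 17-row result appear on the preceding page and are not included.
LISTING = """\
GENDER M ? Y 6
MARITAL STATUS Divorced ? N 1
MARITAL STATUS Divorced ? Y 5
MARITAL STATUS Married ? N 2
MARITAL STATUS Married ? Y 1
MARITAL STATUS Single ? Y 3
MARITAL STATUS Widow ? N 1
MARITAL STATUS Widow ? Y 1
NUMBER_CHILDREN ? 0 N 2
NUMBER_CHILDREN ? 0 Y 5
NUMBER_CHILDREN ? 1 Y 2
NUMBER_CHILDREN ? 2 N 1
NUMBER_CHILDREN ? 2 Y 3
NUMBER_CHILDREN ? 3 N 1"""


def parse_cross_table(text):
    """Return {attribute: {(value, cust_left): count}} for the listing.

    Assumes each row ends with four fields (character value, numeric value,
    Cust_Left, count) and that '?' marks the unused (null) value column.
    The leading fields form the attribute name, which may contain a space.
    """
    counts = defaultdict(dict)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        attribute = " ".join(fields[:-4])
        char_value, num_value, cust_left, count = fields[-4:]
        value = num_value if char_value == "?" else char_value
        counts[attribute][(value, cust_left)] = int(count)
    return dict(counts)


counts = parse_cross_table(LISTING)
for attribute, cells in counts.items():
    left = sum(n for (_, goal), n in cells.items() if goal == "Y")
    print(f"{attribute}: {left} customer(s) with Cust_Left = Y")
```

For MARITAL STATUS and NUMBER_CHILDREN the Y counts each total 10; GENDER shows only 6 here because its remaining rows are on the preceding page.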
Determining Which Attribute Best Predicts the Goal
Using the results of the preceding query, you can now determine which of the
independent variables best predicts the dependent variable (the goal).
Examine the counts for each independent variable in the query result. If, for a
particular value of an independent variable, most of the customers have Cust_Left
equal to Y, that independent variable is a good predictor of the goal. Client mining
tools typically perform this type of analysis.
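The check described above can be sketched in Python as follows. This is a minimal illustration, assuming the Marital Status and Number_Children counts from the preceding listing are entered by hand. For each value it computes the fraction of customers with Cust_Left equal to Y and keeps the values where that fraction exceeds one half. The function name `values_dominated_by_goal` and the 0.5 threshold are hypothetical choices for this example, not part of SQL/MX.

```python
# Counts from the cross-table listing: {attribute: {value: {"Y": n, "N": n}}}.
CROSS_TABLES = {
    "MARITAL STATUS": {
        "Divorced": {"Y": 5, "N": 1},
        "Married": {"Y": 1, "N": 2},
        "Single": {"Y": 3, "N": 0},
        "Widow": {"Y": 1, "N": 1},
    },
    "NUMBER_CHILDREN": {
        "0": {"Y": 5, "N": 2},
        "1": {"Y": 2, "N": 0},
        "2": {"Y": 3, "N": 1},
        "3": {"Y": 0, "N": 1},
    },
}


def values_dominated_by_goal(table, goal="Y", threshold=0.5):
    """Return {value: share} for values whose share of `goal` exceeds threshold."""
    result = {}
    for value, cells in table.items():
        total = sum(cells.values())
        share = cells.get(goal, 0) / total if total else 0.0
        if share > threshold:
            result[value] = round(share, 2)
    return result


for attribute, table in CROSS_TABLES.items():
    print(attribute, values_dominated_by_goal(table))
```

With these counts the sketch keeps Divorced and Single for Marital Status, and it also keeps 0, 1, and 2 for Number_Children because 10 of the 14 customers counted left. The per-value shares are therefore a starting point for the judgment described next rather than a final answer.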
Both Gender and Marital Status are reasonable choices for the best predictor of the
goal, as summarized in the following table. To carry out the remaining cross-table
generations, this scenario uses Marital Status as the predictor for the initial branch
of the decision tree.
Independent Variable  Predictor?  Reason
GENDER                Yes         When Cust_Left is equal to Y, Gender is
                                  predominantly M: there are 6 males and
                                  4 females.
MARITAL STATUS        Yes         When Cust_Left is equal to Y, Marital Status
                                  is predominantly Divorced or Single: 5 are
                                  Divorced, 1 is Married, 3 are Single, and
                                  1 is Widow.
NUMBER_CHILDREN       No          When Cust_Left is equal to Y, Number_Children
                                  is 0, 1, or 2: 5 customers have 0 children,
                                  2 have 1 child, and 3 have 2 children. The
                                  values show no pattern and do not predict
                                  Cust_Left equal to Y.
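Client mining tools often replace the qualitative reasoning in the table above with a numeric score. One common choice, used by the ID3 family of decision-tree algorithms, is information gain: the reduction in the entropy of Cust_Left after splitting on an attribute. The Python sketch below illustrates that swapped-in technique; it is not the method SQL/MX or this scenario prescribes. It compares only Marital Status and Number_Children, because the complete Gender counts (including the rows with Cust_Left equal to N) are not shown on this page. The names `entropy` and `information_gain` are hypothetical helpers for this example.

```python
from math import log2


def entropy(counts):
    """Shannon entropy, in bits, of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def information_gain(table):
    """Gain from splitting on an attribute, given {value: (y_count, n_count)}."""
    y_total = sum(y for y, _ in table.values())
    n_total = sum(n for _, n in table.values())
    total = y_total + n_total
    before = entropy([y_total, n_total])
    # Weighted entropy of Cust_Left within each branch of the split.
    after = sum((y + n) / total * entropy([y, n]) for y, n in table.values())
    return before - after


# (Cust_Left = Y, Cust_Left = N) counts from the cross-table listing.
MARITAL_STATUS = {"Divorced": (5, 1), "Married": (1, 2), "Single": (3, 0), "Widow": (1, 1)}
NUMBER_CHILDREN = {0: (5, 2), 1: (2, 0), 2: (3, 1), 3: (0, 1)}

for name, table in [("MARITAL STATUS", MARITAL_STATUS), ("NUMBER_CHILDREN", NUMBER_CHILDREN)]:
    print(f"{name}: gain = {information_gain(table):.3f} bits")
```

With the counts shown here, Marital Status yields the larger gain (about 0.245 bits, versus about 0.200 bits for Number_Children), which agrees with the choice made in this scenario.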