SQL/MX Data Mining Guide

Mining the Data
HP NonStop SQL/MX Data Mining Guide523737-001
4-2
Building the Model
Building the Model
Typically, after the mining view is computed and inserted into a single database table,
the data is extracted and loaded into a mining tool for the model building step.
Regardless of the type of analysis to be performed, the mining data can be stored and
retrieved by using the SQL/MX approach. This subsection describes how a decision
tree can be used for data analysis.
Building Decision Trees
Decision trees are built by recursively partitioning the data in an increasingly selective
manner, based on the attributes that most strongly determine the outcome. This
classification is determined by computing the best splits at each node in the tree. The
key operation of data is computing the frequency of various combinations of attributes
for a given subset of the data. This result is called a cross table.
The first step in building a decision tree is to generate cross tables for all the attributes
compared to the goal attribute. Building a decision tree can require the computation of
tens of thousands of cross tables. The computation of each cross table requires
scanning the data, applying specified predicates, grouping, and computing counts.
Computing Cross Tables
The first set of cross tables needed for building a decision tree consists of each
independent variable (a potential predictor) paired with the dependent variable (the
goal). In the same way that the profiling queries are combined by using TRANSPOSE,
these separate cross-table queries can be combined into a single query for each node
in the decision tree.
Computing Cross Tables to Determine the Initial Branch
This query computes the cross tables for Gender, Marital Status, and
Number_Children, with Cust_Left as the dependent variable (the goal):
SET SCHEMA mining.whse;
SELECT Independent_Variable, IV1, IV2, cust_left, COUNT(*)
FROM miningview
TRANSPOSE ('GENDER', gender, NULL),
('MARITAL STATUS', marital_status, NULL),
('NUMBER_CHILDREN', NULL, number_children)
AS (Independent_Variable, IV1, IV2)
GROUP BY Independent_Variable, IV1, IV2, cust_left
ORDER BY Independent_Variable, IV1, IV2, cust_left ;
INDEPENDENT_VARIABLE IV1 IV2 CUST_LEFT (EXPR)
-------------------- -------- ------ --------- --------
GENDER F ? N 2
GENDER F ? Y 4
GENDER M ? N 2