SQL/MX Data Mining Guide
Introduction
HP NonStop SQL/MX Data Mining Guide—523737-001
1-7
then a customer is defined as having left—that is, no longer holds a credit card 
account.
Preparing the Data
After a business opportunity has been identified and defined, the next task is to 
prepare a data set for mining. This is done in Steps 2 through 6 of the knowledge 
discovery process. See The Knowledge Discovery Process on page 1-3.
The first two steps are preprocessing the mining data to make it consistent and then 
loading the data into a database system. For further information, see Loading the Data 
on page 2-2.
The next step is to generate a variety of statistics for each column: for example, the 
number of unique entries, value ranges, number of missing values, mean, and variance. 
This data profile helps you understand the data and serves as a valuable reference 
throughout the knowledge discovery process.
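The statistics named above can be sketched with ordinary aggregate functions. The example below is only an illustration: it uses Python's built-in sqlite3 module rather than SQL/MX, and the customers table, its balance column, the sample rows, and the profile_column helper are all hypothetical. The variance shown is the population variance, computed as the mean of the squares minus the square of the mean.

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age INTEGER, balance REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(25, 100.0), (40, 300.0), (40, None), (31, 200.0)],
)


def profile_column(conn, table, column):
    """Return basic profile statistics for one numeric column."""
    # Identifiers are interpolated directly, so pass only trusted names.
    row = conn.execute(
        f"""
        SELECT COUNT(DISTINCT {column}),
               MIN({column}),
               MAX({column}),
               SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END),
               AVG({column}),
               AVG({column} * {column}) - AVG({column}) * AVG({column})
        FROM {table}
        """
    ).fetchone()
    keys = ("unique", "min", "max", "missing", "mean", "variance")
    return dict(zip(keys, row))


balance_profile = profile_column(conn, "customers", "balance")
print(balance_profile)
```

Aggregates such as AVG and COUNT(DISTINCT ...) skip missing (NULL) values, which is why the missing-value count is computed separately with a CASE expression.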
Profiling the Data
A profile of the database helps to solve the data mining problem in these ways: 
• To better understand the data
• To decide which columns to use for analysis
• To decide whether to treat attributes as discrete or continuous
Types of Information
The type of information used to create a profile of the data mining view comes from the 
following elements:
• Tables in the database
• Table attributes (or columns to be used in the analysis)
• Data types of the table attributes
• Relationships between tables
• Cardinalities of discrete attributes
• Statistics about continuous attributes
• Derived table attributes (or derived columns to be used in the analysis)
Determining the derived columns to be constructed requires knowledge of the table 
attributes and how these attributes relate to the data mining problem. See Preparing 
the Data on page 1-7 for a full discussion of these elements.
SQL/MX provides the TRANSPOSE clause of the SELECT statement to display the 
cardinalities of discrete attributes. See Transposition on page 2-3 and the 
TRANSPOSE Clause entry in the SQL/MX Reference Manual for details.
Example of Finding Cardinality of Discrete Attributes
The customers table in your data set has Age and Number_Children columns. Both of 
these attributes are discrete, and you can compute the cardinality of each attribute.
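As a rough, portable illustration of what this computation produces, the sketch below counts the distinct values of Age and Number_Children and then tallies the rows for each value. It deliberately does not use the SQL/MX TRANSPOSE clause: it runs on SQLite through Python's sqlite3 module and emulates the transposition with UNION ALL. The sample rows are invented, and the attr, val, and n column aliases are hypothetical names chosen for this example.

```python
import sqlite3

# Hypothetical sample rows for the customers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (Age INTEGER, Number_Children INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(25, 0), (40, 2), (40, 1), (31, 2), (52, 2)],
)

# Cardinality of each attribute: the number of distinct values it takes.
cardinality = conn.execute(
    "SELECT COUNT(DISTINCT Age), COUNT(DISTINCT Number_Children) FROM customers"
).fetchone()

# Row count per attribute value. Each UNION ALL branch turns one column
# into (attribute name, value) pairs, which are then grouped and counted.
value_counts = conn.execute(
    """
    SELECT attr, val, COUNT(*) AS n
    FROM (
        SELECT 'Age' AS attr, Age AS val FROM customers
        UNION ALL
        SELECT 'Number_Children', Number_Children FROM customers
    )
    GROUP BY attr, val
    ORDER BY attr, val
    """
).fetchall()

print(cardinality)
for attr, val, n in value_counts:
    print(attr, val, n)
```

The UNION ALL form reads the table once per attribute, so it is best treated as a way to check results on a small sample. For the SQL/MX syntax itself, see the TRANSPOSE Clause entry in the SQL/MX Reference Manual.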










