SQL/MX Data Mining Guide
Introduction
HP NonStop SQL/MX Data Mining Guide—523737-001
then a customer is defined as having left—that is, no longer holds a credit card
account.
Preparing the Data
After a business opportunity has been identified and defined, the next task is to
prepare a data set for mining. This is done in Steps 2 through 6 of the knowledge
discovery process. See The Knowledge Discovery Process on page 1-3.
The first two of these steps are to preprocess the mining data so that it is
consistent and then to load the data into a database system. For further information,
see Loading the Data on page 2-2.
The next step is to generate a variety of statistics for each column, for example,
the number of unique entries, the range of values, the number of missing values, the
mean, and the variance. This data profile helps you understand the data and serves as
a valuable reference throughout the knowledge discovery process.
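The statistics listed above can be gathered with ordinary aggregate queries. The following is a minimal sketch rather than the guide's own procedure. It assumes a hypothetical customers table with Age and Number_Children columns, uses invented sample rows, and runs against SQLite through Python only so that the example is self-contained. The helper name profile_column is illustrative. The variance is the population variance, computed from averages because SQLite has no built-in variance function.

```python
import sqlite3

# SQLite stands in for the database system; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (Age INTEGER, Number_Children INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(25, 0), (40, 2), (40, 3), (55, None), (None, 1)],
)


def profile_column(conn, table, column):
    """Return basic profile statistics for one numeric column (hypothetical helper)."""
    # Identifiers are interpolated directly, so pass only trusted names.
    sql = f"""
        SELECT COUNT(DISTINCT {column}),                           -- unique entries
               MIN({column}),                                      -- low end of range
               MAX({column}),                                      -- high end of range
               SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END),  -- missing values
               AVG({column}),                                      -- mean (NULLs ignored)
               AVG({column} * {column}) - AVG({column}) * AVG({column})  -- population variance
        FROM {table}
    """
    keys = ("unique", "min", "max", "missing", "mean", "variance")
    return dict(zip(keys, conn.execute(sql).fetchone()))


age_profile = profile_column(conn, "customers", "Age")
print(age_profile)
```

Running the same helper for each column of interest yields a small reference table that can be consulted during later steps.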
Profiling the Data
A profile of the database helps to solve the data mining problem in these ways:
• To better understand the data
• To decide which columns to use for analysis
• To decide whether to treat attributes as discrete or continuous
Types of Information
The type of information used to create a profile of the data mining view comes from the
following elements:
• Tables in the database
• Table attributes (or columns to be used in the analysis)
• Data types of the table attributes
• Relationships between tables
• Cardinalities of discrete attributes
• Statistics about continuous attributes
• Derived table attributes (or derived columns to be used in the analysis)
Determining the derived columns to be constructed requires knowledge of the table
attributes and how these attributes relate to the data mining problem. See Preparing
the Data on page 1-7 for a full discussion of these elements.
SQL/MX provides the TRANSPOSE clause of the SELECT statement to display the
cardinalities of discrete attributes. See Transposition on page 2-3 and the
TRANSPOSE Clause entry in the SQL/MX Reference Manual for details.
Example of Finding Cardinality of Discrete Attributes
The customers table in your data set has Age and Number_Children columns. Both of
these attributes are discrete, and you can compute the cardinality of each attribute.
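As a portable illustration of what the cardinality of a discrete attribute means, the sketch below counts the distinct values in each column with COUNT(DISTINCT ...). It does not use the SQL/MX TRANSPOSE clause. It runs standard SQL against SQLite through Python, and the sample rows are invented for the example.

```python
import sqlite3

# SQLite stands in for the database system; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (Age INTEGER, Number_Children INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(25, 0), (40, 2), (40, 3), (55, 2), (31, 0)],
)

# One result row per attribute: the attribute name and its number of distinct values.
sql = """
    SELECT 'Age' AS ColumnName, COUNT(DISTINCT Age) AS Cardinality FROM customers
    UNION ALL
    SELECT 'Number_Children', COUNT(DISTINCT Number_Children) FROM customers
"""
cardinalities = dict(conn.execute(sql).fetchall())
print(cardinalities)
```

This form needs one SELECT branch per attribute, so the query grows with the number of columns being profiled.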