SQL/MX Data Mining Guide
Introduction
HP NonStop SQL/MX Data Mining Guide—523737-001
Building these data structures and operations into the DBMS allows mining tasks to be
moved into the SQL engine for tighter integration of data and mining operations and for
improved performance and scalability.
Adding new primitives, such as moving-window aggregate functions, simplifies queries
needed by knowledge discovery tools and applications. This type of query
simplification often results in significant improvements in performance.
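As a rough illustration of how a moving-window aggregate simplifies a query, the following sketch computes a three-row moving average in a single SELECT. It is not SQL/MX syntax: it assumes SQLite (version 3.25 or later, through Python's sqlite3 module) as a stand-in engine and uses the standard SQL window clause. The balances table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical daily balances for one account (illustrative data only).
rows = [
    ("2024-01-01", 100.0),
    ("2024-01-02", 200.0),
    ("2024-01-03", 300.0),
    ("2024-01-04", 400.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (day TEXT, balance REAL)")
conn.executemany("INSERT INTO balances VALUES (?, ?)", rows)

# One pass over the table: for each row, average the current row
# and the two rows before it (fewer at the start of the sequence).
moving_avg = conn.execute(
    """
    SELECT day,
           AVG(balance) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS avg_3day
    FROM balances
    ORDER BY day
    """
).fetchall()

for day, avg_3day in moving_avg:
    print(day, avg_3day)
```

Without a window primitive, the same result typically requires a self-join or a correlated subquery over the preceding rows, which is both harder to write and harder for the engine to optimize.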
The Knowledge Discovery Process
The knowledge discovery process has nine steps. It starts with the selection
and definition of a business opportunity, continues through several data preparation
steps and a modeling step, and ends with the deployment of the new knowledge. This
subsection summarizes the steps of that process.
1. Identify and define a business opportunity.
The process begins with the identification and precise specification of a business
opportunity.
See Defining the Business Opportunity on page 1-4.
2. Preprocess and load the data for the business opportunity.
Real-world data is often inconsistent and incomplete. The first preparation step is
to address these problems by preprocessing the data in various ways—for
example, verifying and mapping the data. Then load the data into your database
system.
See Preparing the Data on page 1-7.
3. Profile and understand the relevant data.
Generate statistics for each column, such as the count of unique entries, value range,
number of missing values, mean, and variance.
See Profiling the Data on page 1-7.
4. Define events relevant to the business opportunity being explored.
Events are used to align related data in a single set of columns for mining.
Example events are life changes, such as getting married or switching jobs, or
customer actions, such as opening an account or requesting a credit limit increase.
See Defining Events on page 1-8.
5. Derive attributes.
For example, customer age can be derived from birth date. Account summary
statistics, such as maximum and minimum balances, can be derived from monthly
status information.
See Preparing the Data on page 1-7.
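The profiling statistics of step 3 and the derived attributes of step 5 can be sketched as two queries. This is a minimal sketch, not the method this guide prescribes: it assumes SQLite through Python's sqlite3 module as a stand-in engine, and the customers and monthly_status tables, their columns, the sample rows, and the AS_OF reference date are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (cust_id INTEGER, birth_date TEXT);
    CREATE TABLE monthly_status (cust_id INTEGER, month TEXT, balance REAL);
    INSERT INTO customers VALUES (1, '1980-06-15'), (2, '1990-12-31');
    INSERT INTO monthly_status VALUES
        (1, '2024-01', 100.0), (1, '2024-02', 300.0), (1, '2024-03', NULL),
        (2, '2024-01', 50.0),  (2, '2024-02', 50.0);
    """
)

# Step 3 (profile): unique entry count, value range, missing values,
# mean, and population variance of the balance column.
profile = conn.execute(
    """
    SELECT COUNT(DISTINCT balance),
           MIN(balance), MAX(balance),
           COUNT(*) - COUNT(balance),
           AVG(balance),
           AVG(balance * balance) - AVG(balance) * AVG(balance)
    FROM monthly_status
    """
).fetchone()

# Step 5 (derive): age in whole years as of a reference date, plus
# minimum and maximum balance from the monthly status rows.
AS_OF = "2024-06-30"  # hypothetical reference date
derived = conn.execute(
    """
    SELECT c.cust_id,
           CAST(strftime('%Y', :as_of) AS INTEGER)
             - CAST(strftime('%Y', c.birth_date) AS INTEGER)
             - (strftime('%m-%d', :as_of) < strftime('%m-%d', c.birth_date)) AS age,
           MIN(m.balance), MAX(m.balance)
    FROM customers c
    JOIN monthly_status m ON m.cust_id = c.cust_id
    GROUP BY c.cust_id, c.birth_date
    ORDER BY c.cust_id
    """,
    {"as_of": AS_OF},
).fetchall()

print(profile)
print(derived)
```

Both queries skip NULL balances inside the aggregates, so the missing-value count is reported separately as the difference between COUNT(*) and COUNT(balance).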