SQL/MX Data Mining Guide

HP NonStop SQL/MX Data Mining Guide, 523737-001
1 Introduction
Knowledge discovery is an iterative process involving many query-intensive steps. The
challenges of data management in supporting this process efficiently are significant
and continue to grow as knowledge discovery becomes more widely used.
Data mining identifies and characterizes interrelationships among multiple variables
without requiring a data analyst to formulate specific questions. Software tools look for
trends and patterns and flag unusual or potentially interesting ones. Because data
mining reveals previously unknown information and patterns, rather than proving or
disproving a hypothesis, mining enables knowledge discovery rather than just
knowledge verification.
This section discusses these topics:
The Traditional Approach
Today, most data mining is performed by client tools on data extracted from the
database. This approach is limited because important information might be omitted
from the data extract.
The SQL/MX Approach
The SQL/MX approach to knowledge discovery enables you to perform many
data-intensive tasks in the database itself, rather than on extracts. Examples include
statistical sampling, statistical functions, temporal reasoning through sequence
functions, cross-table generation, database profiling, and moving-window
aggregations.
The Knowledge Discovery Process
In the SQL/MX approach, fundamental data structures and operations are built into
the database management system (DBMS) to support a wide range of knowledge
discovery tasks and algorithms. The knowledge discovery process is described as
a series of steps that starts with the selection and definition of a business
opportunity, continues through data preparation and modeling, and ends with the
deployment of the new knowledge.
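As a rough illustration of the in-database idea described under The SQL/MX Approach, the following sketch computes a moving-window average inside a database engine, so that only the aggregated result reaches the client instead of a full extract. It is an assumption-laden stand-in: it uses Python's built-in sqlite3 module rather than SQL/MX, it does not use the SQL/MX sequence functions, and the table name, column names, and sample values are hypothetical.

```python
import sqlite3


def moving_average_in_database(rows, window=3):
    """Return (day, moving_avg) pairs computed by the database engine.

    rows   -- iterable of (day, amount) pairs (hypothetical sample data)
    window -- width of the window, measured in day values, not row counts
    """
    # SQLite is only a stand-in here; this is not SQL/MX syntax.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (day INTEGER PRIMARY KEY, amount REAL)")
        conn.executemany("INSERT INTO sales (day, amount) VALUES (?, ?)", rows)
        # The correlated subquery averages the current day and the
        # (window - 1) preceding days, entirely inside the database.
        query = """
            SELECT t.day,
                   (SELECT AVG(s.amount)
                      FROM sales s
                     WHERE s.day BETWEEN t.day - ? AND t.day) AS moving_avg
              FROM sales t
             ORDER BY t.day
        """
        return conn.execute(query, (window - 1,)).fetchall()
    finally:
        conn.close()


SAMPLE_ROWS = [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)]
RESULT = moving_average_in_database(SAMPLE_ROWS, window=3)
```

The client receives one row per day with the aggregate already computed. The traditional approach would instead export every row to a flat file and compute the averages in a separate tool.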
The Traditional Approach
Today’s traditional knowledge discovery systems consist of an application program on
top of a data source. The main emphasis in these systems is data mining—inventing
new techniques and algorithms, proving their statistical soundness, and validating their
effectiveness given a suitable problem.
These systems assume that data is available in a convenient form, typically a flat file, extracted from an
appropriate data source. The knowledge discovery system consists of specific