SQL/MX Data Mining Guide

HP NonStop SQL/MX Data Mining Guide—523737-001

1 Introduction

Data mining identifies and characterizes interrelationships among multiple variables

without requiring a data analyst to formulate specific questions. Software tools look for

trends and patterns and flag unusual or potentially interesting ones. Because data

mining reveals previously unknown information and patterns, rather than proving or

disproving a hypothesis, mining enables knowledge discovery rather than just

knowledge verification.

Knowledge discovery is an iterative process involving many query-intensive steps. The

challenges of data management in supporting this process efficiently are significant

and continue to grow as knowledge discovery becomes more widely used.

This section discusses these approaches to data mining:

• The Traditional Approach

Today, most data mining is performed on data extracted from the database by using client tools. This

approach is limited because important information might be omitted from the data

extract.

• The SQL/MX Approach

The SQL/MX approach to knowledge discovery enables you to perform many

data-intensive tasks in the database itself, rather than on extracts. Examples include

statistical sampling, statistical functions, temporal reasoning through sequence

functions, cross-table generation, database profiling, and moving-window

aggregations.

• The Knowledge Discovery Process

In the SQL/MX approach, fundamental data structures and operations are built into

the database management system (DBMS) to support a wide range of knowledge

discovery tasks and algorithms. The knowledge discovery process is described as

a series of steps that starts with the selection and definition of a business

opportunity, continues through data preparation and modeling, and ends with the

deployment of the new knowledge.
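
To make one of the data-intensive tasks listed under the SQL/MX approach concrete, the following Python sketch illustrates a moving-window aggregation: each output value is the average of the current value and up to a fixed number of preceding values in an ordered sequence. This is a conceptual illustration only, not SQL/MX syntax. The function name moving_average and the sample data are hypothetical and do not come from this guide.

```python
from collections import deque


def moving_average(values, window):
    """Yield the average of each value and up to window - 1 preceding values.

    Hypothetical helper for illustration; it is not part of SQL/MX.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # holds the values currently in the window
    total = 0.0
    for value in values:
        if len(recent) == window:
            # The oldest value is about to leave the window.
            total -= recent[0]
        recent.append(value)
        total += value
        yield total / len(recent)


# Hypothetical ordered sequence, for example daily sales totals.
daily_sales = [10, 20, 30, 40]
print(list(moving_average(daily_sales, 3)))  # [10.0, 15.0, 20.0, 30.0]
```

In the traditional approach, a computation like this runs in a client tool over an extract. The SQL/MX approach performs the equivalent work inside the database itself.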

The Traditional Approach

Today’s traditional knowledge discovery systems consist of an application program on

top of a data source. The main emphasis in these systems is data mining—inventing

new techniques and algorithms, proving their statistical soundness, and validating their

effectiveness given a suitable problem.

These systems assume that data is available in a convenient form, typically a flat file, extracted from an

appropriate data source. The knowledge discovery system consists of specific