SQL/MX Data Mining Guide
Introduction
HP NonStop SQL/MX Data Mining Guide—523737-001
algorithms that load the entire data set into memory and perform necessary
computations.
The extract approach has two major limitations:
• It does not scale to large data sets because the entire data set must fit in memory. Statistical sampling can be used to work around this limitation. However, sampling is inappropriate in many situations because it might cause patterns to be missed, such as those in small groups or those between records.
• It cannot conveniently manage multiple versions of data across the numerous iterations of a typical knowledge discovery investigation. For example, each iteration might require extracting additional data, performing incremental updates, deriving new attributes, and so on.
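By contrast, when the data stays in the DBMS, a derived attribute can be added in place instead of re-extracting and re-versioning a flat file on each iteration. The following is a minimal sketch in standard SQL; the CUSTOMER and SALES tables and their columns are hypothetical, not part of this guide:

```sql
-- Hypothetical tables: CUSTOMER (custid, birthdate) and SALES (custid, amount).
-- Deriving a new attribute (total spend per customer) as a view keeps each
-- iteration of the investigation inside the DBMS; no new extract is needed.
CREATE VIEW cust_profile AS
  SELECT c.custid,
         c.birthdate,
         SUM(s.amount) AS total_spend   -- derived attribute, computed in place
  FROM customer c, sales s
  WHERE c.custid = s.custid
  GROUP BY c.custid, c.birthdate;
```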
The SQL/MX Approach
In most enterprise organizations today, database systems are crucial for conducting business. These DBMSs serve as the transaction processing systems for daily operations and manage data warehouses containing huge amounts of historical information. The validated data in these warehouses is already being used for online analysis and is a natural starting point for knowledge discovery.
The SQL/MX approach identifies fundamental data structures and operations that are
common across a wide range of knowledge discovery tasks and builds such structures
and operations into the DBMS. The primary advantages of the SQL/MX technology
over traditional data mining techniques include:
• The ability to mine much larger data sets, not only data in flat-file extracts
• Simplified data management
• More complete results
• Better performance and reduced cycle times
The main features of the SQL/MX approach are summarized next.
Data-Intensive Computations Performed in the DBMS
Tools and applications perform data-intensive data-preparation tasks in the DBMS by
using an SQL interface. As a result, you can access the powerful and parallel DBMS
data manipulation capabilities in the data preparation stage of the knowledge discovery
process.
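For example, a sampling step can be pushed into the DBMS through the SQL interface rather than performed on an extract. The following sketch uses the SQL/MX SAMPLE clause; the table and column names are hypothetical, and the exact syntax is covered elsewhere in this guide:

```sql
-- Illustrative sketch: a data-preparation step (random sampling) expressed
-- in SQL and executed by the DBMS, so only the sample leaves the database.
-- Table SALES_SUMMARY and its columns are hypothetical.
SELECT custid, total_spend
FROM sales_summary
SAMPLE RANDOM 10 PERCENT;
```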
Use of Built-In DBMS Data Structures and Operations
Fundamental data structures and operations are built into the DBMS to support a wide
range of knowledge discovery tasks and algorithms in an efficient and scalable
manner.
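As one illustration of such a built-in operation, SQL/MX sequence functions compute ordered, inter-row results such as moving averages directly in the DBMS. The sketch below assumes a hypothetical ACCOUNT_HISTORY table; exact syntax for sequence functions appears later in this guide:

```sql
-- Illustrative sketch: SEQUENCE BY orders the rows, and MOVINGAVG computes
-- a moving average over a window of rows inside the DBMS, avoiding an
-- extract-and-compute round trip. Table and column names are hypothetical.
SELECT account_id, tx_date,
       MOVINGAVG(balance, 7) AS avg_balance_7   -- average over 7 ordered rows
FROM account_history
SEQUENCE BY tx_date;
```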