SQL/MX Data Mining Guide
Introduction
HP NonStop SQL/MX Data Mining Guide—523737-001
algorithms that load the entire data set into memory and perform necessary
computations.
The extract approach has two major limitations:
• It does not scale to large data sets because the entire data set must fit in memory. Statistical sampling can be used to work around this limitation. However, sampling is inappropriate in many situations because it might cause patterns to be missed, such as those in small groups or those between records.
• It cannot conveniently manage multiple versions of data across the numerous iterations of a typical knowledge discovery investigation. For example, each iteration might require extracting additional data, performing incremental updates, deriving new attributes, and so on.
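By contrast, when the data stays in the DBMS, a derived attribute can be added in place instead of re-extracting and re-versioning a flat file on each iteration. The following is a minimal sketch in standard SQL; the CUSTOMER and SALES tables and their columns are hypothetical, not part of this guide:

```sql
-- Hypothetical tables: CUSTOMER (custid, birthdate) and SALES (custid, amount).
-- Deriving a new attribute (total spend per customer) as a view keeps each
-- iteration of the investigation inside the DBMS; no new extract is needed.
CREATE VIEW cust_profile AS
  SELECT c.custid,
         c.birthdate,
         SUM(s.amount) AS total_spend   -- derived attribute, computed in place
  FROM customer c, sales s
  WHERE c.custid = s.custid
  GROUP BY c.custid, c.birthdate;
```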
The SQL/MX Approach
In most enterprise organizations today, database systems are crucial for conducting business. These DBMSs serve as the transaction processing systems for daily operations and manage data warehouses containing huge amounts of historical information. The validated data in these warehouses is already being used for online analysis and is a natural starting point for knowledge discovery.
The SQL/MX approach identifies fundamental data structures and operations that are
common across a wide range of knowledge discovery tasks and builds such structures
and operations into the DBMS. The primary advantages of the SQL/MX technology
over traditional data mining techniques include:
• The ability to mine much larger data sets, not only data in flat-file extracts
• Simplified data management
• More complete results
• Better performance and reduced cycle times
The main features of the SQL/MX approach are summarized next.
Data-Intensive Computations Performed in the DBMS
Tools and applications perform data-intensive data-preparation tasks in the DBMS by
using an SQL interface. As a result, you can access the powerful and parallel DBMS
data manipulation capabilities in the data preparation stage of the knowledge discovery
process.
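For example, a sampling step can be pushed into the DBMS through the SQL interface rather than performed on an extract. The following sketch uses the SQL/MX SAMPLE clause; the table and column names are hypothetical, and the exact syntax is covered elsewhere in this guide:

```sql
-- Illustrative sketch: a data-preparation step (random sampling) expressed
-- in SQL and executed by the DBMS, so only the sample leaves the database.
-- Table SALES_SUMMARY and its columns are hypothetical.
SELECT custid, total_spend
FROM sales_summary
SAMPLE RANDOM 10 PERCENT;
```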
Use of Built-In DBMS Data Structures and Operations
Fundamental data structures and operations are built into the DBMS to support a wide
range of knowledge discovery tasks and algorithms in an efficient and scalable
manner.
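As one illustration of such a built-in operation, SQL/MX sequence functions compute ordered, inter-row results such as moving averages directly in the DBMS. The sketch below assumes a hypothetical ACCOUNT_HISTORY table; exact syntax for sequence functions appears later in this guide:

```sql
-- Illustrative sketch: SEQUENCE BY orders the rows, and MOVINGAVG computes
-- a moving average over a window of rows inside the DBMS, avoiding an
-- extract-and-compute round trip. Table and column names are hypothetical.
SELECT account_id, tx_date,
       MOVINGAVG(balance, 7) AS avg_balance_7   -- average over 7 ordered rows
FROM account_history
SEQUENCE BY tx_date;
```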