SQL/MX 2.x Reference Manual (H06.10+, J06.03+)

ManualsBrandsHP ManualsServerHP Integrity NonStop J-Series

781

782

783

784

785

786

787

788

789

790

SQL/MX Clauses

HP NonStop SQL/MX Reference Manual—544517-008

7-11

Examples of SAMPLE

in the file in which a table is stored. The number of blocks in a cluster is specified in the

CLUSTERS subclause of the SAMPLE RANDOM clause. For example:

SELECT * FROM customers

SAMPLE RANDOM 1 PERCENT

CLUSTERS OF 2 BLOCKS;

This query randomly selects one percent of the clusters in the CUSTOMERS table and

then adds each row in all selected clusters to the result table. In other words, think of

the CUSTOMERS table as a sequence of disk blocks, where each two blocks in the

sequence is a cluster. The preceding query selects one percent of the clusters at

random and then returns all the rows in each selected cluster.

Cluster sampling can be done only on a base table, not on intermediate results.

Cluster sampling is generally faster than sampling individual rows because fewer

blocks are read from disk. In random row and periodic sampling, the entire result table

being sampled is read, and each row in the table is processed. In cluster sampling,

only the disk blocks that are part of the result table are read and processed. Therefore,

if the sampling percentage is small, the performance advantage of cluster sampling

over other sampling methods can be dramatic.

Cluster sampling is designed for large tables. It might return zero rows if there are not

enough blocks in a table to fill at least one cluster and you specify a large cluster size.

This can also happen with a partitioned table if each partition does not have enough

blocks to fill at least one cluster. For example, if a table uses 1000 blocks and is

distributed over 256 partitions, there will be an average of 4 blocks per partition. If you

specify a SAMPLE RANDOM clause with a cluster size of 25 blocks and ten percent,

even if there are 10,000 rows in the table, sampling will result in the SELECT

statement returning 0 rows. To avoid this, use a smaller CLUSTER size.

For more information, see the SQL/MX Query Guide.

Examples of SAMPLE

Within SQLCI, suppose that the data-mining tables SALESPER, SALES, and DEPT

have been created as:

CREATE TABLE $db.mining.salesper

( empid NUMERIC (4) UNSIGNED NOT NULL

,dnum NUMERIC (4) UNSIGNED NOT NULL

,salary NUMERIC (8,2) UNSIGNED

,age INTEGER

,sex CHAR (6)

,PRIMARY KEY (empid) );

CREATE TABLE $db.mining.sales

( empid NUMERIC (4) UNSIGNED NOT NULL

,product VARCHAR (20)

,region CHAR (4)