SQL/MX 2.x Reference Manual (H06.10+, J06.03+)
SQL/MX Clauses
HP NonStop SQL/MX Reference Manual—544517-008
7-11
Examples of SAMPLE
in the file in which a table is stored. The number of blocks in a cluster is specified in the
CLUSTERS subclause of the SAMPLE RANDOM clause. For example:
SELECT * FROM customers
SAMPLE RANDOM 1 PERCENT
CLUSTERS OF 2 BLOCKS;
This query randomly selects one percent of the clusters in the CUSTOMERS table and
then adds each row in all selected clusters to the result table. In other words, think of
the CUSTOMERS table as a sequence of disk blocks, where each two blocks in the
sequence is a cluster. The preceding query selects one percent of the clusters at
random and then returns all the rows in each selected cluster.
Cluster sampling can be done only on a base table, not on intermediate results.
Cluster sampling is generally faster than sampling individual rows because fewer
blocks are read from disk. In random row and periodic sampling, the entire result table
being sampled is read, and each row in the table is processed. In cluster sampling,
only the disk blocks that are part of the result table are read and processed. Therefore,
if the sampling percentage is small, the performance advantage of cluster sampling
over other sampling methods can be dramatic.
Cluster sampling is designed for large tables. It might return zero rows if there are not
enough blocks in a table to fill at least one cluster and you specify a large cluster size.
This can also happen with a partitioned table if each partition does not have enough
blocks to fill at least one cluster. For example, if a table uses 1000 blocks and is
distributed over 256 partitions, there will be an average of 4 blocks per partition. If you
specify a SAMPLE RANDOM clause with a cluster size of 25 blocks and ten percent,
even if there are 10,000 rows in the table, sampling will result in the SELECT
statement returning 0 rows. To avoid this, use a smaller CLUSTER size.
For more information, see the SQL/MX Query Guide.
Examples of SAMPLE
Within SQLCI, suppose that the data-mining tables SALESPER, SALES, and DEPT
have been created as:
CREATE TABLE $db.mining.salesper
( empid NUMERIC (4) UNSIGNED NOT NULL
,dnum NUMERIC (4) UNSIGNED NOT NULL
,salary NUMERIC (8,2) UNSIGNED
,age INTEGER
,sex CHAR (6)
,PRIMARY KEY (empid) );
CREATE TABLE $db.mining.sales
( empid NUMERIC (4) UNSIGNED NOT NULL
,product VARCHAR (20)
,region CHAR (4)










