SQL/MX 2.x Reference Manual (H06.04+)

SQL/MX Clauses
HP NonStop SQL/MX Reference Manual540440-003
7-10
Considerations for SAMPLE
You can determine the actual size of the sample. Suppose that there are N
rows in the intermediate result table. If the size s of the sample is specified as
a number of rows, the actual size of the resulting sample is the minimum of s
and N.
PERIODIC rows-size EVERY number-rows ROWS
[SORT BY colname [ASC[ENDING] | DESC[ENDING]]
[,colname [ASC[ENDING] | DESC[ENDING]]]...]
directs NonStop SQL/MX to choose the first rows from each block (or period) of
contiguous rows. This sampling method is equivalent to a separate FIRST
sampling for each period, and the rows-size is defined as in FIRST sampling.
The size of the period is specified as a number of rows. You can specify the order
of the rows to sample. Otherwise, NonStop SQL/MX chooses an arbitrary order.
You can determine the actual size of the sample. Suppose that there are N rows in
the intermediate result table. If the size s of the sample is specified as a number of
rows and the size p of the period is specified as a number of rows, the actual size
of the resulting sample is calculated as:
FLOOR (N/p) * s + minimum (MOD (N, p), s)
Note that minimum in this expression is used simply as the mathematical minimum
of two values.
Considerations for SAMPLE
Sample Rows
In general, when you use the SAMPLE clause, the same query returns different sets of
rows for each execution. The same set of rows is returned only when you use the
FIRST and PERIODIC sampling methods with the SORT BY option, where there are
no duplicates in the specified column combination for the sort.
Cluster Sampling
Cluster sampling is an option supported by the SAMPLE RANDOM clause in a
SELECT statement. A cluster, in this sense, is a logically contiguous set of disk blocks
in the file in which a table is stored. The number of blocks in a cluster is specified in the
CLUSTERS subclause of the SAMPLE RANDOM clause. For example:
SELECT * FROM customers
SAMPLE RANDOM 1 PERCENT
CLUSTERS OF 2 BLOCKS;
This query randomly selects one percent of the clusters in the CUSTOMERS table and
then adds each row in all selected clusters to the result table. In other words, think of
the CUSTOMERS table as a sequence of disk blocks, where each two blocks in the
sequence is a cluster. The preceding query selects one percent of the clusters at
random and then returns all the rows in each selected cluster.