SQL/MX 2.x Reference Manual (H06.04+)

SQL/MX Clauses
HP NonStop SQL/MX Reference Manual540440-003
7-9
SAMPLE Clause
You can determine the actual size of the sample. Suppose that there are N
rows in the intermediate result table. Each row is picked with a probability of
r%, where r is the sample size in PERCENT. Therefore, the actual size of the
resulting sample is approximately r% of N. The number of rows picked follows
a binomial distribution with mean equal to r * N/100.
If you specify a sample size greater than 100 PERCENT, NonStop SQL/MX
returns all the rows in the result table plus duplicate rows. The duplicate rows
are picked from the result table according to the specified sampling method.
This technique is referred to as oversampling and is not allowed with cluster
sampling.
ROWS
specifies row sampling. Row sampling is the default if you specify neither
ROWS nor CLUSTERS.
CLUSTERS OF number-blocks BLOCKS
specifies cluster sampling. You can use the CLUSTERS clause for a base
table only if there is no WHERE clause in the SELECT statement. First, a
cluster is chosen randomly, and then all rows in the selected cluster are
added to the sample. The size of the cluster is determined by number-
blocks. This process is repeated until the sample size is generated. See
Cluster Sampling on page 7-10.
BALANCE
If you specify a BALANCE expression, NonStop SQL/MX performs
stratified sampling. The intermediate result table is divided into disjoint
strata based on the WHEN conditions. Each stratum is sampled
independently by using the sampling size. For a given row, the stratum to
which it belongs is determined by the first WHEN condition that is true for
that row—if there is a true condition. If there is no true condition, the row
belongs to the ELSE stratum.
FIRST rows-size [SORT BY colname [ASC[ENDING] | DESC[ENDING]]
[,colname [ASC[ENDING] | DESC[ENDING]]]...]
directs NonStop SQL/MX to choose the first rows from the result table. You can
specify the order of the rows to sample. Otherwise, NonStop SQL/MX chooses an
arbitrary order. The sampling size is determined by the rows-size, defined as:
number-rows ROWS
| BALANCE WHEN condition THEN number-rows ROWS
[WHEN condition THEN number-rows ROWS]...
[ELSE number-rows ROWS] END
specifies the value of the size for FIRST sampling by using the number of rows
intended in the sample. The value number-rows must be an integer literal.