SQL/MX 2.x Query Guide (G06.24+, H06.03+)

Keeping Statistics Current
HP NonStop SQL/MX Query Guide523728-003
3-6
Performance Issues and Accuracy in Sampling
Use the value returned from running the SELECT COUNT (*) statement (125000) to
specify the valid row count in the UPDATE STATISTICS statement:
UPDATE STATISTICS FOR TABLE EMPLOYEE ON (EMPNUM)
SAMPLE 5000 ROWS SET ROWCOUNT 125000
Performance Issues and Accuracy in Sampling
Certain sampling options provide more accurate statistics than others, but higher
accuracy can mean performance trade-offs. When you need high accuracy or quality in
your sample, use random row sampling. In this method, the intermediate table being
sampled is scanned, and each row is selected with probability n/100, where n is the
sample percentage.
A faster alternative to random row sampling is cluster sampling. Use this method when
performance is a high priority, or when the quality of a sample is determined to be
good enough, based on the knowledge of the table for which statistics are being
updated.
In cluster sampling, only disk blocks that are actually part of the result set are read and
processed. If the sampling percentage is small, which is typically the case, the
performance advantage of cluster sampling over other sampling methods can be
dramatic. For example, consider two queries, one that selects a 1 PERCENT row
sample of a table and another that selects a 1 PERCENT cluster sample of the same
table. Because cluster sampling requires reading and processing only approximately 1
percent of the table, and row sampling reads and processes the entire table, cluster
sampling can be up to 100 times faster than random row sampling. However, the
trade-off for this performance is less accuracy than random row sampling provides.
Other sampling methods, such as PERIODIC, do not provide additional performance or
accuracy but provide semantics that are appropriate in certain situations. Use
PERIODIC sampling (for example, choosing 1 of every 10 rows) when it is important to
compute statistics from all parts of a table. For example, if a table is ordered by
timestamp and column values vary widely over time, the use of PERIODIC sampling
ensures that statistics are computed from rows spread evenly throughout the entire
table, resulting in accurate statistics.
These sampling options perform a full table scan prior to selecting the sample set:
SAMPLE RANDOM x PERCENT
SAMPLE PERIODIC x ROWS EVERY y ROWS