SQL/MX 2.x Query Guide (G06.24+, H06.03+)
Keeping Statistics Current
HP NonStop SQL/MX Query Guide—523728-003
3-6
Performance Issues and Accuracy in Sampling
Use the value returned from running the SELECT COUNT (*) statement (125000) to 
specify the valid row count in the UPDATE STATISTICS statement:
UPDATE STATISTICS FOR TABLE EMPLOYEE ON (EMPNUM)
 SAMPLE 5000 ROWS SET ROWCOUNT 125000
Performance Issues and Accuracy in Sampling
Certain sampling options provide more accurate statistics than others, but higher 
accuracy can mean performance trade-offs. When you need high accuracy or quality in 
your sample, use random row sampling. In this method, the intermediate table being 
sampled is scanned, and each row is selected with probability n/100, where n is the 
sample percentage. 
A faster alternative to random row sampling is cluster sampling. Use this method when 
performance is a high priority, or when the quality of a sample is determined to be 
good enough, based on the knowledge of the table for which statistics are being 
updated. 
In cluster sampling, only disk blocks that are actually part of the result set are read and 
processed. If the sampling percentage is small, which is typically the case, the 
performance advantage of cluster sampling over other sampling methods can be 
dramatic. For example, consider two queries, one that selects a 1 PERCENT row 
sample of a table and another that selects a 1 PERCENT cluster sample of the same 
table. Because cluster sampling requires reading and processing only approximately 1 
percent of the table, and row sampling reads and processes the entire table, cluster 
sampling can be up to 100 times faster than random row sampling. However, the 
trade-off for this performance is less accuracy than random row sampling provides.
Other sampling methods, such as PERIODIC, do not provide additional performance or 
accuracy but provide semantics that are appropriate in certain situations. Use 
PERIODIC sampling (for example, choosing 1 of every 10 rows) when it is important to 
compute statistics from all parts of a table. For example, if a table is ordered by 
timestamp and column values vary widely over time, the use of PERIODIC sampling 
ensures that statistics are computed from rows spread evenly throughout the entire 
table, resulting in accurate statistics.
These sampling options perform a full table scan prior to selecting the sample set:
SAMPLE RANDOM x PERCENT
SAMPLE PERIODIC x ROWS EVERY y ROWS










