SQL/MX 2.x Query Guide (G06.24+, H06.03+)

Keeping Statistics Current
HP NonStop SQL/MX Query Guide523728-003
3-5
Sampling and UPDATE STATISTICS
Analyzing the Possible Impact of Updating Statistics
Depending on the size of the table, updating statistics can take longer than you would
like. Consider updating statistics during the hours when peak performance is not
required.
If you want to preserve the existing query execution plan, be aware that updating
statistics might cause the optimizer to choose a different plan. Usually, you can
improve performance by updating the statistics on a table to reflect the current status.
The update statistics operations, however, might not improve performance, as
discussed next:
You can update statistics to perform a sampling of rows to determine the statistical
information. Depending on your sample size, this procedure could take a long time.
(That is, the larger the sample, the longer it might take.) Because this is a
statistical sampling method, the statistics gathered are not exact. Inaccurate
statistics (wrong by more than 10 percent) can adversely affect the plan. See
Sampling and UPDATE STATISTICS.
Because the UPDATE STATISTICS statement does not automatically recompile
programs, the operation does not invalidate the dependent programs. If you want
to take advantage of the new statistics, however, you must explicitly recompile the
dependent programs.
Sampling and UPDATE STATISTICS
Use sampling to control the amount of time spent calculating statistics. If you do not
specify sampling, statistics are collected by scanning the entire table. The optional
SAMPLE clause provides several methods of producing a sample set, based on a
ratio, to determine the histograms. The SQL/MX Reference Manual describes each
sampling method in detail.
You might want to use sampling because of the amount of time required to update
statistics on the entire table. Sampling techniques that do not perform full large table
scans result in significant performance gain. You can use SAMPLE without specifying
the number of rows. By default, SQL/MX samples two percent of the table up to a
maximum of two million rows. You might want to explicitly specify a larger sample size
to increase statistics accuracy or a smaller sample size to reduce the running time.
When you use the SAMPLE option with UPDATE STATISTICS, a temporary table is
created in the same volume as the primary partition of the table. Because the
temporary table is used for data gathering and calculation purposes, you must have
adequate disk space to accommodate it on your system.
When you use the sampling option, you can increase the efficiency by specifying a row
count for the columns you want to update. The next statement provides a way to obtain
the row count for the EMPNUM column of the EMPLOYEE table:
SELECT COUNT(*) FROM EMPLOYEE;