SQL/MX Release 2.0 Best Practices

Updating Statistics 42
This example runs the UPDATE STATISTICS statement on a large table:
UPDATE STATISTICS FOR TABLE tablename
SAMPLE RANDOM 10 PERCENT SET ROWCOUNT rowcount;
Using the UPDATE STATISTICS statement is a processor-intensive operation. To limit the impact on the
system, execute this statement with a low priority so the system can manage the mixed workloads.
By default, UPDATE STATISTICS ON EVERY KEY gathers statistics for all key columns, including indexed
columns. To gather statistics for all columns, use the ON EVERY COLUMN option.
Information to Aid “Implanting” SQL Statistics
1. Set catalog statistics.
2. Update the row count for each partition of the table.
3. If the root partition is empty, add an additional statement that sets the rowcount to 0 for that
partition only.
4. Assume that no blocks are empty. Estimate the NONEMPTYBLOCKCOUNT by dividing the EOF of the
partition by the block size. Note that EOF refers to the partition EOF.
Caution: The unique entry count (UEC) needs to fall within the range indicated by second low/high
values. Otherwise, the selectivity is based on the range and not on the UEC.
For large tables, INDEXLEVELS is usually not greater than 3 (or 4).
Indexes are not recognized by the compiler until UPDATE STATISTICS has been run for the corresponding
columns in the index. This action can be done by using the ON EVERY KEY statement.
Statistics are cumulative. For example, you can run statistics ON EVERY KEY for a table. Then, if you have a
query that would perform better with statistics on other additional columns, you can update statistics on
those column combinations. In a larger table, you should use sampling, statistics on keys, and rowcounts.
An average of 10 percent is a good average to use when sampling. However, note that when sampling
more than 50 million rows, using 1 percent also gives you a good sampling.
Use of the sample and random parameter aids in the performance of the UPDATE STATISTICS operation.
This use allows general information to be generated about the data content, without having to read
every row.
You must know and use the rowcount. If you are unaware of the rowcount of the table, run SELECT
COUNT(*) on the table prior to generating statistics. UPDATE STATISTICS uses certain algorithms to
determine sampling for the table, and one of those parameters is rowcount. If the UPDATE STATISTICS
operation does not have the specified rowcount, it makes default assumptions that might not perform
well for the amount of data.