SQL/MX Release 2.0 Best Practices

Updating Statistics 42

This example runs the UPDATE STATISTICS statement on a large table:

UPDATE STATISTICS FOR TABLE tablename

SAMPLE RANDOM 10 PERCENT SET ROWCOUNT rowcount;

Using the UPDATE STATISTICS statement is a processor-intensive operation. To limit the impact on the

system, execute this statement with a low priority so the system can manage the mixed workloads.

By default, UPDATE STATISTICS ON EVERY KEY gathers statistics for all key columns, including indexed

columns. To gather statistics for all columns, use the ON EVERY COLUMN option.

Information to Aid “Implanting” SQL Statistics

1. Set catalog statistics.

2. Update the row count for each partition of the table.

3. If the root partition is empty, add an additional statement that sets the rowcount to 0 for that

partition only.

4. Assume that no blocks are empty. Estimate the NONEMPTYBLOCKCOUNT by dividing the EOF of the

partition by the block size. Note that EOF refers to the partition EOF.

Caution: The unique entry count (UEC) needs to fall within the range indicated by second low/high

values. Otherwise, the selectivity is based on the range and not on the UEC.

For large tables, INDEXLEVELS is usually not greater than 3 (or 4).

Indexes are not recognized by the compiler until UPDATE STATISTICS has been run for the corresponding

columns in the index. This action can be done by using the ON EVERY KEY statement.

Statistics are cumulative. For example, you can run statistics ON EVERY KEY for a table. Then, if you have a

query that would perform better with statistics on other additional columns, you can update statistics on

those column combinations. In a larger table, you should use sampling, statistics on keys, and rowcounts.

An average of 10 percent is a good average to use when sampling. However, note that when sampling

more than 50 million rows, using 1 percent also gives you a good sampling.

Use of the sample and random parameter aids in the performance of the UPDATE STATISTICS operation.

This use allows general information to be generated about the data content, without having to read

every row.

You must know and use the rowcount. If you are unaware of the rowcount of the table, run SELECT

COUNT(*) on the table prior to generating statistics. UPDATE STATISTICS uses certain algorithms to

determine sampling for the table, and one of those parameters is rowcount. If the UPDATE STATISTICS

operation does not have the specified rowcount, it makes default assumptions that might not perform

well for the amount of data.