SQL/MX Release 2.0 Best Practices

Possible Implementations 34

3. Determine the effect of the UPDATE STATISTICS statement, and, optionally, back out the generated

histogram, if necessary.

a. In MXCI, back up current histogram tables, if any:

• CREATE TABLE myhist LIKE HISTOGRAMS;

• INSERT INTO myhist SELECT * FROM HISTOGRAMS;

• CREATE TABLE myhistint LIKE HISTOGRAM_INTERVALS;

• INSERT INTO myhistint SELECT * FROM HISTOGRAM_INTERVALS;

b. Issue the UPDATE STATISTICS command for the required column groups.

c. Recompile the query.

d. Use EXPLAIN to review the cost information and determine if MDAM (if desired) is used for your

query. To determine if MDAM will be used, look at the results of EXPLAIN. MDAM occurs for the

FILE_SCAN operator. When key_type indicates MDAM for the FILE_SCAN operator, MDAM query

technology is used.

e. If necessary, use the UPDATE STATISTICS CLEAR option to remove histograms for unwanted

column groups.

f. If necessary, restore the backup histogram tables.

• DELETE FROM HISTOGRAMS;

• INSERT INTO HISTOGRAMS

SELECT * FROM myhist where table_uid in (select object_uid

from CAT.DEFINITION_SCHEMA_VERSION_1200.OBJECTS);

• DELETE FROM HISTOGRAM_INTERVALS;

• INSERT INTO HISTOGRAM_INTERVALS

SELECT * FROM myhistint where table_uid in (select object_uid

from CAT.DEFINITION_SCHEMA_VERSION_1200.OBJECTS);

Dimension Table Partitioning Techniques

If you plan to use parallel execution (note that some applications are more suited to serial execution),

plan to partition all dimension tables, even if they are small. This partitioning provides the optimizer with

more opportunities to select effective parallel plans. Small tables can easily be changed to non-

partitioned tables at a later time.

To take maximum advantage of parallel index updates, put a table’s indexes on separate volumes and

on separate CPUs to eliminate contention of parallel operations on indexes serviced by the same disk

process.

If possible, take advantage of any opportunity to partition large tables on the same columns. Joins

between these tables are more efficient than if the tables are unordered relative to one another.

Some customers have found it effective to use hash-partitioning techniques with dimension tables to

eliminate data-clustering patterns that reduce the effectiveness of parallel queries. For example, if all (or

many) of the predicate values for a particular column fall within one or two partitions, and this table is the

outer table, only one ESP (or just a few) will find qualifying rows. Only this ESP will continue running the

query, with diminished parallelism.

Managing Cache Memory Size

Cache is the buffer in memory that the disk process uses to keep copies of the disk blocks that have been

accessed most recently. If the disk process finds a requested block in cache, the disk process can satisfy

the request immediately without requesting a physical I/O operation.

Cache size has an important effect on performance. The larger the cache, the more likely that a block

needs to be read only once.