Queries that frequently read large tables and aggregate the results to a high level will benefit from the creation of a summary table or tables. The “Design guidelines and considerations” section of this document includes additional considerations for summary tables.
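As a sketch of this approach, the following statements build a daily summary table from a detail fact table. All of the table and column names (sales_fact, daily_store_sales, and so on) are illustrative assumptions, not objects from any particular schema.

   CREATE TABLE sales.daily_store_sales
   ( store_id      NUMERIC(9)     NOT NULL NOT DROPPABLE
   , sales_date    DATE           NOT NULL NOT DROPPABLE
   , total_amount  NUMERIC(18,2)  NOT NULL
   , PRIMARY KEY (store_id, sales_date)
   );

   -- Populate the summary table by aggregating the detail fact table once
   INSERT INTO sales.daily_store_sales
     SELECT store_id, sales_date, SUM(sale_amount)
     FROM sales.sales_fact
     GROUP BY store_id, sales_date;

Queries that report at the store and day level can then read the much smaller summary table instead of repeatedly aggregating the fact table.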
Joins between large tables can be quite expensive, in terms of both processing time and resources. This is often one of the most significant problems customers face in the database environment. When possible, use similar partitioning and primary-key structures, which enable efficient partition-to-partition joins.
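As a minimal sketch of this guideline, the two table definitions below give the large dimension table and the fact table the same leading primary-key column, so both can be range-partitioned on customer_id in the same way. The names are illustrative assumptions, and the LOCATION and PARTITION clauses that would place the partitions on specific volumes are omitted here.

   CREATE TABLE sales.customer
   ( customer_id    NUMERIC(18)  NOT NULL NOT DROPPABLE
   , customer_name  VARCHAR(40)
   , PRIMARY KEY (customer_id)
   );

   CREATE TABLE sales.sales_fact
   ( customer_id   NUMERIC(18)   NOT NULL NOT DROPPABLE
   , store_id      NUMERIC(9)    NOT NULL NOT DROPPABLE
   , sales_date    DATE          NOT NULL NOT DROPPABLE
   , sale_amount   NUMERIC(18,2)
   , PRIMARY KEY (customer_id, sales_date, store_id)
   );

With matching key order and partition boundaries, corresponding partitions of the two tables can be joined directly rather than repartitioning one of the tables at run time.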
Database applications usually have at least one large dimension table that must be joined to the fact
table. In this situation:
• Ensure that the dimension table is indexed as needed to avoid full table scans.
• Ensure that the statistics are updated and correct.
• Deal with skewed data distributions as described in the “Addressing problems with skewed data distributions” section of this document.
• Note whether the join column or columns between the dimension table and the fact table represent an efficient path.
• Review DISPLAY_EXPLAIN output to ensure that an efficient join method exists to the fact table. If DISPLAY_EXPLAIN indicates a repartitioned hybrid hash join, the join will be a costly operation and response time may be excessive. Repartitioned hash joins are used when the join columns are not in the same order. A sort-merge join is preferable, but it is still a costly operation, especially when run concurrently by multiple queries. A sketch of these checks follows this list.
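The following is a sketch of those checks for a hypothetical customer dimension and sales_fact fact table, run from MXCI. The object names, including the region_code column, are illustrative assumptions only.

   -- Index the dimension column used for filtering or joining
   CREATE INDEX customer_region_ix ON sales.customer (region_code);

   -- Refresh optimizer statistics on both tables
   UPDATE STATISTICS FOR TABLE sales.customer ON EVERY COLUMN;
   UPDATE STATISTICS FOR TABLE sales.sales_fact ON EVERY COLUMN;

   -- Check the chosen join method before running the query
   DISPLAY_EXPLAIN
     SELECT c.region_code, SUM(f.sale_amount)
     FROM sales.customer c, sales.sales_fact f
     WHERE c.customer_id = f.customer_id
     GROUP BY c.region_code;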
If none of these measures improves the access plan, you may need to restructure one or both tables, create an index on the fact table, or, where reasonable, create an aggregate table. Before making any database changes, review the query objectives to determine whether a more efficient query can be written instead.
Another common performance problem involving two large tables arises when the fact table must be joined to a large dimension table only to reach a smaller, higher-level dimension table. Consider this situation: A
customer dimension table has a column called “customer_demographics,” which is further described in a
small dimension table called “demographics.” The customer table has a normal foreign-key relationship
to the fact table. A user submits a query that sums a fact column but reports the results by the
demographics description. In this case, a large join is required between the customer and fact tables, in
order to relate the demographics information to the fact table data.
Query performance could be improved greatly by treating customer_demographics as a separate dimension on which fact data can be analyzed. The join then involves only the small demographics table and the fact table, and it performs much better than in the preceding case.
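The difference can be sketched with two query forms. The customer, demographics, and customer_demographics names come from the example above; the fact table and the remaining columns (sales_fact, demographics_id, demographics_desc, sale_amount) are illustrative assumptions.

   -- Original form: the large customer-to-fact join is needed only to reach demographics
   SELECT d.demographics_desc, SUM(f.sale_amount)
   FROM sales.demographics d, sales.customer c, sales.sales_fact f
   WHERE d.demographics_id = c.customer_demographics
     AND c.customer_id = f.customer_id
   GROUP BY d.demographics_desc;

   -- Restructured form: the demographics key is carried on the fact table as its own
   -- dimension, so the join involves only the small demographics table and the fact table
   SELECT d.demographics_desc, SUM(f.sale_amount)
   FROM sales.demographics d, sales.sales_fact f
   WHERE d.demographics_id = f.demographics_id
   GROUP BY d.demographics_desc;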
Look for opportunities to limit joins between large tables.
Addressing Problems in SQL Catalog Performance
The following measures help ensure the most efficient access to the catalog and schema tables.
• To avoid I/O contention during query compilation, customers who expect a high degree of concurrent query execution should plan to keep the SQL/MX database catalog and schema on a disk volume separate from the ODBC/MP Server catalog. (Note that NonStop ODBC/MX does not have a separate catalog; it uses the SQL/MX catalog.) Do not share a catalog volume with active data tables or with any files that are likely to experience high levels of I/O activity.
• Maintain separate catalogs and schemas for test and production purposes. Delete catalog and schema entries when they are no longer required. Keep production catalogs and schemas free of unnecessary entries.
• Keep catalog and schema fragmentation to a minimum by regularly performing online reloads. Do online reloads after the initial creation of the database and after tables or partitions have been added and dropped.
• Keep catalog and schema statistics current for all catalog and schema tables. Update statistics regularly, both after the initial creation of the database and after tables or partitions have been added and dropped. A sketch of the cleanup and statistics statements follows this list.
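The statements below sketch the cleanup and statistics steps, assuming a hypothetical test schema named teststg in a catalog named testcat, and assuming that UPDATE STATISTICS is run against the metadata tables in your installation with the required authority. The three-part metadata table name is a placeholder only, not an actual SQL/MX metadata name.

   -- Remove a test schema that is no longer required (names are hypothetical)
   DROP SCHEMA testcat.teststg CASCADE;

   -- Keep metadata statistics current; replace the placeholder three-part name
   -- with the catalog, schema, and table names used in your installation
   UPDATE STATISTICS FOR TABLE my_catalog.my_metadata_schema.my_metadata_table ON EVERY COLUMN;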