Queries that frequently read large tables and aggregate the results to a high level will benefit from the creation of a summary table or tables. The “Design guidelines and considerations” section of this document includes additional considerations for summary tables.
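As a sketch of this approach, the following statements build a daily summary table from a detail fact table. All of the table and column names (sales_fact, daily_store_sales, and so on) are illustrative assumptions, not objects from any particular schema.

   CREATE TABLE sales.daily_store_sales
   ( store_id      NUMERIC(9)     NOT NULL NOT DROPPABLE
   , sales_date    DATE           NOT NULL NOT DROPPABLE
   , total_amount  NUMERIC(18,2)  NOT NULL
   , PRIMARY KEY (store_id, sales_date)
   );

   -- Populate the summary table by aggregating the detail fact table once
   INSERT INTO sales.daily_store_sales
     SELECT store_id, sales_date, SUM(sale_amount)
     FROM sales.sales_fact
     GROUP BY store_id, sales_date;

Queries that report at the store and day level can then read the much smaller summary table instead of repeatedly aggregating the fact table.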
Joins between large tables can be quite expensive, in terms of both processing time and resources. This is often one of the most significant problems customers face in the database environment. When possible, use similar partitioning and primary-key structures, which enable efficient partition-to-partition joins.
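As a minimal sketch of this guideline, the two table definitions below give the large dimension table and the fact table the same leading primary-key column, so both can be range-partitioned on customer_id in the same way. The names are illustrative assumptions, and the LOCATION and PARTITION clauses that would place the partitions on specific volumes are omitted here.

   CREATE TABLE sales.customer
   ( customer_id    NUMERIC(18)  NOT NULL NOT DROPPABLE
   , customer_name  VARCHAR(40)
   , PRIMARY KEY (customer_id)
   );

   CREATE TABLE sales.sales_fact
   ( customer_id   NUMERIC(18)   NOT NULL NOT DROPPABLE
   , store_id      NUMERIC(9)    NOT NULL NOT DROPPABLE
   , sales_date    DATE          NOT NULL NOT DROPPABLE
   , sale_amount   NUMERIC(18,2)
   , PRIMARY KEY (customer_id, sales_date, store_id)
   );

With matching key order and partition boundaries, corresponding partitions of the two tables can be joined directly rather than repartitioning one of the tables at run time.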
Database applications usually have at least one large dimension table that must be joined to the fact
table. In this situation:
• Ensure that the dimension table is indexed as needed to avoid full table scans.
• Ensure that the statistics are updated and correct.
• Deal with skewed data distributions as described in the “Addressing problems with skewed data distributions” section of this document.
• Note whether the join column or columns between the dimension table and the fact table represent an efficient path.
• Review DISPLAY_EXPLAIN output to ensure that an efficient join method exists to the fact table. If DISPLAY_EXPLAIN indicates a repartitioned hybrid hash join, the join will be a costly operation and response time may be excessive. Repartitioned hash joins are used when the join columns are not in the same order. A sort-merge join is preferable, but it is still a costly operation, especially when run concurrently by multiple queries. A sketch of these checks follows this list.
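The following is a sketch of those checks for a hypothetical customer dimension and sales_fact fact table, run from MXCI. The object names, including the region_code column, are illustrative assumptions only.

   -- Index the dimension column used for filtering or joining
   CREATE INDEX customer_region_ix ON sales.customer (region_code);

   -- Refresh optimizer statistics on both tables
   UPDATE STATISTICS FOR TABLE sales.customer ON EVERY COLUMN;
   UPDATE STATISTICS FOR TABLE sales.sales_fact ON EVERY COLUMN;

   -- Check the chosen join method before running the query
   DISPLAY_EXPLAIN
     SELECT c.region_code, SUM(f.sale_amount)
     FROM sales.customer c, sales.sales_fact f
     WHERE c.customer_id = f.customer_id
     GROUP BY c.region_code;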
If none of these measures improves the access plan, you may need to restructure one or both tables, create an index on the fact table, or, where reasonable, create an aggregate table. Before making any database changes, review the query objectives to determine whether a more efficient query can be written instead.
Another common performance problem involving two large tables arises when the fact table must be joined to a large dimension table only to reach a smaller, higher-level dimension table. Consider this situation: A
customer dimension table has a column called “customer_demographics,” which is further described in a
small dimension table called “demographics.” The customer table has a normal foreign-key relationship
to the fact table. A user submits a query that sums a fact column but reports the results by the
demographics description. In this case, a large join is required between the customer and fact tables, in
order to relate the demographics information to the fact table data.
Query performance could be improved greatly by treating customer_demographics as a separate dimension on which fact data can be analyzed. The join then involves only the small demographics table and the fact table, and it performs much better than in the preceding case.
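The difference can be sketched with two query forms. The customer, demographics, and customer_demographics names come from the example above; the fact table and the remaining columns (sales_fact, demographics_id, demographics_desc, sale_amount) are illustrative assumptions.

   -- Original form: the large customer-to-fact join is needed only to reach demographics
   SELECT d.demographics_desc, SUM(f.sale_amount)
   FROM sales.demographics d, sales.customer c, sales.sales_fact f
   WHERE d.demographics_id = c.customer_demographics
     AND c.customer_id = f.customer_id
   GROUP BY d.demographics_desc;

   -- Restructured form: the demographics key is carried on the fact table as its own
   -- dimension, so the join involves only the small demographics table and the fact table
   SELECT d.demographics_desc, SUM(f.sale_amount)
   FROM sales.demographics d, sales.sales_fact f
   WHERE d.demographics_id = f.demographics_id
   GROUP BY d.demographics_desc;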
Look for opportunities to limit joins between large tables.
Addressing Problems in SQL Catalog Performance
The following measures help ensure the most efficient access to the catalog and schema tables.
• To avoid I/O contention during query compilation, customers who expect a high degree of concurrent query execution should plan to keep the SQL/MX database catalog and schema on a disk volume separate from the ODBC/MP Server catalog. (Note that NonStop ODBC/MX does not have a separate catalog; it uses the SQL/MX catalog.) Do not share a catalog volume with active data tables or with any files that are likely to experience high levels of I/O activity.
• Maintain separate catalogs and schemas for test and production purposes. Delete catalog and schema entries when they are no longer required. Keep production catalogs and schemas free of unnecessary entries.
• Keep catalog and schema fragmentation to a minimum by regularly performing online reloads. Do online reloads after the initial creation of the database and after tables or partitions have been added and dropped.
• Keep catalog and schema statistics current for all catalog and schema tables. Update statistics regularly, both after the initial creation of the database and after tables or partitions have been added and dropped. A sketch of the cleanup and statistics statements follows this list.
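The statements below sketch the cleanup and statistics steps, assuming a hypothetical test schema named teststg in a catalog named testcat, and assuming that UPDATE STATISTICS is run against the metadata tables in your installation with the required authority. The three-part metadata table name is a placeholder only, not an actual SQL/MX metadata name.

   -- Remove a test schema that is no longer required (names are hypothetical)
   DROP SCHEMA testcat.teststg CASCADE;

   -- Keep metadata statistics current; replace the placeholder three-part name
   -- with the catalog, schema, and table names used in your installation
   UPDATE STATISTICS FOR TABLE my_catalog.my_metadata_schema.my_metadata_table ON EVERY COLUMN;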