SQL/MX Release 2.0 Best Practices
Database Sizing Considerations 21
Carefully determine the space requirements of summary tables. Often, summary tables are larger than
expected due to the density of the dimensional values when aggregated compared with the sparsity of
the dimensional values in the fact table.
If fact table updates are permitted, you must determine whether the summary tables should be updated
or rebuilt. Correctly updating the summary tables is challenging, while rebuilding the summary tables
requires many processing cycles.
Third-party productivity tools can be useful for creating and maintaining summary tables. These tools can
generate programs that create summary tables for initial and ongoing loads. These tools remove much of
the labor associated with creating a summary table.
Tables also can be maintained using publish and subscribe features. Analyze this aspect thoroughly to
determine if this solution fits your situation.
Considerations for Secondary Indexes
It is desirable to avoid indexing fact tables for a number of reasons. Secondary indexes on large tables
can require significant database space, and indexes are usually useful only when relatively few rows
match the predicate. Usually it is more efficient to perform a table scan.
When the table design is suitable, MDAM query technology often provides better performance than a
secondary index. However, secondary indexes do have their place. They are commonly used for large
dimension tables, to provide efficient, direct paths along predicate columns. Secondary indexes offer
several advantages: they are easy to build (no code development); they are maintained automatically
by the system (and in parallel when more than one exist on the same table); they are transparent to
queries; and they can be dropped quickly and easily.
These characteristics generally indicate that a secondary index is useful:
• Tables are large.
• Rows are added using the SQL/MX database INSERT statement.
• Access is by key-column, other than the leading primary key column or columns, or by a non-
indexed, data-column.
• Predicates produce a fairly low selectivity (where only a small portion of the rows would qualify).
• MDAM query technology is not used.
Performance also can be improved by ensuring index-only access, where columns are retrieved from
only the secondary index without needing to access the base table, saving I/Os. Include in the
secondary index additional non-key columns that are likely to be used with the indexed column. (Key
columns are always included in the secondary index.)
Partition secondary indexes to distribute the workload and data across multiple resources.
You cannot run SHOWLABEL during an index build; SHOWLABEL cannot locate the information from the
catalog while the index is being built. The error message does not explain that the table is unavailable;
the error message states only that the table entry cannot be found.
HP recommends that you create indexes after loading the data.
Data Locality Objectives
Data locality refers to the objective of keeping data and processing together, such as on the same
system. Very Large Database (VLDB) applications often require multiple interconnected systems with the
database distributed across all systems. When possible, run programs on the system where the data
resides. This practice is usually feasible when loading data or performing other batch activities, but more
difficult during query processing.
Collocation of the data in the keys and indexes is best. Collocation means keeping similar information to
be joined on the same drives. Collocation works both for range and hash partitioning. By using
collocation of the data, you decrease the amount of message traffic between disks for joins. If all the
same linked data is located on the same drive, the join can be isolated to that drive, without having to
message other drives for the join data.