SQL/MX Release 2.0 Best Practices

ManualsBrandsHP ManualsServerHP NonStop G-Series

Database Sizing Considerations 21

Carefully determine the space requirements of summary tables. Often, summary tables are larger than

expected due to the density of the dimensional values when aggregated compared with the sparsity of

the dimensional values in the fact table.

If fact table updates are permitted, you must determine whether the summary tables should be updated

or rebuilt. Correctly updating the summary tables is challenging, while rebuilding the summary tables

requires many processing cycles.

Third-party productivity tools can be useful for creating and maintaining summary tables. These tools can

generate programs that create summary tables for initial and ongoing loads. These tools remove much of

the labor associated with creating a summary table.

Tables also can be maintained using publish and subscribe features. Analyze this aspect thoroughly to

determine if this solution fits your situation.

Considerations for Secondary Indexes

It is desirable to avoid indexing fact tables for a number of reasons. Secondary indexes on large tables

can require significant database space, and indexes are usually useful only when relatively few rows

match the predicate. Usually it is more efficient to perform a table scan.

When the table design is suitable, MDAM query technology often provides better performance than a

secondary index. However, secondary indexes do have their place. They are commonly used for large

dimension tables, to provide efficient, direct paths along predicate columns. Secondary indexes offer

several advantages: they are easy to build (no code development); they are maintained automatically

by the system (and in parallel when more than one exist on the same table); they are transparent to

queries; and they can be dropped quickly and easily.

These characteristics generally indicate that a secondary index is useful:

• Tables are large.

• Rows are added using the SQL/MX database INSERT statement.

• Access is by key-column, other than the leading primary key column or columns, or by a non-

indexed, data-column.

• Predicates produce a fairly low selectivity (where only a small portion of the rows would qualify).

• MDAM query technology is not used.

Performance also can be improved by ensuring index-only access, where columns are retrieved from

only the secondary index without needing to access the base table, saving I/Os. Include in the

secondary index additional non-key columns that are likely to be used with the indexed column. (Key

columns are always included in the secondary index.)

Partition secondary indexes to distribute the workload and data across multiple resources.

You cannot run SHOWLABEL during an index build; SHOWLABEL cannot locate the information from the

catalog while the index is being built. The error message does not explain that the table is unavailable;

the error message states only that the table entry cannot be found.

HP recommends that you create indexes after loading the data.

Data Locality Objectives

Data locality refers to the objective of keeping data and processing together, such as on the same

system. Very Large Database (VLDB) applications often require multiple interconnected systems with the

database distributed across all systems. When possible, run programs on the system where the data

resides. This practice is usually feasible when loading data or performing other batch activities, but more

difficult during query processing.

Collocation of the data in the keys and indexes is best. Collocation means keeping similar information to

be joined on the same drives. Collocation works both for range and hash partitioning. By using

collocation of the data, you decrease the amount of message traffic between disks for joins. If all the

same linked data is located on the same drive, the join can be isolated to that drive, without having to

message other drives for the join data.