SQL/MX Release 2.0 Best Practices

ManualsBrandsHP ManualsServerHP NonStop G-Series

Design Guidelines and Considerations 14

default, the clustering key is the primary key, but a separate clustering key can be specified with the

STORE BY clause.

When a primary key is specified for a table, the clustering key can comprise all or some of the columns of

the primary key. If only some of the primary key columns are used for clustering, you must use the leading

columns of the primary key in the STORE BY clause. When all or some of the primary key columns are used

in the STORE BY clause, the data is clustered by the primary key. In this situation, the primary key becomes

the clustering key.

When a primary key contains the DROPPABLE clause, the columns of the primary key can be updatable.

When the primary key is droppable, the columns in the STORE BY clause can be different from the primary

key columns. Data will be clustered by the STORE BY columns. In this situation, the primary key is not the

clustering key. To enforce uniqueness of the clustering key, a SYSKEY is appended to the end of the

clustering key. The b-tree organization of the table is based on the clustering key. The primary key is

maintained in a separate index, generated by NonStop SQL/MX. This generated index will be a single

partitioned index and will physically reside on the volume of the first partition for the table. Using the

MODIFY utility, you can alter the primary key index to move the location of this partition and add

additional partitions based on range partitioning. Use the SHOWDDL command to determine the name

of the SQL/MX generated primary key index.

NonStop SQL/MX supports two types of partitioning types: range partitioning and hash partitioning. Range

partitioning places rows into different disk volumes based on the range partitioning key. Hash partitioning

places rows into different disk volumes based on an algorithm of the hash key. The partitioning key must

be the clustering key or part of the clustering key. If the clustering key has a SYSKEY appended to it to

provide uniqueness when the clustering key is not the primary key, the SYSKEY portion is not used for

partitioning. In other words, the SYSKEY is dropped when partitioning occurs, since the partitioning key

does not need to be unique.

Rows are stored in each partition based on the order of the clustering key. Using the clustering key to

access data will provide the most efficient access method. Therefore, it is recommended that the

columns used as the clustering key should also be used as the primary key.

Primary-key Selection

The selection of a primary key is one of the most important design considerations. The proper key

structure gives the query an efficient access path, which enables good performance.

Primary keys for fact tables normally are multicolumn keys composed of foreign key relationships with

dimension tables. The combination of dimension column values makes each fact table row unique. Since

the primary key is multicolumn, you can select the column order, which can have a substantial impact on

query performance.

Generally, the primary-key order is determined by the data-access requirements. That is, query

requirements usually are given primary consideration, while load requirements are secondary. Columns

that restrict the search space the most, and are used often by queries, are good candidates for the high-

order column. When different queries access the data through different columns, you must determine

column placement based on additional considerations. You also might need to create a secondary

index on certain columns to avoid full-table scans, although, for many cases, MDAM query technology

eliminates this need. (Refer to the “MDAM query technology design considerations” section of this

document for more details.)

The principles of primary-key selection also apply to dimension tables. However, dimension tables usually

have fewer columns and require fewer considerations for their design. Primary keys for dimension tables

are usually composed of a single column, although sometimes multicolumn keys are used for tables that

have meaningful keys.

Listed below are the most important considerations for primary-key selection. Determine the priority of

these items based on performance needs:

• Data-access efficiency

• Cardinality

• Partitioning