SQL/MX Release 2.0 Best Practices
Design Guidelines and Considerations 14
default, the clustering key is the primary key, but a separate clustering key can be specified with the
STORE BY clause.
When a primary key is specified for a table, the clustering key can comprise all or some of the columns of
the primary key. If only some of the primary key columns are used for clustering, you must use the leading
columns of the primary key in the STORE BY clause. When all or some of the primary key columns are used
in the STORE BY clause, the data is clustered by the primary key. In this situation, the primary key becomes
the clustering key.
When a primary key contains the DROPPABLE clause, the columns of the primary key can be updatable.
When the primary key is droppable, the columns in the STORE BY clause can be different from the primary
key columns. Data will be clustered by the STORE BY columns. In this situation, the primary key is not the
clustering key. To enforce uniqueness of the clustering key, a SYSKEY is appended to the end of the
clustering key. The b-tree organization of the table is based on the clustering key. The primary key is
maintained in a separate index, generated by NonStop SQL/MX. This generated index will be a single
partitioned index and will physically reside on the volume of the first partition for the table. Using the
MODIFY utility, you can alter the primary key index to move the location of this partition and add
additional partitions based on range partitioning. Use the SHOWDDL command to determine the name
of the SQL/MX generated primary key index.
NonStop SQL/MX supports two types of partitioning types: range partitioning and hash partitioning. Range
partitioning places rows into different disk volumes based on the range partitioning key. Hash partitioning
places rows into different disk volumes based on an algorithm of the hash key. The partitioning key must
be the clustering key or part of the clustering key. If the clustering key has a SYSKEY appended to it to
provide uniqueness when the clustering key is not the primary key, the SYSKEY portion is not used for
partitioning. In other words, the SYSKEY is dropped when partitioning occurs, since the partitioning key
does not need to be unique.
Rows are stored in each partition based on the order of the clustering key. Using the clustering key to
access data will provide the most efficient access method. Therefore, it is recommended that the
columns used as the clustering key should also be used as the primary key.
Primary-key Selection
The selection of a primary key is one of the most important design considerations. The proper key
structure gives the query an efficient access path, which enables good performance.
Primary keys for fact tables normally are multicolumn keys composed of foreign key relationships with
dimension tables. The combination of dimension column values makes each fact table row unique. Since
the primary key is multicolumn, you can select the column order, which can have a substantial impact on
query performance.
Generally, the primary-key order is determined by the data-access requirements. That is, query
requirements usually are given primary consideration, while load requirements are secondary. Columns
that restrict the search space the most, and are used often by queries, are good candidates for the high-
order column. When different queries access the data through different columns, you must determine
column placement based on additional considerations. You also might need to create a secondary
index on certain columns to avoid full-table scans, although, for many cases, MDAM query technology
eliminates this need. (Refer to the “MDAM query technology design considerations” section of this
document for more details.)
The principles of primary-key selection also apply to dimension tables. However, dimension tables usually
have fewer columns and require fewer considerations for their design. Primary keys for dimension tables
are usually composed of a single column, although sometimes multicolumn keys are used for tables that
have meaningful keys.
Listed below are the most important considerations for primary-key selection. Determine the priority of
these items based on performance needs:
• Data-access efficiency
• Cardinality
• Partitioning