particular high-cardinality column will always be included in a query predicate, use that column as part 
of the primary key because that column will immediately restrict the query search space. 
Be aware of the following situation associated with low-cardinality columns. Consider a primary key whose leading column has very low cardinality (for example, 1 or 2) and is always specified in the query, followed by a column with very high cardinality that is rarely included in a query predicate. Even though the query always includes a predicate on the leading column, the effect is similar to a primary key that omits the leading low-cardinality column, because a predicate on a low-cardinality column does little to restrict the search space. A cardinality of 1 or 2 corresponds to a selectivity of 100 percent or 50 percent, respectively.
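For illustration, consider a hypothetical table whose primary key leads with a two-valued column that is always specified, followed by a high-cardinality column that queries rarely reference. All table and column names here are invented, and the DDL is a generic sketch rather than exact SQL/MX syntax.

-- Hypothetical table; names and exact DDL syntax are illustrative only.
CREATE TABLE account_history
( region_type   CHAR(1)       NOT NULL   -- cardinality 2: 'A' or 'B'
, account_id    NUMERIC(18)   NOT NULL   -- high cardinality, rarely in predicates
, posting_date  DATE          NOT NULL
, amount        NUMERIC(18,2)
, PRIMARY KEY (region_type, account_id)
);

-- Typical query: the predicate on region_type qualifies roughly half the
-- table (selectivity 50 percent), so the leading key column barely narrows
-- the scan; the access pattern approaches a full-table scan.
SELECT posting_date, amount
FROM account_history
WHERE region_type = 'A'
  AND posting_date > DATE '2004-01-01';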
Partitioning 
Partitioning is a method of distributing a large table over multiple volumes (and systems) in a way that remains transparent to application programs and end users. The partitioning key corresponds to the leading primary key columns; rows that begin with a given partition key value are grouped into the same disk partition. The same principle applies to hash partitioning, where the target partition is derived from a hash of the key value.
Partitioning is an important consideration for achieving effective parallel-query and load processing. 
Since partitioning depends on the chosen store-by key, partitioning and primary key selection decisions 
must be coordinated for proper operation. See the “Fact table partitioning techniques” and “Dimension 
table partitioning techniques” sections of this document for more details. 
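As a rough sketch of how the partitioning key follows the leading primary-key column, the hypothetical DDL below hash-partitions a fact table on its leading key column across four disk volumes. The clause names and ordering only loosely follow SQL/MX DDL and should be verified against the SQL/MX Reference Manual; the table, column, and volume names are invented.

-- Sketch only: partition-clause syntax is approximate, not verified.
CREATE TABLE sales_fact
( store_id   NUMERIC(9)  NOT NULL
, sale_date  DATE        NOT NULL
, item_id    NUMERIC(9)  NOT NULL
, quantity   NUMERIC(9)
, PRIMARY KEY (store_id, sale_date, item_id)   -- leading column drives partitioning
)
HASH PARTITION BY (store_id)
( ADD LOCATION $DATA01
, ADD LOCATION $DATA02
, ADD LOCATION $DATA03
, ADD LOCATION $DATA04
);

With such an arrangement, a parallel query or load can work on all four partitions at once, which is why the partitioning decision and the primary key decision must be made together.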
Design Techniques to Avoid 
Joining Tables on Calculated Columns  
Joins are usually made between existing columns of two tables. A calculated column is one whose value must be computed, typically by an arithmetic expression, before the join can be performed. Such a join usually requires a full scan of the table and either a sort or a repartition hash on the calculated column value, even when the join produces only a small row set. (A repartition hash hashes the calculated column value, writes the results to temporary workspace, and then joins to the other table.) This operation is very expensive and should be avoided if possible.
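The hypothetical query below shows the pattern. Because the join predicate applies an expression to orders.order_total, no key or index on order_total can be used directly; the table must be scanned, the expression evaluated for every row, and the result sorted or repartition-hashed before the join. The table and column names are invented.

-- Calculated-column join (avoid): the expression defeats keyed access.
SELECT o.order_id, r.rebate_pct
FROM   orders o
JOIN   rebate_tiers r
  ON   r.tier_amount = o.order_total * 1.10;

-- Preferred: join on stored columns, or store the derived value in the
-- table at load time so the join can use existing key or index columns.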
Over-normalizing Tables 
Some degree of normalization is acceptable in a database. However, avoid designs that rigidly enforce normalization or are “over-normalized.” These structures are unlikely to perform well for queries, especially when the tables involved are large and the expected number of rows accessed is also large.
In some cases you can implement the original normalized structures, which maintain complex relationships, but extend the design by replicating data from selected tables into a single structure to meet performance requirements. This arrangement can work well when the original structures are characterized by low volatility and when the database supports mixed application requirements, such as in ODS and DSS applications. Over-normalization can, however, be very useful when a logical table has an “active” part that is used in many queries and a “non-active” part that is used in only a few. By splitting the table in two, as sketched below, you get better performance for queries that select only the active set of columns.
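The hypothetical split below shows the idea: one table holds the columns that most queries touch, the other holds the rarely used columns, and both share the same primary key. All names are invented and the DDL is a generic sketch.

-- "Active" columns, read by most queries.
CREATE TABLE customer_active
( customer_id   NUMERIC(18) NOT NULL
, name          VARCHAR(40)
, status_code   CHAR(2)
, credit_limit  NUMERIC(12,2)
, PRIMARY KEY (customer_id)
);

-- "Non-active" columns, needed by only a few queries.
CREATE TABLE customer_detail
( customer_id    NUMERIC(18) NOT NULL
, mailing_notes  VARCHAR(200)
, fax_number     CHAR(20)
, PRIMARY KEY (customer_id)
);

Queries that need only the active columns read the smaller customer_active table; the few queries that need the rarely used columns join the two tables on customer_id.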
Load-only Optimized Design 
Database projects are often characterized by tight schedules; implementation begins even though the designers barely understand the query requirements. This situation might be acceptable for proof-of-concept efforts or pilot projects, which are often experimental in nature, but it is a mistake to proceed with a production implementation without understanding how the queries will access the data.
To construct the database, the designers must make choices about primary keys, key-column placement, partitioning, and indexing. Invariably, tight schedules lead to suboptimal design decisions: either users must accept slower query-response times, or the database must later be rebuilt or significantly modified.










