If a particular high-cardinality column will always be included in a query predicate, use that column as part of the primary key, because a predicate on that column immediately restricts the query search space.
Be aware of the following situation associated with low-cardinality columns. Consider a primary key whose leading column has very low cardinality (for example, only one or two distinct values) and is always specified in the query, followed by a very high-cardinality column that is rarely included in a query predicate. Even though the query always includes a predicate on the leading column, the effect is similar to a primary key that omits the leading low-cardinality column, because the predicate on the low-cardinality column does little to restrict the search space. A cardinality of 1 or 2 corresponds to a selectivity of 100 percent or 50 percent, respectively.
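As a minimal sketch of this situation (the table and column names are invented for illustration), consider a key that leads with a two-valued flag:

    CREATE TABLE account_history
    ( active_flag  CHAR(1)       NOT NULL  -- cardinality 2: 'Y' or 'N'
    , account_id   NUMERIC(18)   NOT NULL  -- very high cardinality
    , event_date   DATE          NOT NULL
    , amount       NUMERIC(18,2)
    , PRIMARY KEY (active_flag, account_id, event_date)
    );

    -- The leading-column predicate qualifies roughly half the table.
    -- Unless account_id also appears in the predicate, this key does
    -- little more than a key that omits active_flag entirely.
    SELECT SUM(amount)
    FROM   account_history
    WHERE  active_flag = 'Y'
    AND    event_date >= DATE '2003-01-01';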
Partitioning
Partitioning is a method of distributing a large table over multiple volumes (and systems) in a way that is transparent to application programs and end users. The partitioning key corresponds to the leading primary key columns; rows whose values fall at or after a named partition key value are grouped into the same disk partition. The same is true for hash partitioning, except that the target partition is derived from a hash of the key value.
Partitioning is an important consideration for achieving effective parallel-query and load processing.
Since partitioning depends on the chosen store-by key, partitioning and primary key selection decisions
must be coordinated for proper operation. See the “Fact table partitioning techniques” and “Dimension
table partitioning techniques” sections of this document for more details.
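The following sketch shows the general shape of a range-partitioned table. The table name, key values, and volume names ($DATA01 and so on) are invented for illustration, and the exact partitioning clauses should be verified against the SQL/MX Reference Manual:

    CREATE TABLE sales
    ( store_id   NUMERIC(9)    NOT NULL   -- leading key = partitioning key
    , sale_date  DATE          NOT NULL
    , amount     NUMERIC(18,2)
    , PRIMARY KEY (store_id, sale_date)
    )
    LOCATION $DATA01                          -- first partition
    PARTITION
    ( ADD FIRST KEY (1000) LOCATION $DATA02   -- store_id 1000 and above
    , ADD FIRST KEY (2000) LOCATION $DATA03   -- store_id 2000 and above
    );

Because the leading primary key column drives partition placement, changing the key order also changes how rows are spread across volumes, which is why the two decisions must be made together.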
Design Techniques to Avoid
Joining Tables on Calculated Columns
Joins are usually made on existing columns of two tables. A calculated column is one whose value must be generated by an arithmetic calculation before the join can be performed. Usually, this calculation requires a full scan of the table, plus either a sort or a repartition hash on the calculated column value, even if the join produces a small row set. (A repartition hash hashes the calculated column value, writes the results to temporary workspace, and then joins to the other table.) This operation is very expensive and should be avoided if possible.
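As a sketch (with invented table and column names), the first query below joins on an expression, while the second joins on a value that was computed and stored at load time:

    -- Join on a calculated column: the expression must be evaluated
    -- for every row before the join, forcing a full scan plus a sort
    -- or repartition hash on the expression result.
    SELECT o.order_id, p.price_band
    FROM   orders o, price_bands p
    WHERE  (o.quantity * o.unit_price) = p.band_amount;

    -- Preferred: compute the value once at load time and store it,
    -- so the join uses an existing (and potentially keyed) column.
    SELECT o.order_id, p.price_band
    FROM   orders o, price_bands p
    WHERE  o.extended_price = p.band_amount;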
Over-normalizing Tables
Some degree of normalization is acceptable in a database. However, avoid designs that rigidly enforce normalization or are "over-normalized." These structures are unlikely to perform well for queries, especially when the tables involved are large and the expected number of rows accessed is also large.
In some cases you can implement the original normalized structures, which maintain the complex relationships, and extend the design by replicating data from selected tables into a single structure to meet performance requirements. This arrangement can work well when the original structures have low volatility and when the database supports mixed application requirements, such as operational data store (ODS) and decision support system (DSS) applications. Over-normalization can be very useful, however, when a logical table has an "active" part that is used in many queries and a "non-active" part that is used in only a few. By splitting the table in two, queries that select only the active set of columns perform better.
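A minimal sketch of such a split (table and column names invented for illustration):

    -- "Active" columns, touched by most queries: keep this table narrow.
    CREATE TABLE customer_active
    ( customer_id  NUMERIC(18)  NOT NULL
    , name         VARCHAR(40)
    , balance      NUMERIC(18,2)
    , PRIMARY KEY (customer_id)
    );

    -- "Non-active" columns, needed by only a few queries.
    CREATE TABLE customer_detail
    ( customer_id    NUMERIC(18)  NOT NULL
    , mailing_notes  VARCHAR(2000)
    , signup_source  VARCHAR(40)
    , PRIMARY KEY (customer_id)
    );

Most queries then scan only the narrow table; the occasional query that needs everything joins the two tables on customer_id.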
Load-only Optimized Design
Database projects are often characterized by tight schedules; implementation begins even though the designers barely understand the query requirements. This situation might be acceptable for proof-of-concept efforts or for pilot projects, which are often experimental in nature. But it is a mistake to proceed with a production implementation without understanding how the queries will access the data, because the resulting design is optimized only for loading the data, not for querying it. To construct the database, the designers must make choices about primary keys, key-column placement, partitioning, and indexing. Invariably, tight schedules lead to suboptimal design decisions, forcing users either to accept slower query response times or to wait while the database is rebuilt or significantly modified.