If a particular high-cardinality column will always be included in a query predicate, use that column as part of the primary key, because a predicate on that column immediately restricts the query search space.
Be aware of the following situation associated with low-cardinality columns. Consider a primary key whose leading column has very low cardinality (for example, only one or two distinct values) and is always specified in the query, followed by a very high-cardinality column that is rarely included in a query predicate. Even though the query always includes a predicate on the leading column, the effect is similar to a primary key that omits the leading low-cardinality column, because the predicate on the low-cardinality column does little to restrict the search space. A cardinality of 1 or 2 corresponds to a selectivity of 100 percent or 50 percent, respectively.
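As a minimal sketch of this situation (the table and column names are invented for illustration), consider a key that leads with a two-valued flag:

    CREATE TABLE account_history
    ( active_flag  CHAR(1)       NOT NULL  -- cardinality 2: 'Y' or 'N'
    , account_id   NUMERIC(18)   NOT NULL  -- very high cardinality
    , event_date   DATE          NOT NULL
    , amount       NUMERIC(18,2)
    , PRIMARY KEY (active_flag, account_id, event_date)
    );

    -- The leading-column predicate qualifies roughly half the table.
    -- Unless account_id also appears in the predicate, this key does
    -- little more than a key that omits active_flag entirely.
    SELECT SUM(amount)
    FROM   account_history
    WHERE  active_flag = 'Y'
    AND    event_date >= DATE '2003-01-01';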
Partitioning
Partitioning is a method of distributing a large table over multiple volumes (and systems) in a way that is transparent to application programs and end users. The partitioning key corresponds to the leading primary key columns; rows whose values fall at or after a named partition key value are grouped into the same disk partition. The same is true for hash partitioning, except that the target partition is derived from a hash of the key value.
Partitioning is an important consideration for achieving effective parallel-query and load processing.
Since partitioning depends on the chosen store-by key, partitioning and primary key selection decisions
must be coordinated for proper operation. See the “Fact table partitioning techniques” and “Dimension
table partitioning techniques” sections of this document for more details.
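The following sketch shows the general shape of a range-partitioned table. The table name, key values, and volume names ($DATA01 and so on) are invented for illustration, and the exact partitioning clauses should be verified against the SQL/MX Reference Manual:

    CREATE TABLE sales
    ( store_id   NUMERIC(9)    NOT NULL   -- leading key = partitioning key
    , sale_date  DATE          NOT NULL
    , amount     NUMERIC(18,2)
    , PRIMARY KEY (store_id, sale_date)
    )
    LOCATION $DATA01                          -- first partition
    PARTITION
    ( ADD FIRST KEY (1000) LOCATION $DATA02   -- store_id 1000 and above
    , ADD FIRST KEY (2000) LOCATION $DATA03   -- store_id 2000 and above
    );

Because the leading primary key column drives partition placement, changing the key order also changes how rows are spread across volumes, which is why the two decisions must be made together.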
Design Techniques to Avoid
Joining Tables on Calculated Columns
Joins are usually made on existing columns of two tables. A calculated column is one whose value must be generated by an arithmetic calculation before the join can be performed. Usually, this calculation requires a full scan of the table, plus either a sort or a repartition hash on the calculated column value, even if the join produces a small row set. (A repartition hash hashes the calculated column value, writes the results to temporary workspace, and then joins to the other table.) This operation is very expensive and should be avoided if possible.
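As a sketch (with invented table and column names), the first query below joins on an expression, while the second joins on a value that was computed and stored at load time:

    -- Join on a calculated column: the expression must be evaluated
    -- for every row before the join, forcing a full scan plus a sort
    -- or repartition hash on the expression result.
    SELECT o.order_id, p.price_band
    FROM   orders o, price_bands p
    WHERE  (o.quantity * o.unit_price) = p.band_amount;

    -- Preferred: compute the value once at load time and store it,
    -- so the join uses an existing (and potentially keyed) column.
    SELECT o.order_id, p.price_band
    FROM   orders o, price_bands p
    WHERE  o.extended_price = p.band_amount;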
Over-normalizing Tables
Some degree of normalization is acceptable in a database. However, avoid designs that rigidly enforce normalization or are "over-normalized." These structures are unlikely to perform well for queries, especially when the tables involved are large and the expected number of rows accessed is also large.
In some cases you can implement the original normalized structures, which maintain the complex relationships, and extend the design by replicating data from selected tables into a single structure to meet performance requirements. This arrangement can work well when the original structures have low volatility and when the database supports mixed application requirements, such as operational data store (ODS) and decision support system (DSS) applications. Over-normalization can be very useful, however, when a logical table has an "active" part that is used in many queries and a "non-active" part that is used in only a few. By splitting the table in two, queries that select only the active set of columns perform better.
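A minimal sketch of such a split (table and column names invented for illustration):

    -- "Active" columns, touched by most queries: keep this table narrow.
    CREATE TABLE customer_active
    ( customer_id  NUMERIC(18)  NOT NULL
    , name         VARCHAR(40)
    , balance      NUMERIC(18,2)
    , PRIMARY KEY (customer_id)
    );

    -- "Non-active" columns, needed by only a few queries.
    CREATE TABLE customer_detail
    ( customer_id    NUMERIC(18)  NOT NULL
    , mailing_notes  VARCHAR(2000)
    , signup_source  VARCHAR(40)
    , PRIMARY KEY (customer_id)
    );

Most queries then scan only the narrow table; the occasional query that needs everything joins the two tables on customer_id.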
Load-only Optimized Design
Database projects are often characterized by tight schedules; implementation begins even though the designers barely understand the query requirements. This situation might be acceptable for proof-of-concept efforts or for pilot projects, which are often experimental in nature. But it is a mistake to proceed with a production implementation without understanding how the queries will access the data, because the resulting design is optimized only for loading the data, not for querying it. To construct the database, the designers must make choices about primary keys, key-column placement, partitioning, and indexing. Invariably, tight schedules lead to suboptimal design decisions, forcing users either to accept slower query response times or to wait while the database is rebuilt or significantly modified.