SQL/MX Release 2.0 Best Practices

Possible Implementations 30

Partitioning of tables and indexes is required if you want to take advantage of parallelism. HP

recommends that you limit the number of partitions used, because processes perform better when they

access fewer drives. This arrangement is different from the traditional SQL/MP method of database

design. Normally, distributing partitions across many drives increases performance.

Fact Table Partitioning Techniques

Fact tables are often very large and, therefore, must be partitioned across multiple disk volumes.

To obtain good parallel scan performance, the partitions must be distributed over all processors. Ideally,

the number of table partitions should be a multiple of the number of processors, and the partitions should

be equal in size. Otherwise, scan performance is uneven and is dominated by the largest partition.

To avoid controller I/O contention, place multiple partitions within the same processor on disk volumes

attached to different controllers, if possible.

You can partition tables in three ways. Each method relies on the standard range partitioning or hash

partitioning capabilities of SQL/MX software. You must fully understand the advantages and

disadvantages of each method at design time. The specific method used affects the load architecture,

query performance, and degree of database maintenance.

To choose an effective partitioning method, you must first understand the nature of the data and the

type of processing that will occur, both in batch and query.

Consider these questions when you decide which column or columns to use for partitioning and the order

of the key columns:

• What is the expected rate and frequency of loads?

• What is the time window for the loads?

• Are data loads done in batch or online?

• Will new rows be added in sequential order or in random order?

• After the data is loaded, will it be processed in batch or by queries?

• Does the partitioning method aid the order in which the data is accessed?

• How often will queries supply predicates for the key columns, including the partitioning column?

• What is the data distribution and unique value count of each key column?

• Which key columns are likely to enable MDAM query technology processing?

A fact table is the central part of the star schema, often the schema of choice for databases. The

granularity of the fact table is determined by the level of detail in the dimension tables. Each dimension is

represented in the fact table by a dimensional value, which becomes one of the fact table key columns.

The number of dimensions usually equals the number of fact table key columns. The primary key of the

fact table is composed of the collection of dimension columns and uniquely identifies each row. Each

fact table key column is a foreign key to a corresponding dimension table.

For example, a simple retail fact table may consist of three dimensions and has a grain of time, location,

and product. Each dimension is represented in the fact table as an individual key column, and each row

of the fact table includes all three dimensions.

If your fact table does not possess these characteristics—that is, if there are key columns in the fact table

that do not reference dimension tables, or if there are data columns in the fact table that are foreign

keys to dimension tables—review your design’s dimensional modeling techniques.

In standard range partitioning, the leading column or columns partition the table horizontally. For

example, if the chosen key order is location, time, and product, you could use location for the

partitioning column. You would order the table data by location and distribute it across all disk volumes

based on specific partitioning values.