SQL/MX Release 2.0 Best Practices

Possible Implementations
Hash partitioning uses a hashing key to randomly distribute the data across the partitions. New rows are inserted in clustering-key order among the existing rows within each volume, causing table fragmentation as existing blocks are split to make room for the newly added rows. Online reloads are required periodically to defragment the partitions and maintain adequate scan performance. Note that all partitions become fragmented, because new data is inserted into the table across the entire partitioning range.
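As an illustration only, a hash-partitioned fact table might be defined with DDL along the following lines. The table, column, and volume names are hypothetical, and the exact partitioning clauses should be verified against the SQL/MX Reference Manual for your release:

   -- Illustrative sketch: rows are distributed across the volumes by hashing STORE_ID.
   CREATE TABLE SALES_FACT
     ( STORE_ID   INTEGER       NOT NULL
     , SALE_DATE  DATE          NOT NULL
     , ITEM_ID    INTEGER       NOT NULL
     , QUANTITY   INTEGER
     , AMOUNT     NUMERIC(12,2)
     , PRIMARY KEY (STORE_ID, SALE_DATE, ITEM_ID)
     )
     STORE BY PRIMARY KEY
     HASH PARTITION BY (STORE_ID)
       ( ADD LOCATION $DATA01
       , ADD LOCATION $DATA02
       , ADD LOCATION $DATA03
       );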
This type of partitioning is not difficult to plan. The partition sizes are balanced, and individual partitions
are not unexpectedly filled, provided that the unique entry count of the hash partitioning key is greater
than 50 times the number of partitioning volumes.
However, delete processing of old data is complex, and must be accomplished through the use of
queries. Scans of each partition can occur in parallel; however, each partition must be entirely scanned
to locate old rows for purging.
Depending on the number of unique values in the clustering-key columns that precede the search-predicate column (for example, a date column), you could use MDAM query technology processing to skip ranges of unaffected rows. Be aware that if delete processing is done through the MXCI utility (or other query tools),
a transaction based on NonStop TM/MP software will exist for the duration of the query, and may be
aborted due to the NonStop TM/MP software transaction timer. However, you can alter this timer before
and after running the query.
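For example, a purge of old data might be issued from MXCI as a single statement (the table and column names are illustrative). Because the date column is not the leading clustering-key column, each partition is either scanned in full or accessed through MDAM query technology, and the entire statement runs under one NonStop TM/MP transaction:

   -- Purge all rows older than the retention cutoff in one transaction.
   DELETE FROM SALES_FACT
   WHERE SALE_DATE < DATE '2003-01-01';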
Delete processing will acquire row locks that might escalate into partition or table locks, preventing other
transactional access to the data. Managing delete processing programmatically gives you more control
over the granularity of locks placed on the data. Make sure you understand the consequences of either
technique.
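A programmatic sketch of the batched approach, again with illustrative names: each unit of work deletes the old rows for only one leading-key value and then commits, so row locks are held briefly and are less likely to escalate.

   BEGIN WORK;
   DELETE FROM SALES_FACT
   WHERE STORE_ID = 1001
     AND SALE_DATE < DATE '2003-01-01';
   COMMIT WORK;

   BEGIN WORK;
   DELETE FROM SALES_FACT
   WHERE STORE_ID = 1002
     AND SALE_DATE < DATE '2003-01-01';
   COMMIT WORK;
   -- Repeat for each remaining STORE_ID value.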
Update processing requirements are similar to those of delete processing.
Query processing is more likely to use parallel execution, especially if the high-order clustering column is
omitted from the list of predicates. In this case, you might use MDAM query technology to avoid full
partition scans. Hash partitioning structures the data in each partition based on the clustering key, which is suitable for query processing and can take advantage of MDAM query technology processing capabilities.
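For example, a query that supplies only a date predicate (illustrative names) can still avoid full partition scans when MDAM query technology probes the distinct values of the leading clustering-key column in each partition:

   SELECT STORE_ID, SUM(AMOUNT)
   FROM SALES_FACT
   WHERE SALE_DATE BETWEEN DATE '2003-06-01' AND DATE '2003-06-30'
   GROUP BY STORE_ID;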
Deleting and updating with hash partitioning have a greater impact on database management than they do with sequential partitioning.
You must schedule online reloads to run periodically to eliminate fragmentation and maintain good scan
performance. All partitions may require online reloads, because data is inserted into every partition of the table. Hash data partitions must be managed carefully to avoid partition-full conditions, and to maintain
the balance of data across partitions. You might need to run the online partition-management facilities
periodically to rebalance the partitions.
Hash partitioning supports secondary indexes, if they are required on the fact table, because the data is
added using insert processing.
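For example, a secondary index on the date column (names are illustrative) could support purge and query predicates that omit the leading clustering-key column:

   CREATE INDEX SALES_FACT_DATE_IX ON SALES_FACT (SALE_DATE);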
Backup requirements are also complex. Because a load adds rows to all partitions, the entire table must be
backed up. For small to medium-size tables, backing up the entire table or taking NonStop TM/MP online
dumps, and completing within the availability window, may be adequate. For larger tables or
applications that load data frequently or continuously, keeping copies of the original source data and
reloading it when necessary may be more effective.
Sequential range partitioning and hash partitioning each have advantages. Sequential partitioning
preserves table compactness and minimizes database maintenance (there is no fragmentation), while
hash partitioning can evenly distribute rows across all partitions, reducing bottlenecks by increasing
parallelism.
When queries access the data from the fact table, MDAM query technology probes for all values of the
hash key, one for each partition, which is efficient for query processing. Queries are more likely to exploit
the capabilities of the inherent parallelism. Additionally, a view can be constructed in place of the fact
table, which forces a join between the dimension table and the fact table on the hash key. This method
incurs the cost of an additional join, but eliminates MDAM query technology processing, because the
hash-key values can be obtained from the dimension table. This method has been shown to produce
better results when multiple ESPs all begin MDAM query technology probing of the fact table at the same time.
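A sketch of such a view, with illustrative names, joining the fact table to the dimension table that supplies the hash-key values:

   CREATE VIEW SALES_FACT_V AS
     SELECT F.STORE_ID, F.SALE_DATE, F.ITEM_ID, F.QUANTITY, F.AMOUNT
     FROM STORE_DIM D, SALES_FACT F
     WHERE D.STORE_ID = F.STORE_ID;

Queries written against SALES_FACT_V then obtain the hash-key values through the join rather than through MDAM query technology probes.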