DataLoader/MX Reference Manual (H06.03+, J06.03+)

Recovery Strategies

DataLoader/MX Reference Manual—543544-001

7-4

• Restarting from an unknown state. In this method, you leave the modifications
  made by the failed load in the table and devise a way for the rerun load to
  recognize, row by row, whether the action it is about to take was already
  done by the failed run. See Parallel Considerations on page 7-8 for a
  description of self-balancing parallel configurations.

• Restarting from a known state. In this method, you bring the table to a known
  state and then rerun the load, starting from this known state. If the table
  began as an empty table, you purge the table of data before restarting. If the
  table is small, it is practical to copy it before you start the load and then
  replace the target file with the copy if the load fails. Be sure to build this
  copy operation into your load procedure.

You must decide which approach best fits your application, taking into
consideration the time required to do the rerun and the computing resources that
each approach needs.
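
The known-state approach for a small table can be sketched as follows. This is
an illustration only, not DataLoader/MX code: it uses Python with the standard
sqlite3 module as a stand-in for the target database, and the names target,
target_backup, load_from_known_state, and fail_after are all hypothetical. The
fail_after parameter exists only to simulate a load failure.

```python
import sqlite3

def load_from_known_state(conn, rows, fail_after=None):
    """Copy the target table, run the load, and restore the copy on failure.

    All table and parameter names here are hypothetical.
    """
    cur = conn.cursor()
    # Take the copy before the load starts, as part of the load procedure.
    cur.execute("DROP TABLE IF EXISTS target_backup")
    cur.execute("CREATE TABLE target_backup AS SELECT * FROM target")
    conn.commit()
    try:
        for i, (key, value) in enumerate(rows):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("simulated load failure")
            cur.execute("INSERT INTO target (id, val) VALUES (?, ?)", (key, value))
            # Committing row by row means a failure leaves partial work behind.
            conn.commit()
    except Exception:
        # Bring the table back to the known state captured before the load.
        conn.rollback()
        cur.execute("DELETE FROM target")
        cur.execute("INSERT INTO target SELECT * FROM target_backup")
        conn.commit()
        return False
    return True
```

If the function returns False, the table holds exactly the rows it held before
the load, so the same load can be rerun unchanged. For a table that began empty,
the restore step reduces to purging the table.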

Restarting From An Unknown State

To use this method, you must know or be able to determine whether a row was
altered by the failed load. If you have a way of determining this, you might be
able to restart from an unknown state.

Restarting from an unknown state has several other issues:

• You might need to add a column to the table to identify the last load run to
  have processed each row, which requires extra space. However, this column can
  serve other useful functions, so the required space might be worth allocating.

• You probably should not restart from an unknown state if a given load job can
  update a given row multiple times, because you would be required to add yet
  another column to the table. For more information, see Updating on page 7-5.

• This method might use more computing resources, because the load process must
  check whether each row has been processed before. You might be able to save
  resources by performing the check as part of an attempted insert. If the row
  was already inserted, a duplicate key error is returned. Otherwise, the insert
  completes successfully.

  If you intend to update the row, you need to make an explicit check. Often
  this check can be pushed down into the disk process together with the update
  statement, making the cost very low.

• You might create a load program that accepts a parameter indicating whether
  this load run is a rerun or not. If it is a rerun, the program executes the
  more expensive checking code. If it is not a rerun, the program executes the
  normal code.
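
The points above can be combined in a short sketch. This is a hedged
illustration and not the DataLoader/MX API: it uses Python with the standard
sqlite3 module as a stand-in database, and the names run_load, load_id,
last_load_id, and rerun are hypothetical. It assumes each row is inserted or
updated at most once per load job, and that on a rerun a duplicate key means
the failed run already inserted the row.

```python
import sqlite3

def run_load(conn, load_id, inserts, updates, rerun=False):
    """Apply inserts and updates; on a rerun, skip work a failed run already did.

    Returns the number of actions skipped. All names are hypothetical.
    """
    cur = conn.cursor()
    skipped = 0
    for key, value in inserts:
        try:
            # Attempt the insert and let the key constraint act as the check.
            cur.execute(
                "INSERT INTO target (id, val, last_load_id) VALUES (?, ?, ?)",
                (key, value, load_id),
            )
        except sqlite3.IntegrityError:
            if not rerun:
                raise  # on a first run, a duplicate key is a genuine error
            skipped += 1  # assumed to have been inserted by the failed run
    for key, delta in updates:
        if rerun:
            # Explicit check folded into the UPDATE predicate: rows already
            # tagged with this load_id are left untouched.
            cur.execute(
                "UPDATE target SET val = val + ?, last_load_id = ? "
                "WHERE id = ? AND (last_load_id IS NULL OR last_load_id <> ?)",
                (delta, load_id, key, load_id),
            )
            if cur.rowcount == 0:
                skipped += 1
        else:
            # Normal path: no check, but still tag the row with this load run.
            cur.execute(
                "UPDATE target SET val = val + ?, last_load_id = ? WHERE id = ?",
                (delta, load_id, key),
            )
    conn.commit()
    return skipped
```

Because the check travels with the UPDATE statement as a predicate, the rerun
needs no separate read of the row. The rerun flag keeps the first run on the
cheaper path, at the price of treating any duplicate key there as an error.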