DataLoader/MX Reference Manual (H06.03+, J06.03+)

Recovery Strategies
DataLoader/MX Reference Manual 543544-001
• Restarting from an unknown state. In this method, you leave the modifications
  made by the failed load in the table but devise a way for the rerun load to
  recognize, on a row-by-row basis, whether the action it is about to take has
  already been done by a previously failed run. For a description of self-balancing
  parallel configurations, see Parallel Considerations on page 7-8.
• Restarting from a known state. In this method, you bring the table to a known state
  and then rerun the load, starting from this known state. If the table began as an
  empty table, you purge the table of data before restarting. If the table is
  small, it can be practical to copy it before you start and then replace the target
  file with the copy if the load fails. Be sure to build this copy operation into your
  load procedure.
You must decide which approach best fits your application, taking into consideration
the time required to do the rerun and the computing resources that either approach
needs.
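The known-state approach for a small table can be sketched as follows. This is a minimal illustration only: it uses Python's standard sqlite3 module as a stand-in for the real target database, and the names `target`, `target_backup`, `load_with_snapshot`, and `fail_at` are hypothetical, not part of DataLoader/MX.

```python
import sqlite3

def load_with_snapshot(conn, rows, fail_at=None):
    """Copy the target table, run the load, and restore the copy if the load fails.

    All table and parameter names here are assumptions made for illustration.
    """
    cur = conn.cursor()
    # Known state: keep a copy of the (small) table before the load starts.
    cur.execute("DROP TABLE IF EXISTS target_backup")
    cur.execute("CREATE TABLE target_backup AS SELECT * FROM target")
    conn.commit()
    try:
        for i, (key, value) in enumerate(rows):
            if fail_at is not None and i == fail_at:
                raise RuntimeError("simulated load failure")
            cur.execute("INSERT INTO target (id, val) VALUES (?, ?)", (key, value))
            conn.commit()  # committed per row, so a failure leaves partial work behind
    except Exception:
        # Return the table to the known state so the load can be rerun from it.
        conn.rollback()
        cur.execute("DELETE FROM target")
        cur.execute("INSERT INTO target SELECT * FROM target_backup")
        conn.commit()
        return False
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'existing')")
conn.commit()

rows = [(2, "a"), (3, "b"), (4, "c")]
first_ok = load_with_snapshot(conn, rows, fail_at=2)   # fails partway through
after_failure = conn.execute("SELECT id FROM target ORDER BY id").fetchall()
second_ok = load_with_snapshot(conn, rows)             # rerun from the known state
after_rerun = conn.execute("SELECT id FROM target ORDER BY id").fetchall()
```

The copy and the restore both scan the whole table, which is why this pattern is only attractive when the table is small.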
Restarting From An Unknown State
To use this method, you must be able to determine whether a given row was altered by
the failed load. If you can, restarting from an unknown state might be an option.
Restarting from an unknown state has several other issues:
• You might need to add a column to the table to identify the last load run
  that processed each row, which requires extra space. However, this column
  can serve other useful functions, so the required space might be worth allocating.
• You should probably avoid restarting from an unknown state if a
  given load job can update the same row more than once, because you would then need
  to add yet another column to the table. For more information, see Updating on
  page 7-5.
• This method might use more computing resources. The load process must check
  whether each row has been processed before. When inserting, you might be able to
  save resources by simply attempting the insert. If the row was already
  inserted, a duplicate-key error is returned. Otherwise, the insert finishes
  successfully.
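The insert-and-check idea can be sketched as follows. This is a hedged illustration that uses Python's standard sqlite3 module rather than the actual target database; the table `target` and the helper `insert_if_absent` are hypothetical names, and the exact duplicate-key error your database reports will differ.

```python
import sqlite3

def insert_if_absent(conn, key, value):
    """Try the insert; a duplicate-key error means an earlier run already did it."""
    try:
        conn.execute("INSERT INTO target (id, val) VALUES (?, ?)", (key, value))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # In sqlite3, a primary-key violation surfaces as IntegrityError.
        conn.rollback()
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
# Rows 1 and 2 stand for work left behind by the failed run.
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.commit()

# The rerun replays the whole input; only row 3 is actually new.
results = [insert_if_absent(conn, k, v) for k, v in [(1, "a"), (2, "b"), (3, "c")]]
row_count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

No separate lookup is issued: the key constraint performs the check as part of the insert itself.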
If you intend to update the row, you need to make an explicit check. Often this check
can be pushed down into the disk process together with the update statement, making
its cost very low.
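One way to combine the check with the update is to put it in the statement's predicate, so that a single statement both tests and modifies the row. The sketch below assumes a hypothetical `account` table with a `last_run` column that records the last load run to touch each row; it uses Python's sqlite3 module for illustration only and says nothing about how any particular disk process evaluates the predicate.

```python
import sqlite3

def apply_once(conn, run_id, key, delta):
    """Apply the update only if this load run has not already touched the row."""
    cur = conn.execute(
        "UPDATE account SET balance = balance + ?, last_run = ? "
        "WHERE id = ? AND last_run <> ?",
        (delta, run_id, key, run_id),
    )
    conn.commit()
    # rowcount is 1 when the row was updated, 0 when the predicate filtered it out.
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL, "
    "last_run INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 200)])
conn.commit()

RUN_ID = 7                       # hypothetical identifier of this load job
apply_once(conn, RUN_ID, 1, 10)  # the failed run got as far as row 1
rerun = [apply_once(conn, RUN_ID, k, 10) for k in (1, 2)]
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
```

This guard is only valid when a load job updates each row at most once; if the same job may update a row several times, a single run identifier cannot tell the updates apart.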
You might create a load program that accepts a parameter indicating whether this
load run is a rerun. If it is a rerun, the program executes the more expensive
checking code. If it is not a rerun, the program executes the normal code.
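A load program with such a parameter might be organized along the following lines. This is a sketch under stated assumptions: the `--rerun` flag, the `load` and `main` functions, and the `target` table are all hypothetical, and Python's sqlite3 module stands in for the real database.

```python
import argparse
import sqlite3

def load(conn, rows, rerun):
    """Insert rows; on a rerun, first check whether each row already exists."""
    skipped = 0
    for key, value in rows:
        if rerun:
            # More expensive path: an extra lookup for every input row.
            found = conn.execute("SELECT 1 FROM target WHERE id = ?", (key,)).fetchone()
            if found is not None:
                skipped += 1
                continue
        # Normal path: no checking; a duplicate here would be a genuine error.
        conn.execute("INSERT INTO target (id, val) VALUES (?, ?)", (key, value))
    conn.commit()
    return skipped

def main(argv):
    parser = argparse.ArgumentParser(description="Illustrative load program")
    parser.add_argument("--rerun", action="store_true",
                        help="enable row-by-row checking after a failed run")
    args = parser.parse_args(argv)

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
    if args.rerun:
        # Simulate a row left behind by the failed run.
        conn.execute("INSERT INTO target VALUES (1, 'a')")
        conn.commit()

    skipped = load(conn, [(1, "a"), (2, "b")], args.rerun)
    total = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    return skipped, total

first_run = main([])           # normal code path
second_run = main(["--rerun"]) # checking code path
```

Keeping the check behind a flag means that ordinary runs pay nothing for the restart logic.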