DataLoader/MP Reference Manual

Recovery Strategies
DataLoader/MP Reference Manual 424148-003
You must decide which approach best fits your application, taking into consideration
the time required to do the rerun and the computing resources that either approach will
need.
Restarting From An Unknown State
To use this method, you must know or be able to determine whether a row was altered
by the failed load. If you have a reliable way of determining this, restarting from an
unknown state might be an option.
Restarting from an unknown state has several other issues:
•  You might need to add a column to the table to identify the last load run to have
   processed this table, which requires extra space. However, this column can serve
   other useful functions, so the required space might be worth allocating.

•  You probably should not use the restarting from an unknown state method if a
   given load job can update a given row multiple times, because you will be required
   to add yet another column to the table. This is discussed in detail under Updating.

•  This method might use more computing resources. The load process must check
   whether each row has been processed before. You might be able to save resources
   when performing the check by attempting to insert the row. If the row was already
   inserted, a duplicate key error is returned; otherwise, the insert finishes
   successfully.

   If you intend to update the row, you will need to make an explicit check. Often this
   check can be pushed down into the disk process together with the update statement,
   making the cost very low.

   You might create a load program that accepts a parameter indicating whether this
   load run is a rerun. If it is a rerun, the program executes the more expensive
   checking code. If it is not a rerun, the program executes the normal code.
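
The rerun parameter and the explicit check for updates can be sketched as follows. This is a minimal illustration only, not DataLoader/MP code: it uses Python with the standard sqlite3 module as a stand-in database, and the table account, the column last_load_id, and the function apply_updates are hypothetical names chosen for the example. It also assumes that a given load run updates each row at most once.

```python
import sqlite3

def apply_updates(conn, records, load_id, is_rerun=False):
    """Apply (key, amount) updates and return the number of rows changed.

    Hypothetical schema: account(id, balance, last_load_id).
    Assumes one load run updates a given row at most once.
    """
    changed = 0
    for key, amount in records:
        if is_rerun:
            # Rerun path: the "already processed?" check travels with the
            # UPDATE in its WHERE clause, so no separate read is needed.
            cur = conn.execute(
                "UPDATE account SET balance = balance + ?, last_load_id = ? "
                "WHERE id = ? AND (last_load_id IS NULL OR last_load_id <> ?)",
                (amount, load_id, key, load_id),
            )
        else:
            # Normal path: no check, but still stamp the row with the run id
            # so that a later rerun can recognize it.
            cur = conn.execute(
                "UPDATE account SET balance = balance + ?, last_load_id = ? "
                "WHERE id = ?",
                (amount, load_id, key),
            )
        changed += cur.rowcount
    conn.commit()
    return changed
```

In this sketch, a rerun over the whole input file changes only the rows that the failed run did not reach, because rows already stamped with the same load_id fail the WHERE clause.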
Inserting and Deleting
If the load consists of inserting new rows, you can use the restarting from an unknown
state method. Your program can work its way through the file, attempting to insert
rows. If the insert fails because of a duplicate key, the row was inserted by the failed
run of this same load job. The program can continue until it is able to insert a row
without error, then insert the rest of the rows. The same is true of deletes. If the load
attempts a delete of a row and finds that the row does not exist, that row must have
been deleted by the failed run. For updates, however, the situation is more complex.
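
The insert and delete cases can be sketched as shown below. This is an illustrative assumption, not DataLoader/MP code: it uses Python with the standard sqlite3 module as a stand-in database, and the table part and the functions rerun_inserts and rerun_deletes are hypothetical names. A duplicate key error on insert, or a delete that finds no row, is taken to mean that the failed run already processed that record.

```python
import sqlite3

def rerun_inserts(conn, rows):
    """Insert (id, name) rows; return how many were already present."""
    skipped = 0
    for key, name in rows:
        try:
            conn.execute("INSERT INTO part (id, name) VALUES (?, ?)", (key, name))
        except sqlite3.IntegrityError:
            # Duplicate key: assume the failed run inserted this row.
            # (In this hypothetical table the primary key is the only
            # constraint, so no other integrity error is expected.)
            skipped += 1
    conn.commit()
    return skipped

def rerun_deletes(conn, keys):
    """Delete rows by id; return how many were already gone."""
    missing = 0
    for key in keys:
        cur = conn.execute("DELETE FROM part WHERE id = ?", (key,))
        if cur.rowcount == 0:
            # Nothing to delete: assume the failed run removed this row.
            missing += 1
    conn.commit()
    return missing
```

Both functions simply process the whole input again, so the program does not need to know where the failed run stopped.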
Updating
Although many tables contain dates that pertain to the business use of the data, it
usually is not possible to use this information to know at what record the failed load
stopped performing updates. Because of the possibility of delays of data entering the