DataLoader/MX Reference Manual (G06.24+)
Recovery Strategies
DataLoader/MX Reference Manual—525872-002
Multiprocess Considerations
What counts as a failure when an application consists of multiple processes? If
one process fails, should the others continue running? DataLoader/MX uses the
strategy with the simplest restart state: if recovery is not simple and obvious, the
DataLoader/MX process quits. This behavior propagates through the whole multiprocess
application, terminating each process. When the error involves data in the
records DataLoader/MX is processing, DataLoader/MX logs the record, together with
the error, to the error log file specified with the -E parameter.
A common loading scenario is to create multiple DataLoader/MX processes,
configured with a single upstream DataLoader/MX process that reads an input file and
distributes the records to a number of downstream DataLoader/MX processes, which,
in turn, write the data to the database. Which of these processes should maintain the
restart file?
The upstream DataLoader/MX process would seem to be the obvious choice.
However, that approach has problems that are not related to recovery. The
upstream DataLoader/MX process does not alter the database itself; it only sends
records to the downstream DataLoader/MX processes, which perform the updates.
Suppose the single upstream DataLoader/MX is doing the transaction bracketing and
keeping the restart file. When it comes time to commit a transaction and update the
restart file, the upstream process has no way to determine whether all the work
intended to be in the transaction has been done. Furthermore, the upstream process
cannot determine how long to wait until the work is done and it is safe to commit the
transaction. If it commits the transaction before all the work is finished, one or more of
the downstream DataLoader/MX processes will receive an error indicating that it has a
stale transaction identifier when it finally tries to perform work included in the
now-committed transaction.
As a result, in a multiprocess DataLoader/MX application, avoid using transaction
bracketing in the upstream DataLoader/MX processes. If you use transaction
bracketing, it must be done in the downstream DataLoader/MX processes.
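
The following Python sketch illustrates the reasoning above. It is not DataLoader/MX code and does not use any DataLoader/MX or transaction-manager interface: the Transaction class, StaleTransactionError, and both functions are hypothetical names invented for this illustration. The first function models an upstream process that commits before a downstream process has finished its work; the second models a downstream process that brackets only the work it performs itself.

```python
class StaleTransactionError(Exception):
    """Hypothetical error: work was attempted under a committed transaction."""


class Transaction:
    """Hypothetical stand-in for a transaction; not a real DataLoader/MX object."""

    def __init__(self, txid):
        self.txid = txid
        self.committed = False
        self.writes = []

    def write(self, record):
        # Work arriving after the commit carries a stale transaction identifier.
        if self.committed:
            raise StaleTransactionError(f"transaction {self.txid} already committed")
        self.writes.append(record)

    def commit(self):
        self.committed = True


def upstream_bracketing(records):
    """Upstream owns the transaction and cannot see downstream progress."""
    tx = Transaction(txid=1)
    pending = list(records)      # records sent downstream but not yet applied
    tx.write(pending.pop(0))     # a downstream process applies the first record
    tx.commit()                  # upstream commits too early
    try:
        tx.write(pending.pop(0))  # a downstream process applies the next record
    except StaleTransactionError:
        return "stale"
    return "ok"


def downstream_bracketing(records, batch_size):
    """A downstream process brackets its own work and commits only when done."""
    committed = []
    for start in range(0, len(records), batch_size):
        tx = Transaction(txid=start)
        for record in records[start:start + batch_size]:
            tx.write(record)
        tx.commit()              # safe: every write in this batch has completed
        committed.append(tx.writes)
    return committed
```

In this model, the process that performs the writes is the only one that knows when a batch is complete, which is why the commit decision sits with it.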
Parallel Considerations
A common way to use parallelism for increased performance is to have an
upstream DataLoader/MX process read an input file and distribute
records to multiple downstream DataLoader/MX processes. You can accomplish this in
two ways.
• Have the downstream DataLoader/MX processes read from the upstream
  DataLoader/MX process. This method is self-balancing because each process can
  get another block of records as soon as it has finished its current block.
  Downstream processes that run fast because their CPUs have little
  competition can read as many records as they can handle and are not held up by
  other downstream DataLoader/MX processes that are running in busy CPUs.
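
The self-balancing behavior of this pull method can be sketched in Python as follows. This is a minimal illustration, not DataLoader/MX code: threads stand in for downstream processes, a queue stands in for the upstream process, and run_pull_load, block_size, and worker_delays are hypothetical names. The per-worker delay simulates a busy CPU.

```python
import queue
import threading
import time


def run_pull_load(records, block_size, worker_delays):
    """Split records into blocks and let each worker pull blocks at its own pace."""
    blocks = queue.Queue()
    for start in range(0, len(records), block_size):
        blocks.put(records[start:start + block_size])

    # Each worker appends only to its own list, so no extra locking is needed.
    processed = {name: [] for name in worker_delays}

    def worker(name, delay):
        while True:
            try:
                block = blocks.get_nowait()   # ask for the next block when ready
            except queue.Empty:
                return                        # no more input
            time.sleep(delay)                 # simulated cost of a busy CPU
            processed[name].extend(block)

    threads = [
        threading.Thread(target=worker, args=(name, delay))
        for name, delay in worker_delays.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return processed


if __name__ == "__main__":
    result = run_pull_load(list(range(40)), 2, {"fast": 0.0, "slow": 0.01})
    for name, done in result.items():
        print(name, len(done))
```

Because a worker requests a new block only after finishing its current one, a worker with a smaller delay typically ends up with more records, and no worker waits on another. The exact split depends on timing and varies from run to run.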