DataLoader/MP Reference Manual
Table of Contents
- What’s New in This Manual
- About This Manual
- 1 Introduction to DataLoader/MP
- 2 DataLoader/MP Components
- 3 Running DataLoader/MP
- 4 Specifying File-Related Options for DataLoader/MP
- 5 Creating a Customized Version of DataLoader/MP
- User Exits
- User Exit Descriptions
- BUILDKEY
- CHECKARG
- CONVERTIT
- DELETEIT
- DONEWITHTRANSACTION
- EXITSDESCRIPTION
- GETNEXTRECORD
- INITIALIZE1
- INITIALIZE2
- INSERTIT
- INSTRUCTIONS
- MISC1, MISC2, MISC3, and MISC4
- NEWTRANSACTION
- NEXTINDIRECTFILE
- SKIPPING
- STATISTICSTIME
- TERMINATING
- T0330U00_DEFAULTEXITS_C
- T0330U00-DEFAULTEXITS-COBOL
- T7900D41_DEFAULTEXITS_C
- T7900V00-DEFAULTEXITS-COBOL
- UPDATEIT
- Default User Exits
- DataLoader/MP Library
- The MAKE Routine for NM DataLoader/MP
- The MAKE Routine for Nonnative Mode DataLoader/MP
- 6 DataLoader/MP Examples
- 7 Recovery Strategies
- A Error and Warning Messages
- B Processing Flowcharts
- C C-Only Error Functions
- Index

Running DataLoader/MP
Parallelism
Parallelism is a powerful mechanism for loading or maintaining large amounts of data in a timely fashion. DataLoader/MP is designed to let you create parallel loading and maintenance scenarios that take full advantage of the parallel capabilities of the underlying system.
There are two primary considerations in a parallel load or maintenance scenario:
creating parallelism and taking advantage of parallelism.
Creating Parallelism
If the input is a single stream, such as a set of tapes that cannot be processed
separately or a single LAN transfer that cannot be changed into multiple simultaneous
transfers, you must break the single input stream into multiple streams that can be
processed in parallel. DataLoader/MP enables you to do this by having a single
DataLoader/MP process read the input and distribute it to other processes (usually
other DataLoader/MP processes) in an efficient manner.
If it does not matter which process receives a given input record, you can run an initial
DataLoader/MP process with $RECEIVE as its output and specify this DataLoader/MP
process as the input for the multiple downstream processes. This approach provides a
self-balancing and easily tunable way to create this type of parallelism.
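As a rough sketch of this arrangement (the process names $DIST, $LOAD1, and $LOAD2, the input file batch1, and the NAME and NOWAIT run options shown here are illustrative assumptions, not syntax taken from this manual):

>DATALOAD /NAME $DIST, NOWAIT/ -I=batch1 -O=$RECEIVE
>DATALOAD /NAME $LOAD1, NOWAIT/ -I=$DIST
>DATALOAD /NAME $LOAD2, NOWAIT/ -I=$DIST

The initial process reads batch1 and hands records through $RECEIVE to whichever downstream process asks for work next, so you can tune the degree of parallelism simply by starting more or fewer downstream processes.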
If each input record must go to a specific downstream process, you can specify the
KEYRANGE interpretation for the initial DataLoader/MP process output file.
Sometimes the load or maintenance strategy does not involve doing complete
processing of the input at the time it is read (perhaps the data will be stored in
intermediate files for recovery or batch control purposes). If you want to break the input
into a number of files based only on the record count, you can use an INDIRECT file as
the -O= file with the MAX modifier on each of the file names in the INDIRECT file. This
makes DataLoader/MP divide the input into multiple output files (usually on multiple
disks on multiple processors), setting the stage for complete parallelism at the next
stage of processing.
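As a sketch of this technique, assume an edit file olist that lists the output file names; the <INDIRECT> and <MAX=...> modifier spellings shown here are assumptions (see Section 4, Specifying File-Related Options for DataLoader/MP, for the actual syntax):

$data1.load.part1<MAX=250000>
$data2.load.part2<MAX=250000>
$data3.load.part3<MAX=250000>

>DATALOAD -I=batch1 -O=olist<INDIRECT>

With each file capped at a record count, DataLoader/MP presumably fills one output file and then moves on to the next one named in olist, leaving the input split across files (ideally on different disks and processors) that the next stage can process in parallel.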
The following examples show typical DATALOAD commands and their effects:

>DATALOAD -t=100<restartfile=xyz>
Directs DataLoader/MP to set the number of records per TMF transaction to 100 and to create a restart file named xyz.

>DATALOAD -t=100 -t=200
Produces an error because the TMF parameter is given twice.

>DATALOAD -I=batch1 -C=1000 &
-F=100
Directs DataLoader/MP to skip the first 100 records, then process 1000 records.

>DATALOAD /OUT $d.#lst1/ &
-I=batch1 -C=1000 -F=100 -O=
Directs DataLoader/MP to skip the first 100 records, then process 1000 records, and to send output to printer $d.#lst1.