Chapter 4 Creating and Administering Clusters 65

Cleaning Up Cluster Storage

If you are using cluster storage and an error occurs, partial files may be left in the
designated cluster storage location. After an error, check that location to make sure
no partial media files remain. If you find partial media files, delete them and submit
the job again.
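
The check described above can be made easier with a small listing script. The following is a minimal sketch, not an Apple Qmaster tool: the function name list_storage_files is hypothetical, and the script assumes only that the cluster storage location is an ordinary directory you can read. Because this section does not say how partial files are marked, the sketch lists every file with its size and leaves the decision about what to delete to you.

```python
from pathlib import Path
from typing import List, Tuple


def list_storage_files(storage_dir: str) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for every regular file
    under storage_dir, sorted by path.

    Hypothetical helper: it only reports files and never deletes anything.
    """
    root = Path(storage_dir)
    entries = []
    for path in root.rglob("*"):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            entries.append((relative, path.stat().st_size))
    return sorted(entries)
```

Unexpectedly small or zero-byte media files in the listing are reasonable candidates to inspect first, although size alone does not prove that a file is partial.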

Cluster Storage and QuickTime Reference Movies

Strictly speaking, only actual QuickTime movies (not QuickTime reference movies) are
supported for distributed processing. If you submit a reference movie for distributed
processing, make sure the media files specified in the reference movie are available to
each node of the Apple Qmaster cluster. In other words, put the media on the shared
(cluster storage) volume.
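
As an illustration of the requirement above, the following minimal sketch reports which media files lie outside the shared volume. It is not part of Apple Qmaster. The function name media_outside_storage and the example path /Volumes/ClusterStorage are hypothetical, and the sketch assumes you already have the list of media file paths that the reference movie points to, because it does not read reference movies itself.

```python
import os
from typing import Iterable, List


def media_outside_storage(media_paths: Iterable[str], storage_root: str) -> List[str]:
    """Return the media paths that are NOT located under storage_root.

    Hypothetical helper: paths are compared component by component, so
    /Volumes/ClusterStorage2 is not treated as inside /Volumes/ClusterStorage.
    """
    root = os.path.abspath(storage_root)
    outside = []
    for media_path in media_paths:
        full_path = os.path.abspath(media_path)
        if os.path.commonpath([root, full_path]) != root:
            outside.append(media_path)
    return outside
```

An empty result means every listed media file is on the shared volume. Any path that is returned should be moved or copied there before you submit the reference movie.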

Recovery and Failure Notification Features

The Apple Qmaster distributed processing system has a number of built-in features
designed to attempt recovery if there is a problem, and to notify you when it attempts
a recovery.

Recovery Features

The recovery actions described next happen automatically when failures occur in the
Apple Qmaster distributed processing system. You, as the administrator, do not need to
enable or configure these features.

If a service stops unexpectedly

If either the cluster controller service or the processing enabled on a service node
stops unexpectedly, the Apple Qmaster distributed processing system restarts the
service. To avoid the risk of endless stopping and restarting, the system restarts the
failed service a maximum of four times. The first two times, it restarts the service
right away. If the service stops abruptly a third or fourth time, the system restarts it
only if it had been running for at least 10 seconds before it stopped.
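
The restart rules above can be expressed as a small decision function. The following is a minimal sketch of those rules only, not the actual Apple Qmaster implementation: the class name RestartPolicy and its members are hypothetical, and the three constants are taken directly from the text.

```python
from dataclasses import dataclass

MAX_RESTARTS = 4            # from the text: at most four restarts
IMMEDIATE_RESTARTS = 2      # from the text: the first two restarts are unconditional
MIN_UPTIME_SECONDS = 10.0   # from the text: uptime required for restarts three and four


@dataclass
class RestartPolicy:
    """Hypothetical model of the restart rules for one service."""

    restarts_done: int = 0

    def should_restart(self, uptime_seconds: float) -> bool:
        """Decide whether to restart a service that just stopped unexpectedly.

        uptime_seconds is how long the service ran before this stop.
        """
        if self.restarts_done >= MAX_RESTARTS:
            return False  # restart limit reached
        past_immediate = self.restarts_done >= IMMEDIATE_RESTARTS
        if past_immediate and uptime_seconds < MIN_UPTIME_SECONDS:
            return False  # third or fourth stop came too soon after the restart
        self.restarts_done += 1
        return True
```

In this model a service that fails immediately on every start is restarted twice and then left stopped, while a service that runs for a while between failures is restarted up to four times.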

If a batch is interrupted

When a service stops suddenly in the middle of processing an Apple Qmaster batch, the
cluster controller resubmits the interrupted batch in a way that prevents the
reprocessing of any batch segments that were completed before the service stopped.
The cluster controller delays resuming the batch for about a minute from the time it
loses contact with the service.
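
The resubmission behavior above can be pictured as follows. This is a minimal sketch under stated assumptions, not the cluster controller's actual logic: the function names, the segment IDs, and the dictionary of completion flags are all hypothetical, and the 60-second delay is only an approximation of "about a minute".

```python
import time
from typing import Callable, Dict, List

RESUME_DELAY_SECONDS = 60.0  # assumption: "about a minute" in the text


def segments_to_resubmit(segment_done: Dict[str, bool]) -> List[str]:
    """Return the IDs of segments that were not complete, in their original order."""
    return [segment_id for segment_id, done in segment_done.items() if not done]


def resume_batch(
    segment_done: Dict[str, bool],
    submit: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
    delay: float = RESUME_DELAY_SECONDS,
) -> List[str]:
    """Wait for the delay, then resubmit only the unfinished segments.

    submit is a hypothetical callback that hands one segment to the cluster.
    """
    sleep(delay)  # wait before resuming, as the text describes
    pending = segments_to_resubmit(segment_done)
    for segment_id in pending:
        submit(segment_id)
    return pending
```

For example, with completion flags of {"seg-1": True, "seg-2": False}, only "seg-2" is passed to submit, so the work already finished for "seg-1" is not repeated.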