Chapter 4 Creating and Administering Clusters
Cleaning Up Cluster Storage
If you are using cluster storage and an error occurs, partial files may be left in the
designated cluster storage location. Check that location to make sure no partial media
files are left there. If you find partial media files, delete them and submit the job
again.
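As a rough aid for the check described above, the following Python sketch lists files in a storage folder that were modified at or after a given time, such as the moment the failed job was submitted. It is only an illustration: the function name and the idea of using modification time to spot leftovers are assumptions, not part of Apple Qmaster. The sketch only lists candidates and deletes nothing, so you can review each file first.

```python
from pathlib import Path


def files_modified_since(storage_root, since_epoch_seconds):
    """List files under storage_root last modified at or after the given time.

    Hypothetical helper: modification time is only a hint that a file may be
    a leftover from the failed job. Review the list before deleting anything.
    """
    root = Path(storage_root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime >= since_epoch_seconds
    )
```

Pass the path of your cluster storage location and the job's submission time in seconds since the epoch, for example a value from time.time() recorded when you submitted the job.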
Cluster Storage and QuickTime Reference Movies
Strictly speaking, only actual QuickTime movies (not QuickTime reference movies) are
supported for distributed processing. If you submit a reference movie for distributed
processing, make sure the media files specified in the reference movie are available to
each node of the Apple Qmaster cluster. In other words, put the media on the shared
(cluster storage) volume.
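The requirement above can be checked before you submit a job. The Python sketch below takes a list of media file paths that a reference movie points to and reports any that are missing or that sit outside the shared volume. The function name is hypothetical, and the sketch assumes you already know the media paths. It does not read the reference movie itself.

```python
from pathlib import Path


def media_outside_cluster_storage(media_paths, cluster_storage_root):
    """Return the media paths that are missing or not under the shared volume.

    Hypothetical helper: media_paths is a list you supply; this function does
    not parse QuickTime reference movies.
    """
    root = Path(cluster_storage_root).resolve()
    problems = []
    for raw in media_paths:
        path = Path(raw).resolve()
        # A path qualifies only if the file exists and lies below the root.
        if not path.is_file() or root not in path.parents:
            problems.append(raw)
    return problems
```

An empty result means every listed file exists under the shared volume as seen from the computer running the check. Each node must still be able to mount that volume.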
Recovery and Failure Notification Features
The Apple Qmaster distributed processing system has a number of built-in features
designed to attempt recovery if there is a problem, and to notify you when it attempts
a recovery.
Recovery Features
The recovery actions described next happen automatically when failures occur in the
Apple Qmaster distributed processing system. There is no need for you, as the
administrator, to enable or configure these features.
If a service stops unexpectedly
If either the cluster controller service or the processing service enabled on a service
node stops unexpectedly, the Apple Qmaster distributed processing system restarts the
service. To avoid the risk of endless stopping and restarting, the system restarts the
failed service a maximum of four times. The first two times, it restarts the service
right away. If the service stops abruptly a third or fourth time, the system restarts it
only if it had been running for at least 10 seconds before it stopped.
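The restart rule described above can be sketched in a few lines of Python. This is an illustration of the stated policy only, not Apple Qmaster's actual implementation. The class and method names are hypothetical, and the three constants come directly from the text.

```python
from dataclasses import dataclass

MAX_RESTARTS = 4            # from the text: at most four restarts
IMMEDIATE_RESTARTS = 2      # from the text: the first two restarts are unconditional
MIN_UPTIME_SECONDS = 10.0   # from the text: uptime required for restarts three and four


@dataclass
class RestartPolicy:
    """Hypothetical model of the restart rule for one failed service."""
    restarts_done: int = 0

    def should_restart(self, uptime_seconds: float) -> bool:
        """Decide whether to restart after an unexpected stop.

        uptime_seconds is how long the service ran before this stop.
        """
        if self.restarts_done >= MAX_RESTARTS:
            return False  # limit reached, stop trying
        if (self.restarts_done >= IMMEDIATE_RESTARTS
                and uptime_seconds < MIN_UPTIME_SECONDS):
            return False  # third or fourth stop came too soon after a restart
        self.restarts_done += 1
        return True
```

For example, a service that fails three times within a few seconds is restarted twice and then left stopped, because the third stop came after less than 10 seconds of running.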
If a batch is interrupted
When a service stops suddenly while in the middle of processing an Apple Qmaster
batch, the cluster controller resubmits the interrupted batch in a way that prevents the
reprocessing of any batch segments that were complete before the service stopped.
The cluster controller delays resuming the batch for about a minute from the time it
loses contact with the service.
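The resubmission behavior above amounts to two steps: skip the segments that were already complete, and wait before resuming. The Python sketch below models both under stated assumptions. The function names are hypothetical, segments are represented by simple identifiers, and the 60-second delay stands in for "about a minute". The exact delay used by the cluster controller is not given in the text.

```python
RESUME_DELAY_SECONDS = 60.0  # assumption: stands in for "about a minute"


def segments_to_resubmit(all_segments, completed_segments):
    """Return the segments that still need processing, in their original order."""
    done = set(completed_segments)
    return [seg for seg in all_segments if seg not in done]


def earliest_resume_time(contact_lost_at):
    """Return the earliest time (in seconds) at which the batch may resume."""
    return contact_lost_at + RESUME_DELAY_SECONDS
```

With segments 1 through 5 and segments 1, 2, and 4 already complete, only segments 3 and 5 are resubmitted.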