NET/MASTER RMS Management and Operations Guide

BASERULE
Base Rulesets
ZCPUDOWN
ZCPUDOWN is a message action rule that acts in response to event number 101 from
the CPU subsystem. The rule dumps the memory of the failed CPU and reloads the
CPU.
Message Delivery
The rule instructs the message handler to deliver the original message (event
number 101 from the CPU subsystem), which appears in NonStop NET/MASTER MS
as follows:
CPU0101 Processor Down, CPU nn
nn identifies the failed CPU. The subject is nn (for example, 1).
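As an illustration only, the subject could be extracted from the delivered message text along these lines. This is a Python sketch, not NCL, and not the rule's actual implementation. The function name cpu_down_subject, the assumption that nn is one or two decimal digits, and the choice to return the subject as an integer are all hypothetical.

```python
import re

# Message text as shown in this guide: "CPU0101 Processor Down, CPU nn".
# Assumption for this sketch: nn is one or two decimal digits.
_CPU_DOWN = re.compile(r"^CPU0101 Processor Down, CPU\s+(\d{1,2})$")


def cpu_down_subject(message_text):
    """Return the failed CPU number (the rule subject), or None if no match.

    Hypothetical helper; returning an int is a choice made for this sketch.
    """
    match = _CPU_DOWN.match(message_text.strip())
    return int(match.group(1)) if match else None
```

For example, calling cpu_down_subject on the text "CPU0101 Processor Down, CPU 1" yields the subject 1, and any message with a different number or layout yields None.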
Rule Actions
When a message triggers the rule, the message handler starts NCL procedure
ZRMSCPUN to perform the following:
1. Check whether the $ZDMP process is active. When the $ZDMP process is active,
the Tandem Failure Diagnostics System (TFDS) is active. TFDS is an automated
CPU dump and reload utility. If TFDS is active, the rule terminates with no action.
2. Issue the TACL RCVDUMP command to dump the memory of the CPU identified
in the message. The dump file is $SYSTEM.ZZDUMPnn.DUMPmmmm, where nn is the
CPU identifier in the message and mmmm is a four-digit sequence number,
starting from 0000, for the dumps from a given CPU.
3. Issue the TACL RELOAD command to reload the CPU. If the reload operation is
successful, the procedure logs the following message in the NonStop
NET/MASTER MS activity log:
RMS9902 GUARDIAN CPU nn RECOVERY COMPLETE. STATUS: UP
4. If Step 3 fails, delay one minute and then retry Step 3. The reload operation
repeats until it is successful or the action is deleted. See “Displaying, Freezing,
and Deleting Active Subjects” in Section 9 for information on how to delete rule
actions for an active subject.
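The control flow of the four steps above can be modeled roughly as follows. This is a Python sketch of the logic only, not the ZRMSCPUN procedure itself. The callables tfds_active, dump, reload, deleted, and log are hypothetical stand-ins for the $ZDMP check, the TACL RCVDUMP and RELOAD commands, deletion of the rule action, and the activity log. The nn value is passed through exactly as it appears in the message, and how the sequence number is chosen is left to the caller.

```python
import time


def dump_file_name(nn, sequence):
    """Build $SYSTEM.ZZDUMPnn.DUMPmmmm from the CPU identifier and sequence number.

    nn is used as given in the message; mmmm is the sequence zero-padded to 4 digits.
    """
    return "$SYSTEM.ZZDUMP{}.DUMP{:04d}".format(nn, sequence)


def recover_cpu(nn, sequence, tfds_active, dump, reload, deleted, log,
                retry_delay=60.0, sleep=time.sleep):
    """Model the rule actions. Every callable argument is a hypothetical stand-in."""
    # Step 1: if TFDS is active it handles the dump and reload, so do nothing.
    if tfds_active():
        return "skipped"

    # Step 2: dump the memory of the failed CPU to the dump file.
    dump(nn, dump_file_name(nn, sequence))

    # Steps 3 and 4: reload, retrying after a delay until the reload
    # succeeds or the action is deleted.
    while not deleted():
        if reload(nn):
            log("RMS9902 GUARDIAN CPU {} RECOVERY COMPLETE. STATUS: UP".format(nn))
            return "reloaded"
        sleep(retry_delay)
    return "deleted"
```

Passing the operations in as callables keeps the sketch runnable without a NonStop system. A caller can supply simple functions that simulate a failed first reload to observe the one-minute retry path.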
NCL Queue
The rule actions execute in the BASERULEZCPUDOWN NCL queue, which has a
default execution limit of 1.
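To illustrate what an execution limit of 1 implies, the following Python sketch models a queue that admits at most a fixed number of procedures at a time. It assumes that the execution limit is the maximum number of procedures executing concurrently in the queue. The class name LimitedQueue and its run method are hypothetical and are not part of NonStop NET/MASTER MS.

```python
import threading


class LimitedQueue:
    """Hypothetical model of a queue with an execution limit."""

    def __init__(self, limit=1):
        # The semaphore holds one slot per procedure allowed to run at once.
        self._slots = threading.BoundedSemaphore(limit)

    def run(self, procedure, *args):
        """Block until a slot is free, then run the procedure and return its result."""
        with self._slots:
            return procedure(*args)
```

Under this assumption, with the default limit of 1, a second CPU-down event that arrives while an earlier recovery is still running waits until the first one finishes.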