Network Card User Manual

52 MPCMM0001 Chassis Management Module Software Technical Product Specification
Process Monitoring and Integrity
6.7.9 Excessive Restarts, Failed Escalate Failover/Reboot, Non-Critical
In this scenario, PMS detects a process fault. The severity of the process is configured as
non-critical. The configured recovery action is: restart the process. However, the PMS also
detects that the process has exceeded the threshold for excessive process restarts. Therefore, the
PMS executes the escalation recovery action. The configured escalation recovery action is: failover
to the standby CMM and, upon successfully executing the failover, reboot the now standby CMM.
The failover recovery action is unsuccessful (for example, the standby CMM is not available).
Because the process being monitored is not of critical severity, the reboot of the CMM is not performed.
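
The decision flow in this scenario can be sketched in a few lines of Python. This is an illustrative model only, not the PMS implementation: the names MonitoredProcess, handle_fault, and try_failover are hypothetical, and the branch for a critical process is an assumption inferred from the wording above rather than something this section states.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MonitoredProcess:
    # Hypothetical model of a monitored process; field names are illustrative.
    name: str
    critical: bool
    restart_count: int
    restart_threshold: int


def handle_fault(proc: MonitoredProcess, try_failover: Callable[[], bool]) -> List[str]:
    """Return the ordered actions taken after a fault is detected."""
    # Threshold not exceeded: the configured recovery action (restart) applies.
    if proc.restart_count <= proc.restart_threshold:
        return ["restart process"]

    # Excessive restarts: escalate to "failover and reboot".
    actions = ["escalate: failover and reboot"]
    if try_failover():
        # Failover succeeded, so this CMM is now the standby and is rebooted.
        actions.append("reboot now-standby CMM")
    elif proc.critical:
        # Assumption: a critical process still triggers a reboot.
        actions.append("reboot CMM")
    else:
        # Non-critical process: no reboot, and monitoring is stopped.
        actions.append("skip reboot")
        actions.append("stop monitoring process")
    return actions
```

Calling handle_fault with a non-critical process whose restart count is above its threshold, and a try_failover callable that returns False, reproduces the path described in this section: escalation is attempted, the reboot is skipped, and monitoring stops.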
6.7.10 Excessive Restarts, Failed Escalate Failover/Reboot, Critical
In this scenario, PMS detects a process fault. The severity of the process is configured as critical.
The configured recovery action is: restart the process. However, the PMS also detects that the
process has exceeded the threshold for excessive process restarts. Therefore, the PMS will execute
the escalation recovery action. The configured escalation recovery action is: failover to the standby
CMM and, upon successfully executing the failover, reboot the now standby CMM. The failover
Table 14. Excessive Restarts, Failed Escalate Failover/Reboot, Non-Critical

| Description | Event String | UID | Assert | Severity |
|---|---|---|---|---|
| PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | Process existence fault; attempting recovery or Thread watchdog fault; attempting recovery or Process integrity fault; attempting recovery | # | Assert | Configure |
| The recovery action specified is "restart process" | Attempting process restart recovery action | # | N/A | Configure |
| PMS detects that the process has been restarted excessively. | Recovery failure due to excessive restarts | # | N/A | Configure |
| The escalated recovery action specified is "failover and reboot" | Attempting failover & reboot escalated recovery action | # | N/A | Configure |
| PMS executes a failover. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A |
| PMS detects that it is still running on the active CMM. The process is not critical and therefore the reboot operation will not be performed. | Failover & reboot escalated recovery failure | # | N/A | Configure |
| No attempt will be made to recover the process. The PMS will stop monitoring the process. See Section 6.7.11, “Process Administrative Action” on page 53, for information about how to re-enable monitoring and de-assert the event. | Process existence fault; monitoring disabled or Thread watchdog fault; monitoring disabled or Process integrity fault; monitoring disabled | # | Assert | Configure |
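
As a reading aid for Table 14, the following Python sketch lists the process monitoring event strings in the order the table gives them, selecting the fault wording from the detection mechanism. The function name, the dictionary, and the mechanism keys are hypothetical and chosen for illustration; the events produced by the failover itself are omitted because the table does not describe them.

```python
from typing import List

# Fault wording per detection mechanism, taken from the Event String column.
# The dictionary keys are illustrative labels, not identifiers from the CMM software.
FAULT_PREFIX = {
    "existence": "Process existence fault",
    "thread_watchdog": "Thread watchdog fault",
    "integrity": "Process integrity fault",
}


def non_critical_failed_escalation_events(mechanism: str) -> List[str]:
    """Return the Table 14 event strings, in table order, for one mechanism."""
    prefix = FAULT_PREFIX[mechanism]
    return [
        f"{prefix}; attempting recovery",
        "Attempting process restart recovery action",
        "Recovery failure due to excessive restarts",
        "Attempting failover & reboot escalated recovery action",
        # Failover events are generated separately and are not modeled here.
        "Failover & reboot escalated recovery failure",
        f"{prefix}; monitoring disabled",
    ]
```

For example, passing "thread_watchdog" yields a list of six strings that begins with "Thread watchdog fault; attempting recovery" and ends with "Thread watchdog fault; monitoring disabled".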