CP6100 I/O Process Programming Manual

Using CP6100: Programming
COMPONENT FAILURES IN THE NonStop SYSTEM. The 6100 subsystem
provides two paths to any line for the single-port controller
configuration and four paths to any line for the dual-port
controller configuration. Thus, your programs maintain their
access to a line even if a processor, I/O channel, controller
port, or controller fails. (Later we'll discuss how an
application finds out a path switch has occurred and how the
application can proceed gracefully after a switch.) Refer to
Figures 2-3a and 2-3b as you read the next few paragraphs.
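The path counts above can be checked with a short sketch. It is
illustrative only and is not part of the CP6100 interface. It
assumes that each line is reachable through two controllers and
that each controller port provides one path; the controller
names "A" and "B" and the function access_paths are hypothetical.

```python
from itertools import product


def access_paths(ports_per_controller, controllers=("A", "B")):
    """Enumerate the (controller, port) paths to one line.

    Assumption (not stated in this form by the manual): a line is
    attached to two controllers, and every controller port is a
    distinct path to the line.
    """
    ports = range(ports_per_controller)
    return list(product(controllers, ports))


single_port_paths = access_paths(1)  # single-port controller configuration
dual_port_paths = access_paths(2)    # dual-port controller configuration
print(len(single_port_paths), len(dual_port_paths))
```

Under these assumptions the sketch yields two paths for the
single-port configuration and four for the dual-port
configuration, matching the counts given in the text.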
If a processor or I/O channel fails, CP6100 switches control to
its backup I/O process, which communicates with the line by way
of the alternate processor, channel, and port. Because only one
port of a controller can be in use at a time (the single-port
controller has but one port), every other process using the
controller must also switch; each process, on its next attempt to
reach the line by the old path, is rejected with an error and
forced to switch to the alternate path.
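The forced switch described above can be pictured with a small
model. This is a sketch under stated assumptions, not CP6100
code: the classes Line and LineUser, the exception PathRejected,
and the port numbers 0 and 1 are all hypothetical stand-ins for
the controller, an I/O process, and the error returned on the
old path.

```python
class PathRejected(Exception):
    """Hypothetical error for a request that arrives on the old path."""


class Line:
    """Models a controller on which only one port is in use at a time."""

    def __init__(self):
        self.active_port = 0

    def send(self, port, data):
        # A request on any port other than the active one is rejected.
        if port != self.active_port:
            raise PathRejected(port)
        return f"sent {data!r} via port {port}"


class LineUser:
    """Models a process that remembers the path it used last."""

    def __init__(self, line):
        self.line = line
        self.port = 0

    def send(self, data):
        try:
            return self.line.send(self.port, data)
        except PathRejected:
            # The old path was refused: switch to the alternate and retry.
            self.port = 1 - self.port
            return self.line.send(self.port, data)
```

In this model, setting active_port to 1 stands for the first
process switching paths. Every other LineUser is then rejected
once on port 0 and continues on port 1 from its next request
onward.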
If a controller fails, or if the port on the current access path
fails, CP6100 asks the CSM to let it use the other controller.
If the new controller has the other processor as its primary
(as is always the case for the single-port controller
configuration), then a processor switch occurs also; the backup
CP6100 process becomes the primary process.
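The rule for when a controller switch also causes a processor
switch can be written as a single comparison. The sketch below
is illustrative; the Controller record, its primary_cpu field,
and the function switch_controller are hypothetical names, and
processors are identified by plain integers.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    name: str
    primary_cpu: int  # processor that is this controller's primary


def switch_controller(current_cpu, new_controller):
    """Return (cpu of the primary process, True if the backup took over).

    A processor switch accompanies the controller switch only when the
    new controller's primary processor differs from the current one.
    """
    takeover = new_controller.primary_cpu != current_cpu
    return new_controller.primary_cpu, takeover
```

In a single-port configuration the second controller always has
the other processor as its primary, so in this model takeover is
always True there.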
Notice that, in a dual-port controller configuration, a
controller port failure does not always indicate a controller
failure. So as not to leave idle a controller that might
actually be functional, the CSM process switches the processor
ownership of a controller if it discovers a failure during a
download or status probe. (The CSM does a status probe every
minute on the backup path.) When next the controller is called
to service, either because the other controller has failed or by
operator request, access to the controller will be by the new
owning processor.
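The ownership switch made by the CSM can be sketched as follows.
This is a model of the behavior described above, not the CSM's
actual logic: the class BackupController, the function
probe_backup, and the probe_ok callback are hypothetical, and the
60-second constant simply restates "every minute".

```python
PROBE_INTERVAL_SECONDS = 60  # the text says the probe runs every minute


class BackupController:
    """Models the idle controller on the backup path."""

    def __init__(self, owner_cpu, other_cpu):
        self.owner_cpu = owner_cpu
        self.other_cpu = other_cpu

    def swap_owner(self):
        self.owner_cpu, self.other_cpu = self.other_cpu, self.owner_cpu


def probe_backup(controller, probe_ok):
    """Run one status probe and return the owning processor afterward.

    probe_ok is a hypothetical callable taking a processor number and
    returning True if the probe through that processor succeeds.
    """
    if not probe_ok(controller.owner_cpu):
        # The port on the owning side may be bad while the controller
        # itself still works, so give ownership to the other processor.
        controller.swap_owner()
    return controller.owner_cpu
```

A failed probe through processor 0 leaves processor 1 as owner,
so the next time the controller is called to service it is
reached through processor 1.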
All these path changes happen automatically when there is a
failure. A system manager can also make them happen to balance
the load in a system. The use of CMI for load-balancing is
covered in Section 3.
TOTAL SUBSYSTEM AND LIU FAILURES. Sometimes an error condition
affects more than one path to a device. CP6100 attempts to
recover from such failures in a way that has the minimum impact
on running applications.
October 1985