PCI Error Recovery Product Note, June 2007

Chapter 7
PCI Error Recovery Product Note
What is PCI Error Recovery?
The PCI Error Recovery feature provides the ability to detect, isolate, and automatically recover from a PCI
error, avoiding a system crash. PCI Error Recovery is included with the HP-UX 11i v3 operating system, and
it is enabled by default.
NOTE PCI Error Recovery is not supported on all platforms. To determine if PCI Error Recovery is
supported on your system, see the PCI Error Recovery Support Matrix, available at
http://docs.hp.com/en/ha.html in the PCI Error Recovery section.
With the PCI Error Recovery feature enabled, if an error occurs on a PCI bus containing an I/O card that
supports PCI Error Recovery:
• The PCI bus is quarantined to isolate the system from further I/O and prevent the error from damaging
  the system.
• The PCI Error Recovery feature will attempt to recover from the error and reinitialize the bus so I/O can
  resume.
If an error occurs during the automated error recovery process, the bus and I/O card will remain quiesced. If
the bus contains a card that supports online addition, replacement, or deletion (OL*) and the card is in a
hot-pluggable slot, you can use the olrad command (or the attention button) to manually recover from the
error by replacing the card.
For information on OL* operations, see the Interface Card OL* Support Guide, available at:
http://docs.hp.com/en/ha.html
To determine if OL* is supported, see the I/O card documentation or support matrix available at
http://docs.hp.com/en/netcom.html
If the PCI Error Recovery feature is disabled and an error occurs on a PCI bus, a Machine Check Abort (MCA)
or a High Priority Machine Check (HPMC) will occur, and the system will crash.
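The behavior described above can be summarized as a small decision flow. The following Python sketch is only an illustrative model of that flow, not HP-UX code: the names handle_pci_error and Outcome and all parameters are hypothetical, and the model assumes the error occurs on a bus containing an I/O card that supports PCI Error Recovery.

```python
from enum import Enum


class Outcome(Enum):
    """Possible end states after a PCI bus error (hypothetical names)."""
    IO_RESUMED = "bus reinitialized; I/O resumes"
    REMAINS_QUIESCED = "bus and I/O card remain quiesced"
    SYSTEM_CRASH = "MCA or HPMC; system crashes"


def handle_pci_error(recovery_enabled, auto_recovery_ok,
                     card_supports_olstar=False, slot_hotpluggable=False):
    """Return (steps, outcome) for one PCI bus error, following the text.

    Assumes the affected card supports PCI Error Recovery.
    """
    steps = []
    if not recovery_enabled:
        # Feature disabled: the error escalates to a machine check.
        return steps, Outcome.SYSTEM_CRASH

    steps.append("quarantine bus")
    steps.append("attempt recovery and reinitialize bus")
    if auto_recovery_ok:
        return steps, Outcome.IO_RESUMED

    # Automated recovery failed. Manual recovery by replacing the card is
    # possible only for an OL*-capable card in a hot-pluggable slot.
    if card_supports_olstar and slot_hotpluggable:
        steps.append("replace card using olrad or the attention button")
    return steps, Outcome.REMAINS_QUIESCED
```

For example, handle_pci_error(True, False, card_supports_olstar=True, slot_hotpluggable=True) models a failed automated recovery where manual card replacement is still an option.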
IMPORTANT PCI Error Recovery is enabled by default. If you use HP Serviceguard, HP recommends enabling
PCI Error Recovery only if your storage devices are configured with multiple paths and you
have not disabled HP-UX native multipathing. If PCI Error Recovery is enabled but your
storage devices are configured with only a single path, HP Serviceguard might not detect
when connectivity is lost. If HP Serviceguard does not detect the loss of connectivity, it does
not initiate a failover. For instructions on using the pci_eh_enable tunable to disable PCI Error
Recovery, see “Tunable Kernel Parameters” on page 12.
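The Serviceguard recommendation above amounts to a simple check. The following Python sketch restates it as a predicate; the function name and parameters are hypothetical and are not part of any HP-UX or Serviceguard interface.

```python
def recovery_recommended(uses_serviceguard, multiple_paths,
                         native_multipathing_enabled):
    """Return True if leaving PCI Error Recovery enabled matches the
    recommendation in the text (hypothetical helper, for illustration)."""
    if not uses_serviceguard:
        # The caveat applies only to Serviceguard users; the default
        # (enabled) stands otherwise.
        return True
    # With Serviceguard: multiple storage paths are required, and HP-UX
    # native multipathing must not have been disabled.
    return multiple_paths and native_multipathing_enabled
```

A Serviceguard system whose storage has only a single path returns False here, which corresponds to the case where the text suggests disabling the feature with the pci_eh_enable tunable.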
NOTE If a PCI error occurs on an I/O card very early in the boot process or during an OL* online
addition operation, the I/O card will not be claimed and the software state of the I/O card will
be marked as UNUSABLE in the ioscan(1m) output. To recover I/O cards that are in the
UNUSABLE state, a system reboot is required.