Specifications
5102ch04.fm Draft Document for Review May 12, 2014 12:46 pm
116 IBM Power System S822 Technical Overview and Introduction
PCI Enhanced Error Handling (EEH)-enabled adapters respond to a special data packet that
is generated from the affected PCI slot hardware by calling system firmware, which examines
the affected bus, allows the device driver to reset it, and continues without a system
reboot. For Linux, EEH support extends to the majority of frequently used devices, although
some third-party PCI devices might not provide native EEH support.
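The recovery sequence described above can be sketched as a small simulation. This is a
minimal illustration in Python, not IBM firmware or Linux kernel code. The names Slot,
Driver, SlotState, and recover are hypothetical, and the sketch assumes that the slot is
frozen when the error is reported and that a driver without EEH support cannot be
recovered in place.

```python
from dataclasses import dataclass, field
from enum import Enum


class SlotState(Enum):
    NORMAL = "normal"
    FROZEN = "frozen"   # assumption: slot is isolated once an error is reported
    FAILED = "failed"   # assumption: no in-place recovery was possible


@dataclass
class Driver:
    name: str
    eeh_aware: bool     # hypothetical flag: does this driver support EEH?

    def reset_device(self) -> bool:
        # A real driver would reinitialize the adapter here.
        # In this sketch only an EEH-aware driver can do so.
        return self.eeh_aware


@dataclass
class Slot:
    driver: Driver
    state: SlotState = SlotState.NORMAL
    log: list = field(default_factory=list)

    def report_error(self) -> None:
        # The slot hardware signals the error and the slot is frozen.
        self.state = SlotState.FROZEN
        self.log.append("error reported, slot frozen")


def recover(slot: Slot) -> SlotState:
    """Examine the bus, let the driver reset the device, then resume."""
    if slot.state is not SlotState.FROZEN:
        return slot.state  # nothing to recover
    slot.log.append("firmware examined bus")
    if slot.driver.reset_device():
        slot.state = SlotState.NORMAL
        slot.log.append("driver reset device, resumed without reboot")
    else:
        slot.state = SlotState.FAILED
        slot.log.append("driver lacks EEH support, slot left out of service")
    return slot.state
```

In this sketch, calling report_error() and then recover() on a slot whose driver is
EEH-aware returns the slot to SlotState.NORMAL, whereas a slot whose driver is not
EEH-aware ends in SlotState.FAILED.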
Each processor module can directly drive two I/O slots or devices with the PCIe controllers in
each processor and with an external module of any kind, or a single controller can handle
additional functions.
Although the I/O hub is integrated in the POWER8 processor module, it retains a
design that supports end-point error recovery, ‘freeze on fault’ behavior, and fault
isolation, so that errors can be contained to the partition that uses the I/O.
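The containment idea can be illustrated with a short sketch: a fault freezes only the
affected endpoint, so only the partition that owns that endpoint is affected. The IOHub
class, its methods, and the endpoint and partition names are hypothetical, and raising
an exception on access to a frozen endpoint is a simplification chosen for this example,
not a description of the hardware behavior.

```python
class FrozenEndpointError(Exception):
    """Raised when a frozen endpoint is accessed (a simplification)."""


class IOHub:
    def __init__(self):
        self.owner = {}       # endpoint name -> owning partition name
        self.frozen = set()   # endpoints that are currently frozen

    def assign(self, endpoint, partition):
        self.owner[endpoint] = partition

    def fault(self, endpoint):
        # Freeze on fault: isolate only the endpoint that reported the error.
        self.frozen.add(endpoint)

    def access(self, endpoint):
        if endpoint in self.frozen:
            raise FrozenEndpointError(endpoint)
        return f"{endpoint}: ok"

    def affected_partitions(self):
        # Only partitions that own a frozen endpoint are affected.
        return {self.owner[e] for e in self.frozen}


hub = IOHub()
hub.assign("slot1", "lparA")
hub.assign("slot2", "lparB")
hub.fault("slot1")
print(hub.affected_partitions())  # only lparA owns a frozen endpoint
print(hub.access("slot2"))        # lparB's endpoint keeps working
```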
4.3 Serviceability
IBM Power Systems design considers both IBM's and the client’s needs. The IBM Serviceability
Team enhanced the base service capabilities and continues to implement a strategy that
incorporates best-of-its-kind service characteristics from diverse IBM Systems offerings.
The purpose of serviceability is to repair the system while minimizing or eliminating
service cost (within budget objectives) and maintaining high customer satisfaction.
Serviceability includes system installation, MES (system upgrades/downgrades), and system
maintenance/repair. Depending on the system and warranty contract, service may be
performed by the customer, an IBM representative, or an authorized warranty service
provider.
The serviceability features that are delivered in this system provide a highly efficient service
environment by incorporating the following attributes:
- Design for customer setup (CSU), customer installed features (CIF), and
  customer-replaceable units (CRU)
- Error detection and fault isolation (ED/FI)
- First-failure data capture (FFDC)
- Converged service approach across multiple IBM server platforms
By delivering on these goals, IBM Power Systems servers enable faster and more accurate
repair, and reduce the possibility of human error.
Client control of the service environment extends to firmware maintenance on all of the
POWER processor-based systems. This strategy contributes to higher systems availability
with reduced maintenance costs.
This section provides an overview of the progressive steps of error detection, analysis,
reporting, notification, and repair that are found in all POWER processor-based systems.
4.3.1 Detecting
The first and most crucial component of a solid serviceability strategy is the ability to
accurately and effectively detect errors when they occur. Although not all errors are a
guaranteed threat to system availability, those that go undetected can cause problems
because the system has no opportunity to evaluate and act if necessary. Power
processor-based systems employ IBM System z® server-inspired error detection