Chapter 4. Continuous availability and manageability 109
Draft Document for Review May 12, 2014 12:46 pm 5102ch04.fm
automatically increase output to compensate for increased heat in the central electronic
complex.
4.1.3 Redundant components and concurrent repair
High-opportunity components (those that most affect system availability) are protected with
redundancy and the ability to be repaired concurrently.
The use of these redundant components allows the system to remain operational:
- POWER8 cores, which include redundant bits in L1 instruction and data caches, L2
  caches, and L2 and L3 directories
- POWER8 processor memory buffer, which also includes an L4 cache with error
  protection capabilities similar to those available in the L3 cache
- Power S822 main memory DIMMs, which use an innovative ECC algorithm from IBM
  research that improves bit-error correction and the handling of memory failures
- Redundant and hot-swap cooling
- Redundant and hot-swap power supplies
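
As an illustration of how redundant bits allow a single-bit error to be corrected, the following minimal sketch implements a textbook Hamming(7,4) code. This is not the ECC algorithm used in POWER8 caches or Power S822 DIMMs, which the text does not describe. The function names are hypothetical and the code serves only to show the general principle.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome); a nonzero syndrome is the corrected position."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        # The syndrome is the 1-based position of the flipped bit.
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a single-bit fault at position 5
    print(hamming74_decode(word))
```

The three parity bits are the redundancy: they cost extra storage, but any single flipped bit in the seven-bit word can be located and repaired without interrupting the consumer of the data.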
For maximum availability, connect the power cords from the same system to two
separate power distribution units (PDUs) in the rack, and connect each PDU to an
independent power source. For tower form factor systems, plug the power cords into two
independent power sources to achieve maximum availability.
4.2 Availability
First-failure data capture (FFDC) is the capability of IBM hardware and microcode to
continuously monitor hardware functions. This process includes predictive failure analysis,
which is the ability to track intermittent correctable errors and to take components offline
before they reach the point of hard failure, which avoids a system outage.
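
The idea behind predictive failure analysis can be sketched as a counter of correctable errors per component within a sliding time window. The class name, the threshold of three errors, and the one-hour window below are hypothetical values chosen for illustration. The text does not state the thresholds or the mechanism that IBM firmware actually uses.

```python
from collections import deque


class CorrectableErrorTracker:
    """Take a component offline once correctable errors become too frequent."""

    def __init__(self, threshold=3, window_seconds=3600.0):
        self.threshold = threshold            # assumed value, not from IBM
        self.window_seconds = window_seconds  # assumed value, not from IBM
        self._events = {}                     # component -> deque of timestamps
        self.offline = set()

    def record_error(self, component, timestamp):
        """Record one correctable error; return True if the component is offline."""
        if component in self.offline:
            return True
        events = self._events.setdefault(component, deque())
        events.append(timestamp)
        # Discard errors that have aged out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        if len(events) >= self.threshold:
            # Deconfigure before the intermittent fault becomes a hard failure.
            self.offline.add(component)
        return component in self.offline


if __name__ == "__main__":
    tracker = CorrectableErrorTracker(threshold=3, window_seconds=60.0)
    for second in (0, 100, 120, 150):
        print(second, tracker.record_error("dimm0", second))
```

An isolated correctable error ages out of the window and has no effect, whereas a burst of errors on the same component crosses the threshold and triggers deconfiguration while the component is still functional.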
The POWER8 family of systems can perform the following functions automatically:
- Self-diagnose and self-correct errors during run time.
- Automatically reconfigure to mitigate potential problems from suspect hardware.
- Self-heal or automatically substitute good components for failing components.
Before ordering: Check your configuration for optional redundant components before
ordering your system.
Remember: Error detection and fault isolation are independent of the operating system in
POWER8 processor-based servers.