
© Copyright IBM Corp. 2014. All rights reserved.
Draft Document for Review May 12, 2014 12:46 pm 5102ch04.fm
Chapter 4. Continuous availability and manageability
This chapter provides information about IBM reliability, availability, and serviceability (RAS)
design and features. This set of technologies, implemented on IBM Power Systems servers,
improves your architecture’s total cost of ownership (TCO) by reducing planned and
unplanned downtime.
The elements of RAS can be described as follows:
• Reliability: Indicates how infrequently a defect or fault in a server occurs
• Availability: Indicates how infrequently the functionality of a system or application is
  impacted by a fault or defect
• Serviceability: Indicates how well faults and their effects are communicated to system
  managers and how efficiently and nondisruptively the faults are repaired
Each successive generation of IBM servers is designed to be more reliable than the previous
server family. POWER8 processor-based servers have new features to support new levels of
virtualization, help ease administrative burden, and increase system utilization.
Reliability starts with components, devices, and subsystems designed to be fault-tolerant.
POWER8 uses lower-voltage technology and improves reliability with stacked latches, which
reduce susceptibility to soft errors. During the design and development process, subsystems go through
rigorous verification and integration testing processes. During system manufacturing,
systems go through a thorough testing process to help ensure high product quality levels.
The processor and memory subsystem contain features that are designed to avoid or correct
environmentally induced, single-bit, intermittent failures. These features can also handle solid
faults in components, and they include selective redundancy to tolerate certain faults without
requiring an outage or parts replacement.