Specifications

Chapter 4. Continuous availability and manageability 139
Draft Document for Review May 12, 2014 12:46 pm 5102ch04.fm
• View Files...
Display the files associated with this event.
• Approve Call Home
Approve the call home of this event. This option is available only if the event has not
already been approved.
The Help / Learn more function provides more details about the other screens available in
the Serviceable Event Manager.
4.5 POWER8 RAS enhancements
The reliability, availability, and serviceability (RAS) features of POWER7 and POWER7+
systems are documented in the following white paper:
http://public.dhe.ibm.com/common/ssi/ecm/en/pow03056usen/POW03056USEN.PDF
That white paper remains a useful in-depth reference for the RAS features available in
POWER7. The POWER8 server family differs in the following ways:
• POWER8 Processor
The POWER8 processor module has a maximum of 12 cores, compared to a maximum
of 8 cores in POWER7.
• On Chip Controller (OCC)
A separate module is no longer needed to handle power management and thermal
monitoring.
The On Chip Controller (OCC) is integrated into each processor module. The OCC is
separate from any customer-accessible processor core. It executes the power
management and thermal monitoring functions that were handled by a separate
module in POWER7 (the TPMD). In addition, the OCC can be programmed to
execute other RAS-related functions independently of any host processor.
• Integrated PCIe Controller
An external I/O hub controller is no longer needed. Each processor module can directly
drive two I/O slots or devices. Error handling and recovery remain similar to
POWER7/POWER7+.
• Coherent Accelerator Processor Interface (CAPI)
• Fabric Bus Lane repair
  - Similar to POWER7+
  - Handles all external fabric buses
  - No need to use the Power On Reset Engine (PORE)
• Additional Internal Fabric Bus address error generation and checking
Allows a more precise analysis of faults across the SMP interconnection.
• Memory Control Replay buffer
Provides additional soft error protection in the memory buffer.
• The Memory subsystem has an L4 Cache implemented
  - ECC protected
  - Data can be purged on errors, similar to L2/L3 cache handling
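
The combination of ECC protection and purging on errors can be illustrated with a small software model. The following is only a sketch: it uses a textbook extended Hamming (8,4) single-error-correct, double-error-detect code, which is an assumption chosen for brevity and not the ECC scheme actually used in the POWER8 L4 cache. The names encode, decode, and read_line, the dictionary used as a cache, and the policy of purging a line whenever any error is detected are all hypothetical.

```python
def encode(nibble):
    """Encode a 4-bit value as an 8-bit SECDED codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions (parity at 1, 2, 4; data at 3, 5, 6, 7).
    """
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the total number of 1 bits even
    return code


def decode(codeword):
    """Return (value, status); status is clean, corrected, or uncorrectable."""
    code = list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    parity_error = sum(code) % 2 == 1
    if syndrome == 0 and not parity_error:
        status = "clean"
    elif parity_error:
        # Single-bit error: the syndrome is the failing position
        # (0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with consistent overall parity: two bits flipped.
        return None, "uncorrectable"
    value = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return value, status


def read_line(cache, addr):
    """Read a cached value; purge the line if any error was detected.

    Hypothetical policy: a purged line would be refetched from memory
    on the next access (the refetch is not modeled here).
    """
    value, status = decode(cache[addr])
    if status != "clean":
        del cache[addr]
    return value
```

In this model, a single flipped bit still returns the correct value and the affected line is dropped from the cache, while a double-bit error returns None and is purged without delivering bad data to the caller.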