Specifications

Chapter 4. Continuous availability and manageability 113
Draft Document for Review May 12, 2014 12:46 pm 5102ch04.fm
POWER8 memory subsystem
The POWER8 processor chip contains two memory controllers with four DMI channels per
memory controller. Each channel connects to a single DIMM, so a processor chip can address
eight CDIMM modules.
The bus transferring data between the processor and the memory uses CRC error detection
with a failed operation retry mechanism and the ability to dynamically retune bus parameters
when a fault occurs. In addition, the memory bus has spare capacity to substitute a spare
data bit-line for one that is determined to be faulty.
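The detect-and-retry behavior described above can be illustrated with a minimal sketch. It is a software model only, not the POWER8 bus protocol: the use of CRC-32 from zlib, the frame layout, the retry limit, and the FlakyChannel class are all assumptions made for illustration.

```python
import zlib

def make_frame(payload: bytes) -> bytes:
    # Append a 4-byte CRC-32 so the receiver can detect corruption.
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_frame(frame: bytes) -> bool:
    # Recompute the CRC over the payload and compare with the trailer.
    payload, crc = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc

def transfer(payload: bytes, channel, max_retries: int = 3):
    # Retry the operation until the CRC check passes or retries run out.
    for attempt in range(1, max_retries + 1):
        frame = channel(make_frame(payload))
        if check_frame(frame):
            return frame[:-4], attempt
    raise RuntimeError("transfer failed after %d attempts" % max_retries)

class FlakyChannel:
    """Hypothetical channel that flips one bit on its first `faults` uses."""
    def __init__(self, faults: int):
        self.faults = faults

    def __call__(self, frame: bytes) -> bytes:
        if self.faults > 0:
            self.faults -= 1
            return bytes([frame[0] ^ 0x01]) + frame[1:]
        return frame
```

In this model, transfer(b"cache line", FlakyChannel(1)) succeeds on the second attempt because the first frame fails its CRC check and is sent again.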
Advanced memory buffer chips are exclusive to IBM and help to increase performance, acting
as read/write buffers. The memory buffer contains an L4 cache with error protection
similar to that of a processor L3 cache.
Memory page deallocation
Although coincident cell errors in separate memory chips are statistically rare, IBM POWER8
processor-based systems can contain these errors by using a memory page deallocation
scheme for partitions that are running the IBM AIX operating system, and also for memory pages
that are owned by the POWER Hypervisor. If a memory address experiences an
uncorrectable or repeated correctable single cell error, the service processor sends the
memory page address to the POWER Hypervisor to be marked for deallocation.
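The trigger condition above can be sketched as a small error tracker. This is a simplified model, not the service processor firmware: the page size, the threshold that defines a "repeated" correctable error, and the PageErrorTracker name are assumptions chosen for illustration.

```python
from collections import Counter

PAGE_SIZE = 4096     # assumed page size in bytes (illustrative only)
CE_THRESHOLD = 3     # assumed count at which a correctable error is "repeated"

class PageErrorTracker:
    """Hypothetical model of the decision to mark a page for deallocation."""

    def __init__(self):
        self.ce_counts = Counter()   # correctable-error count per page
        self.marked = set()          # pages marked for deallocation

    def report(self, address: int, correctable: bool) -> bool:
        page = address // PAGE_SIZE
        if not correctable:
            # An uncorrectable error marks the page immediately.
            self.marked.add(page)
        else:
            # A correctable error marks the page only once it repeats.
            self.ce_counts[page] += 1
            if self.ce_counts[page] >= CE_THRESHOLD:
                self.marked.add(page)
        return page in self.marked
```

With these assumed values, a page is marked on its third correctable error or on its first uncorrectable error.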
Pages that are used by the POWER Hypervisor are deallocated as soon as the page is
released. In other cases, the POWER Hypervisor notifies the owning partition that the page
must be deallocated. Where possible, the operating system moves any data currently
contained in that memory area to another memory area and removes the pages associated
with this error from its memory map, no longer addressing these pages. The operating system
performs memory page deallocation without any user intervention, and the process is
transparent to users and applications.
The POWER Hypervisor maintains a list of pages marked for deallocation during the current
platform initial program load (IPL). During a partition IPL, the partition receives a list of all the
bad pages in its address space. In addition, if memory is dynamically added to a partition
(through a dynamic LPAR operation), the POWER Hypervisor warns the operating system
when memory pages are included that need to be deallocated.
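The bookkeeping described in this paragraph can be sketched as follows. The class and method names are hypothetical, and page numbers and address ranges are simplified to plain integers; the actual POWER Hypervisor interfaces are not shown in this document.

```python
class HypervisorBadPageList:
    """Hypothetical model of the list kept during the current platform IPL."""

    def __init__(self):
        self.bad_pages = set()

    def mark(self, page: int) -> None:
        # Record a page that was marked for deallocation.
        self.bad_pages.add(page)

    def bad_pages_in(self, page_range: range) -> list:
        # Return the marked pages that fall inside the given address space.
        return sorted(p for p in self.bad_pages if p in page_range)

hv = HypervisorBadPageList()
for page in (10, 2050, 5000):
    hv.mark(page)

# Partition IPL: the partition receives the bad pages in its address space.
at_ipl = hv.bad_pages_in(range(0, 1024))

# Dynamic LPAR add: the operating system is warned about bad pages
# that are included in the newly added memory.
on_add = hv.bad_pages_in(range(2048, 3072))
```

The same query serves both cases in this sketch: it runs once over the partition's address space at partition IPL, and again over only the added range when memory is added dynamically.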
Finally, if an uncorrectable error in memory is discovered, the logical memory block that is
associated with the address that has the uncorrectable error is marked for deallocation by the
POWER Hypervisor. This deallocation becomes effective on a partition reboot if the logical
memory block is assigned to an active partition at the time of the fault.
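The deferred deallocation of a logical memory block can be modeled as a two-step state change. This sketch tracks a single partition and rests on several assumptions: the block size, the LmbTracker name, and the immediate withdrawal of a block that is not assigned to an active partition are not taken from this document.

```python
LMB_SIZE = 256 * 1024 * 1024   # assumed logical memory block size in bytes

class LmbTracker:
    """Hypothetical model of logical memory block (LMB) deallocation."""

    def __init__(self, active_lmbs):
        self.active = set(active_lmbs)   # LMBs assigned to an active partition
        self.pending = set()             # marked, effective at partition reboot
        self.deallocated = set()         # no longer available for assignment

    def uncorrectable_error(self, address: int) -> int:
        lmb = address // LMB_SIZE
        if lmb in self.active:
            # In use: the deallocation waits for the partition reboot.
            self.pending.add(lmb)
        else:
            # Assumption: an unassigned block is withdrawn immediately.
            self.deallocated.add(lmb)
        return lmb

    def partition_reboot(self) -> None:
        # Pending deallocations become effective when the partition reboots.
        self.deallocated |= self.pending
        self.active -= self.pending
        self.pending.clear()
```

For example, an uncorrectable error at address LMB_SIZE + 5 maps to block 1. If block 1 is active, it stays in use until partition_reboot() is called, after which it is no longer assigned.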
In addition, the system will deallocate the entire memory group that is associated with the
error on all subsequent system reboots until the memory is repaired. This precaution is
intended to guard against future uncorrectable errors while waiting for parts replacement.
Memory persistent deallocation
Defective memory that is discovered at boot time is automatically switched off. If the service
processor detects a memory fault at boot time, it marks the affected memory as bad so that it
is not used on subsequent reboots.
Upon reboot, if not enough memory is available to meet minimum partition requirements, the
POWER Hypervisor will reduce the capacity of one or more partitions.
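The condition that triggers a capacity reduction can be expressed as a simple check. The function name, the units, and the example values are assumptions for illustration. The sketch does not show how the POWER Hypervisor selects which partitions to reduce, because this document does not describe that policy.

```python
def memory_shortfall(installed_mb: int, bad_mb: int, partition_minimums_mb: dict) -> int:
    """Return how many MB are missing to meet all partition minimums.

    Hypothetical helper: a result of 0 means every minimum can be met.
    """
    usable = installed_mb - bad_mb                   # memory left after deallocation
    required = sum(partition_minimums_mb.values())   # sum of partition minimums
    return max(0, required - usable)

# Example with assumed values: 64 GB installed, 16 GB marked bad at boot.
shortfall = memory_shortfall(65536, 16384, {"lpar1": 32768, "lpar2": 24576})
```

In this example, 49152 MB remain usable against 57344 MB of minimum requirements, so the shortfall is 8192 MB and at least one partition would have to run with reduced capacity.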