Chapter 4. Continuous availability and manageability 121

Draft Document for Review May 12, 2014 12:46 pm 5102ch04.fm

the system is enabled to perform fault determination and isolation, whether or not the

system processors are operational. Boot-time BISTs can also find faults undetectable by

processor-based power-on self-test (POST) or diagnostics.

• Wire-tests discover and precisely identify connection faults between components such as processors and memory.

• Initialization of components such as ECC memory, typically by writing patterns of data and allowing the server to store valid ECC data for each location, can help isolate errors.

To minimize boot time, the system determines which of the diagnostics are required to be

started to ensure correct operation, based on the way that the system was powered off, or on

the boot-time selection menu.

Host Boot IPL

In POWER8, the initialization process during IPL has changed. The Flexible Service Processor (FSP) is no longer the only instance that initializes and runs the boot process. With POWER8, the FSP initializes the boot processes, but part of the firmware now runs on the POWER8 processors themselves and performs the CEC chip initialization. A new component called the PNOR chip stores the Host Boot firmware, and the Self Boot Engine (SBE), an internal part of the POWER8 chip itself, is used to boot the chip.

With this Host Boot initialization, new progress codes are available. An example of an FSP progress code is C1009003. During the Host Boot IPL, progress codes such as CC009344 appear.
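As a rough illustration, the two example codes above can be told apart by their leading characters. The following Python sketch assumes that Host Boot progress codes begin with CC and FSP progress codes begin with C1; that mapping is inferred only from the two examples in this section and is not a documented rule. The function name classify_progress_code is hypothetical.

```python
HEX_DIGITS = set("0123456789ABCDEF")


def classify_progress_code(code):
    """Guess which boot phase issued an 8-character progress code.

    ASSUMPTION: the prefix mapping (CC -> Host Boot, C1 -> FSP) is inferred
    from the two example codes in the text, not from a documented rule.
    """
    code = code.strip().upper()
    if len(code) != 8 or not set(code) <= HEX_DIGITS:
        raise ValueError(f"not an 8-digit hexadecimal progress code: {code!r}")
    if code.startswith("CC"):
        return "Host Boot"
    if code.startswith("C1"):
        return "FSP"
    return "unknown"


for example in ("C1009003", "CC009344"):
    print(example, "->", classify_progress_code(example))
```

A helper of this kind might be used when scanning a saved list of progress codes to see how far an IPL got before it stopped. Any code that does not match either assumed prefix is reported as unknown rather than guessed.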

If a failure occurs during the Host Boot process, a new Host Boot System Dump is collected and stored. This type of dump includes Host Boot memory and is offloaded to the HMC when available.

Run time

All Power Systems servers can monitor critical system components during run time, and they can take corrective actions when recoverable faults occur. IBM hardware error-check architecture provides the ability to report non-critical errors in an out-of-band communications path to the service processor without affecting system performance.

A significant part of IBM runtime diagnostic capabilities originates with the service processor.

Extensive diagnostic and fault analysis routines were developed and improved over many

generations of POWER processor-based servers, and enable quick and accurate predefined

responses to both actual and potential system problems.

The service processor correlates and processes runtime error information by using logic

derived from IBM engineering expertise to count recoverable errors (called thresholding) and

predict when corrective actions must be automatically initiated by the system. These actions

can include the following items:

• Requests for a part to be replaced

• Dynamic invocation of built-in redundancy for automatic replacement of a failing part

• Dynamic deallocation of failing components so that system availability is maintained
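The thresholding idea described above (counting recoverable errors and triggering a corrective action once a limit is reached) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the service processor's actual logic: the class name ErrorThreshold, the sliding time window, the limit of 3 errors in 60 seconds, and the component name dimm0 are all hypothetical.

```python
from collections import defaultdict, deque


class ErrorThreshold:
    """Count recoverable errors per component inside a sliding time window.

    ASSUMPTION: a sliding window with a fixed limit is used here only for
    illustration; the real firmware policy is not described in the text.
    """

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.events = defaultdict(deque)  # component -> error timestamps

    def record(self, component, timestamp):
        """Record one recoverable error; return True if the threshold is reached."""
        recent = self.events[component]
        recent.append(timestamp)
        # Discard errors that are older than the window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.limit


tracker = ErrorThreshold(limit=3, window_seconds=60.0)
for t in (0.0, 10.0, 20.0):
    if tracker.record("dimm0", t):
        print("dimm0: threshold reached, request corrective action")
```

In this sketch the caller decides what the corrective action is, for example requesting a replacement part or deallocating the component. Isolated errors that are spread far apart in time never reach the limit, which is the point of thresholding: only a pattern of recoverable errors is treated as a predictor of failure.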

Device drivers

In certain cases, diagnostics are best performed by operating system-specific drivers, most

notably for I/O devices that are owned directly by a logical partition. In these cases, the operating

system device driver often works in conjunction with I/O device microcode to isolate and

recover from problems. Potential problems are reported to an operating system device driver,