Reliability, Availability, and Serviceability

Intel E8870 Scalable Node Controller (SNC) Datasheet

These errors do not compromise further chipset operation. These errors leave an error status

and log trail as the error propagates from one component to its destination. There are three

types of trailing errors detected by the chipset: 2x ECC, 1x ECC, and master abort. Each CT

error can be an error source, midpoint, and/or endpoint.

Table 6-4 provides general information for all of the errors detected by the E8870 chipset. An error

may or may not be associated with a transaction. For example, an inbound 2x ECC error detected

by the SNC can only be the result of a processor read from memory or I/O. On the other hand, an

assertion of BINIT# on the node would not occur as a result of any specific transaction.

Table 6-4 provides information as to the type of transaction that is associated with an error. Also

provided is the error class. For CT errors, the trailing error type is provided, along with the

detection point (source, mid or end). The type of trailing error along with an understanding of

whether an error can be a source, midpoint, or endpoint can be used in determining the error trail.

6.5.3.1 Error Parsing

The error record can be viewed as the raw data. Identifying the error source is useful for scheduling

system maintenance, recovery, reconfiguration on boot after error, etc. For example, a frequently

occurring parity error on a scalability port (corrected using link-level retry) may be an indication of

problems with the connector on that port.

The FERRST register of each component has a Last_Err_Value field for each error type (fatal, unc,

cor). As each component detects a fatal and/or non-fatal error, it latches the value of its error pin

associated with the error type. As a result, the value of the Last_Err_Value can be used to help

identify the error source for fatal and non-fatal errors. This relies on platform-specific support for

this feature.
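
As a rough illustration of how the latched values might be compared across components, the Python sketch below picks out the components that could have been first to detect an error. It is not taken from the datasheet: the function name, the component names in the example, and above all the polarity (a latched 0 meaning the error pin was still inactive) are assumptions. The actual encoding must be taken from the FERRST register description.

```python
def candidate_sources(last_err_values):
    """Return components that latched an inactive error pin.

    ASSUMPTION (not from the datasheet): a latched value of 0 means the
    shared error pin was not yet asserted when the component logged its
    error, so no other component had reported that error type before it.
    """
    return sorted(name for name, latched in last_err_values.items()
                  if latched == 0)


# Hypothetical Last_Err_Value readings for one error type.
readings = {"SNC": 0, "SIOH": 1}
print(candidate_sources(readings))  # ['SNC']
```

If more than one component is returned, the latched values alone do not single out a source, and the error codes themselves have to be examined.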

There are cases where parsing of the error record cannot be accomplished in a reliable way. For the

purposes of these guidelines, parsing an error means to analyze the contents of the error record

(using an algorithm) to determine the source of the error in the platform. Some error parsing

guidelines are:

• When the first fatal error in the platform is NC, parsing of any other errors is unreliable.

• If there is an NC error reported in any of the components, parsing of any error in the platform

may be unreliable.

• If the first non-fatal error is CT, parsing the CT error may be unreliable if there are NCS errors

in the system.

• When the first non-fatal error is CT, multiple errors have been detected if there are other CT

errors detected by the chipset that have different trailing types.

• Since CS errors do not propagate, determining the error source of CS errors is implied (source

is the component that detected the error). Allowing CS errors to populate the FERRST may

compromise the parsing of an error trail for subsequent CT errors.

• For CT errors, the trail can be recreated using the src, mid, and endpoint attributes of each

error. For example, a 2xECC error on the processor bus for a write to local memory will have

F6 reported in the FERRST, and M1 reported in the SERRST. F6 is a source of a CT error, and

M1 is shown as an endpoint. In addition, the value for LastERR2 in the FERRST will show the

SNC detected the first uncorrectable error in the system. Note that the error trail in a multi-

node E8870 platform can have up to two branches (this situation can occur on an implicit write

back that has a 2xECC error where the requesting node that receives the poisoned data is

different from the home node, which gets the memory update).

1. A larger system may provide this information in external logic to support this feature.
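
Two of the CT guidelines above can be sketched in Python: flagging multiple errors when CT errors carry different trailing types, and recreating a trail by ordering errors from source to endpoint. This is a minimal sketch under stated assumptions, not the chipset's algorithm. The `CtError` record, the function names, and the string encodings are hypothetical, each logged error is assumed to carry a single detection point, and branching trails are not handled. Only the F6 and M1 codes and their roles come from the example in the text.

```python
from dataclasses import dataclass

# Position of each detection point along a trail.
ROLE_ORDER = {"src": 0, "mid": 1, "end": 2}


@dataclass(frozen=True)
class CtError:
    code: str      # error code as logged, e.g. "F6"
    trailing: str  # trailing type: "2xECC", "1xECC", or "master_abort"
    role: str      # detection point: "src", "mid", or "end"


def has_multiple_errors(errors):
    """CT errors with different trailing types imply multiple errors."""
    return len({e.trailing for e in errors}) > 1


def recreate_trail(errors, trailing):
    """Order the CT errors of one trailing type from source to endpoint."""
    same_type = [e for e in errors if e.trailing == trailing]
    ordered = sorted(same_type, key=lambda e: ROLE_ORDER[e.role])
    return [e.code for e in ordered]


# The 2xECC example from the text: F6 is the source, M1 the endpoint.
logged = [CtError("M1", "2xECC", "end"), CtError("F6", "2xECC", "src")]
print(recreate_trail(logged, "2xECC"))  # ['F6', 'M1']
print(has_multiple_errors(logged))      # False
```

A fuller parser would first check for NC errors, since the guidelines state that their presence can make any parsing unreliable.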