Sun Fire™ X4140, X4240, and X4440 Servers Diagnostics Guide
Sun Microsystems, Inc.
www.sun.com
Part No. 820-3067-11
August 2008, Revision A
Submit comments about this document at: http://www.sun.
Copyright © 2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. Unpublished - rights reserved under the Copyright Laws of the United States. THIS PRODUCT CONTAINS CONFIDENTIAL INFORMATION AND TRADE SECRETS OF SUN MICROSYSTEMS, INC. USE, DISCLOSURE OR REPRODUCTION IS PROHIBITED WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF SUN MICROSYSTEMS, INC. This distribution may include materials developed by third parties.
Contents
Preface
1. Initial Inspection of the Server
   Service Troubleshooting Flowchart
   Gathering Service Information
   System Inspection
   Troubleshooting Power Problems
   Externally Inspecting the Server
   Internally Inspecting the Server
2. Using SunVTS Diagnostic Software
   Running SunVTS Diagnostic Tests
   SunVTS Documentation
   Diagnosing Server Problems With the Bootable Diagnostics CD
   Requirements
   Using the Bootable Diagnostics CD
3. Troubleshooting DIMM Problems
   Uncorrectable DIMM Errors
   Correctable DIMM Errors
   BIOS DIMM Error Messages
   DIMM Fault LEDs
   Isolating and Correcting DIMM ECC Errors
A. Event Logs and POST Codes
   Viewing Event Logs
   Power-On Self-Test (POST)
   How BIOS POST Memory Testing Works
   Redirecting Console Output
   Changing POST Options
   POST Codes
B. Status Indicator LEDs
   External Status Indicator LEDs
   Front Panel LEDs
   Back Panel LEDs
   Hard Drive LEDs
   Internal Status Indicator LEDs
C. Using the ILOM Service Processor GUI to View System Information
D. Error Handling
   Handling of Uncorrectable Errors
   Handling of Correctable Errors
   Handling of Parity Errors (PERR)
   Handling of System Errors (SERR)
   Handling Mismatching Processors
   Hardware Error Handling Summary
Index
vi Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008
Preface
The Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide contains information and procedures for using the available tools to diagnose problems with the servers.
Before You Read This Document
Before you read this document, review the safety guidelines in the Sun Fire X4140, X4240, and X4440 Safety and Compliance Guide.
Related Documentation
The document set for the Sun Fire X4140, X4240, and X4440 Servers is described in the Where To Find Sun Fire X4140, X4240, and X4440 Servers Documentation sheet that is packed with your system. You can also find the documentation at http://docs.sun.com. Translated versions of some of these documents are available at http://docs.sun.com.
Typographic Conventions
Typeface: AaBbCc123 (monospace)
  Meaning: The names of commands, files, and directories; onscreen computer output
  Examples: Edit your .login file. Use ls -a to list all files. % You have mail.
Typeface: AaBbCc123 (bold monospace)
  Meaning: What you type, when contrasted with onscreen computer output
  Examples: % su Password:
Typeface: AaBbCc123 (italic)
  Meaning: Book titles, new words or terms, words to be emphasized. Replace command-line variables with real names or values.
  Examples: Read Chapter 6 in the User’s Guide. These are called class options.
Sun Welcomes Your Comments Sun is interested in improving its documentation and welcomes your comments and suggestions. You can submit your comments by going to: http://www.sun.
CHAPTER 1 Initial Inspection of the Server
This chapter includes the following topics:
■ “Service Troubleshooting Flowchart” on page 1
■ “Gathering Service Information” on page 2
■ “System Inspection” on page 3
Service Troubleshooting Flowchart
Use the following flowchart as a guideline for using the subjects in this book to troubleshoot the server.
TABLE 1-1 Troubleshooting Flowchart
To perform this task          Refer to this section
Gather initial service information.
TABLE 1-1 Troubleshooting Flowchart (Continued) To perform this task Refer to this section View service processor logs and sensor information... “Using the ILOM Service Processor GUI to View System Information” on page 43 ...or view service processor logs and sensor information.
System Inspection
Controls that have been improperly set and cables that are loose or improperly connected are common causes of problems with hardware components.
Troubleshooting Power Problems
■ If the server powers on, skip this section and go to “Externally Inspecting the Server” on page 3.
■ If the server does not power on, check the following:
1. Check that the AC power cords are attached firmly to the server’s power supplies and to the AC sources.
2. Check that the main cover is firmly in place.
Internally Inspecting the Server
To perform a visual inspection of the internal system:
1. Choose a method for shutting down the server from main power mode to standby power mode. See FIGURE 1-1 and FIGURE 1-2.
■ Graceful shutdown – Use a ballpoint pen or other stylus to press and release the Power button on the front panel. This causes operating systems that are Advanced Configuration and Power Interface (ACPI) enabled to perform an orderly shutdown.
FIGURE 1-2 X4440 Server Front Panel (callouts: Locate Button/LED, Power Button)
2. Remove the server cover. For instructions on removing the server cover, refer to your server’s service manual.
3. Inspect the internal status indicator LEDs. These can indicate component malfunction. For the LED locations and descriptions of their behavior, see “Internal Status Indicator LEDs” on page 39.
Note – The server must be in standby power mode to view the internal LEDs.
10. If the problem with the server is not evident, you can obtain additional information by viewing the power-on self-test (POST) messages and BIOS event logs during system startup. Continue with “Viewing Event Logs” on page 21.
CHAPTER 2 Using SunVTS Diagnostic Software
This chapter contains information about the SunVTS™ diagnostic software tool.
Running SunVTS Diagnostic Tests
The servers are shipped with a Bootable Diagnostics CD that contains the Sun Validation Test Suite (SunVTS) software. SunVTS is a comprehensive diagnostic tool that tests and validates Sun hardware by verifying the connectivity and functionality of most hardware controllers and devices on Sun platforms.
■ QLogic Host Bus Adapter Test (qlctest)
■ RAM Test (ramtest)
■ Serial Port Test (serialtest)
■ System Test (systest)
■ Tape Drive Test (tapetest)
■ Universal Serial Bus Test (usbtest)
■ Virtual Memory Test (vmemtest)
SunVTS software has a sophisticated graphical user interface (GUI) that provides test configuration and status monitoring. The user interface can run on one system to display the SunVTS testing of another system on the network.
Using the Bootable Diagnostics CD
To use the diagnostics CD to perform diagnostics:
1. With the server powered on, insert the CD into the DVD-ROM drive.
2. Reboot the server, and press F2 early in the reboot so that you can change the BIOS setting for boot-device priority.
3. When the BIOS Main menu appears, navigate to the BIOS Boot menu. Instructions for navigating within the BIOS screens appear on the BIOS screens.
4. On the BIOS Boot menu screen, select Boot Device Priority.
■ The Solaris system message log is a log of all the general Solaris events logged by syslogd. The path name of this log file is /var/adm/messages.
a. Click the Log button. The Log file window is displayed.
b. Specify the log file that you want to view by selecting it from the Log file window. The content of the selected log file is displayed in the window.
c.
CHAPTER 3 Troubleshooting DIMM Problems
This chapter describes how to detect and correct problems with the server’s Dual Inline Memory Modules (DIMMs).
DIMM Replacement Policy
Replace a DIMM when one of the following events takes place:
■ The DIMM fails memory testing under BIOS due to Uncorrectable Memory Errors (UCEs).
■ UCEs occur and investigation shows that the errors originated from memory.
In addition, a DIMM should be replaced whenever more than 24 Correctable Errors (CEs) originate in 24 hours from a single DIMM and no other DIMM is showing further CEs.
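The CE part of this policy can be checked mechanically once the CEs have been extracted from the logs. The following Python sketch is illustrative only: it assumes you already have one (DIMM label, timestamp) pair per logged CE, and it interprets "no other DIMM is showing further CEs" as "no other DIMM appears in the input at all". The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

CE_LIMIT = 24                 # replace only when MORE than 24 CEs ...
WINDOW = timedelta(hours=24)  # ... originate within 24 hours


def max_ces_in_window(times, window=WINDOW):
    """Return the largest number of CE timestamps inside any span of `window`."""
    times = sorted(times)
    best = start = 0
    for end, t in enumerate(times):
        # Advance the window start until the span covers at most `window`.
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def dimm_to_replace(ce_events):
    """ce_events: iterable of (dimm_label, datetime) pairs, one per logged CE.

    Returns the DIMM label that the CE rule points at, or None.
    """
    by_dimm = defaultdict(list)
    for dimm, ts in ce_events:
        by_dimm[dimm].append(ts)
    # Assumption: the rule applies only when exactly one DIMM reports CEs.
    if len(by_dimm) != 1:
        return None
    (dimm, times), = by_dimm.items()
    return dimm if max_ces_in_window(times) > CE_LIMIT else None


# Hypothetical data: 25 CEs, ten minutes apart, all from one DIMM.
t0 = datetime(2008, 8, 5, 5, 15)
events = [("CPU0 slot 7", t0 + timedelta(minutes=10 * i)) for i in range(25)]
print(dimm_to_replace(events))  # CPU0 slot 7
```

The sliding window counts CEs over any 24-hour span rather than per calendar day, so a burst that straddles midnight is still caught.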
3. BIOS reports this event in the service processor’s system event log (SEL) as shown in the sample IPMItool output below: # ipmitool -H 10.6.77.
The lines in the display start with event numbers (in hex), followed by a description of the event. TABLE 3-1 describes the contents of the display.
TABLE 3-1 Lines in IPMI Output
Event (hex)   Description
8             A UCE caused a HyperTransport sync flood, which led to a warm reset of the system. #0x02 refers to a reboot count maintained since the last AC power reset.
9             BIOS detected and initialized 4 processors in the system.
a             BIOS detected that a sync flood caused this reboot.
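Because the event numbers are hexadecimal, an event listed as a is event 10 in decimal. The following Python sketch parses lines of the pipe-separated form that ipmitool's SEL listing uses, with the event number in the first field, and also extracts a #0x.. value such as the reboot count described above. Treat the line format as an assumption and the sample line as invented for illustration; it is not real output from these servers. The function name is hypothetical.

```python
import re

HASH_VALUE = re.compile(r"#0x([0-9a-fA-F]+)")


def parse_sel_line(line):
    """Split one pipe-separated SEL line into (event_number, fields, hash_value).

    event_number is the hexadecimal first field converted to int; hash_value is
    the first '#0x..' value found in the line (for example a reboot count), or None.
    """
    fields = [f.strip() for f in line.split("|")]
    event_number = int(fields[0], 16)
    match = HASH_VALUE.search(line)
    hash_value = int(match.group(1), 16) if match else None
    return event_number, fields[1:], hash_value


# Hypothetical line, made up for illustration only.
print(parse_sel_line("   8 | 08/05/2008 | 05:15:00 | System Event #0x02 | Asserted"))
# (8, ['08/05/2008', '05:15:00', 'System Event #0x02', 'Asserted'], 2)
```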
to view ECC errors
■ Linux: The HERD utility can be used to manage DIMM errors in Linux. See the x64 Servers Utilities Reference Manual for details.
  ■ If HERD is installed, it copies messages from /dev/mcelog to /var/log/messages.
  ■ If HERD is not installed, a program called mcelog copies messages from /dev/mcelog to /var/log/mcelog.
The Bootable Diagnostics CD described in Chapter 2 also captures and logs CEs.
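The Linux rule above, which decides where copied CE messages end up, can be written as a small lookup. This is a minimal Python sketch; how you determine whether HERD is installed is left open (the herd_installed flag is a hypothetical input), and the paths are the ones named above. Reading /var/log/messages typically requires root privileges.

```python
from pathlib import Path


def ce_log_path(herd_installed):
    """Return the file that receives messages copied from /dev/mcelog."""
    if herd_installed:
        return Path("/var/log/messages")  # HERD copies them into the system log
    return Path("/var/log/mcelog")        # otherwise the mcelog program writes here


def read_ce_log(herd_installed, last=20):
    """Return up to `last` trailing lines of that file, or [] if it is missing."""
    path = ce_log_path(herd_installed)
    if last <= 0 or not path.exists():
        return []
    with path.open(errors="replace") as f:
        return f.read().splitlines()[-last:]
```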
Note – The DIMM Fault and Motherboard Fault LEDs operate on stored power for up to a minute when the system is powered down, even after the AC power is disconnected and the motherboard (or mezzanine board) is out of the system. The stored power lasts for about half an hour.
Note – Disconnecting the AC power removes the fault indication. To recover fault information, look in the SP SEL, as described in the Sun Integrated Lights Out Manager 2.0 User's Guide.
FIGURE 3-1 DIMMs and LEDs on Motherboard
FIGURE 3-2 DIMMs and LEDs on Mezzanine Board
Isolating and Correcting DIMM ECC Errors
If your log files report an ECC error or a problem with a DIMM, complete the steps below until you can isolate the fault. In this example, the log file reports an error with the DIMM in CPU0, slot 7. The fault LEDs on CPU0, slots 6 and 7, are on.
To isolate and correct DIMM ECC errors:
1. If you have not already done so, shut down your server to standby power mode and remove the cover.
2.
3. Press the PRESS TO SEE FAULT button, and inspect the DIMM fault LEDs. See FIGURE 3-1 and FIGURE 3-2. A flashing LED identifies a component with a fault.
■ For CEs, the LEDs correctly identify the DIMM where the errors were detected.
■ For UCEs, both LEDs in the pair flash if there is a problem with either DIMM in the pair.
Note – If your server is equipped with a mezzanine board, the motherboard DIMMs and LEDs are hidden beneath it.
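The two LED rules above can be summarized in a few lines. This Python sketch assumes, based only on the example in this section (an error on CPU0, slot 7 lighting the LEDs for slots 6 and 7), that DIMM slots are paired as 0-1, 2-3, 4-5, and 6-7; check FIGURE 3-1 and FIGURE 3-2 for the actual pairing on your board. The function name is hypothetical.

```python
def expected_fault_leds(cpu, slot, error_kind):
    """Return the (cpu, slot) DIMM fault LEDs expected to flash.

    error_kind is "CE" or "UCE". Assumes slots are paired (0,1), (2,3), ...
    """
    if error_kind == "CE":
        return [(cpu, slot)]  # CE: the LED marks the DIMM that reported errors
    if error_kind == "UCE":
        first = slot - slot % 2  # UCE: both LEDs of the assumed pair flash
        return [(cpu, first), (cpu, first + 1)]
    raise ValueError(f"unknown error kind: {error_kind!r}")


print(expected_fault_leds(0, 7, "UCE"))  # [(0, 6), (0, 7)]
```

For a UCE, the LEDs therefore narrow the fault only to a pair of DIMMs, not to a single module.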
11. Power on the server and run the diagnostics test again. 12. Review the log file. If the tests identify the same error, the problem is in the CPU, not the DIMMs.
APPENDIX A Event Logs and POST Codes
This appendix contains information about the BIOS event log, the BMC system event log, the power-on self-test (POST), and console redirection. It contains the following sections:
■ “Viewing Event Logs” on page 21
■ “Power-On Self-Test (POST)” on page 25
Viewing Event Logs
Use this procedure to view the BIOS event log and the BMC system event log.
1.
Main   Advanced   PCIPnP   Boot   Security   Chipset   Exit
Advanced Settings
  WARNING: Setting wrong values in below sections
           may cause system to malfunction.
  CPU Configuration
  IDE Configuration
  Hyper Transport Configuration
  ACPI Configuration
  Event Log Configuration
  IPMI 2.
Help: Configure CPU.
b. From the Advanced Settings screen, select Event Log Configuration. The Advanced Menu Event Logging Details screen is displayed.
Advanced
  Event Logging details
Help: View all unread events on the Event Log.
Advanced
  IPMI 2.0 Configuration
    Status Of BMC Working
    View BMC System Event Log
    Reload BMC System Event Log
    Clear BMC System Event Log
    LAN Configuration
Help: View all events in the BMC Event Log. It will take up to 60 Seconds approx. to read all BMC SEL records.
Power-On Self-Test (POST) The system BIOS provides a rudimentary power-on self-test. The basic devices required for the server to operate are checked, memory is tested, the LSI 1064 disk controller and attached disks are probed and enumerated, and the two Intel dual Gigabit Ethernet controllers are initialized. The progress of the self-test is indicated by a series of POST codes.
Redirecting Console Output
Use the following instructions to access the service processor and redirect the console output so that the BIOS POST codes can be read.
1. Start the BIOS Setup utility by pressing the F2 key while the system is performing the power-on self-test (POST). The BIOS Main menu screen is displayed.
2. Select the Advanced menu tab. The Advanced Settings screen is displayed.
3. Select IPMI 2.0 Configuration. The IPMI 2.0 Configuration screen is displayed.
4.
10. Set the color depth for the redirection console to either 6 or 8 bits.
11. Click the Start Redirection button.
12. When you are prompted for a user name and password, type the following:
■ User Name: root
■ Password: changeme
The current POST screen is displayed.
Changing POST Options
These instructions are optional; use them to change the operations that the server performs during POST testing.
To change POST options:
1. Start the BIOS Setup utility by pressing the F2 key while the system is performing the power-on self-test (POST). The BIOS Main menu screen is displayed.
2. Select Boot. The Boot Settings screen is displayed.
3. Select Boot Settings Configuration. The Boot Settings Configuration screen is displayed.
Boot
  Boot Settings Configuration
    Quick Boot               [Disabled]
    Quiet Boot               [Disabled]
    AddOn ROM Display Mode   [Force BIOS]
    Bootup Num-Lock          [On]
Help: Allows BIOS to skip certain tests while booting. This will decrease the time needed to boot the system.
■ Boot Num-Lock – This option is On by default (keyboard Num-Lock is turned on during boot). If you set it to Off, keyboard Num-Lock is not turned on during boot.
■ Wait for F1 if Error – This option is disabled by default. If you enable it, the system pauses when an error is found during POST and resumes only when you press the F1 key.
■ Interrupt 19 Capture – This option is reserved for future use. Do not change it.
POST Codes TABLE A-1 contains descriptions of each of the POST codes, listed in the same order in which they are generated. These POST codes appear as a four-digit string that is a combination of two-digit output from primary I/O port 80 and two-digit output from secondary I/O port 81. In the POST codes listed in TABLE A-1, the first two digits are from port 81 and the last two digits are from port 80.
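The split between the two ports can be illustrated with a short Python sketch (the function names are hypothetical). The port 80 half of a four-digit code lines up with TABLE A-2: for example, code 8613 in TABLE A-1 ends in 13, and checkpoint 13 in TABLE A-2 has the same description.

```python
def split_post_code(code):
    """Split a four-digit POST code into (port81, port80) byte values.

    The first two hex digits come from port 81, the last two from port 80.
    """
    if len(code) != 4:
        raise ValueError(f"expected four hex digits, got {code!r}")
    return int(code[:2], 16), int(code[2:], 16)


def join_post_code(port81, port80):
    """Build the four-digit display string from the two port bytes."""
    return f"{port81:02x}{port80:02x}"


port81, port80 = split_post_code("8613")
print(f"port 81 = {port81:02x}, port 80 = {port80:02x}")  # port 81 = 86, port 80 = 13
```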
TABLE A-1 POST Codes (Continued)
Post Code   Description
de00        Preparing CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state.
8613        Initialize PM regs and PM PCI regs at Early-POST. Initialize multi-host bridge, if system supports it. Set up ECC options before memory clearing. Enable PCI-X clock lines in the 8131.
0024        Uncompress and initialize any platform-specific BIOS modules.
862a        BBS ROM initialization.
POST Code Checkpoints
The POST code checkpoints are the largest set of checkpoints during the BIOS preboot process. TABLE A-2 describes the types of checkpoints that might occur during the POST portion of the BIOS. These two-digit checkpoints are the output from primary I/O port 80.
TABLE A-2 POST Code Checkpoints
Post Code   Description
03          Disable NMI, Parity, video for EGA, and DMA controllers. At this point, only ROM accesses go to the GPNV. If BB size is 64K, turn on ROM Decode below FFFF0000h.
TABLE A-2 POST Code Checkpoints (Continued)
Post Code   Description
0E          Testing and initialization of different Input Devices. Also, update the Kernel Variables. Traps the INT09h vector, so that the POST INT09h handler gets control for IRQ1. Uncompress all available language, BIOS logo, and Silent logo modules.
13          Initialize PM regs and PM PCI regs at Early-POST. Initialize multi-host bridge, if system will support it. Set up ECC options before memory clearing.
TABLE A-2 POST Code Checkpoints (Continued)
Post Code   Description
60          Initializes NUM-LOCK status and programs the KBD typematic rate.
75          Initialize Int-13 and prepare for IPL detection.
78          Initializes IPL devices controlled by BIOS and option ROMs.
7A          Initializes remaining option ROMs.
7C          Generate and write contents of ESCD in NVRAM.
84          Log errors encountered during POST.
85          Displays errors to the user and gets the user response for error.
87          Execute BIOS setup if needed/requested.
TABLE A-2 POST Code Checkpoints (Continued)
Post Code   Description
B1          Save system context for ACPI.
00          Prepares CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state.
61-70       OEM POST Error. This range is reserved for chipset vendors and system manufacturers. The error associated with this value may be different from one platform to the next.
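A script that reads port 80 checkpoints could map them to the descriptions in TABLE A-2. The Python sketch below copies an abridged subset of the table and treats 61-70 as a hexadecimal range; that is an assumption, consistent with the other codes in the table being hexadecimal. The names CHECKPOINTS and describe_checkpoint are hypothetical.

```python
# Abridged excerpt of TABLE A-2, keyed by the port 80 checkpoint byte.
CHECKPOINTS = {
    0x00: "Prepares CPU for booting to OS.",
    0x03: "Disable NMI, Parity, video for EGA, and DMA controllers.",
    0x13: "Initialize PM regs and PM PCI regs at Early-POST.",
    0x60: "Initializes NUM-LOCK status and programs the KBD typematic rate.",
    0x75: "Initialize Int-13 and prepare for IPL detection.",
    0x84: "Log errors encountered during POST.",
    0x87: "Execute BIOS setup if needed/requested.",
    0xB1: "Save system context for ACPI.",
}


def describe_checkpoint(code):
    """Return a description for a two-digit checkpoint value (0x00-0xFF)."""
    if 0x61 <= code <= 0x70:
        # Reserved OEM range: the meaning differs from one platform to the next.
        return "OEM POST Error (platform-specific)"
    return CHECKPOINTS.get(code, "not in this excerpt of TABLE A-2")


print(describe_checkpoint(0x65))  # OEM POST Error (platform-specific)
```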
APPENDIX B Status Indicator LEDs
This appendix contains information about the locations and behavior of the LEDs on the server. It describes the external LEDs that can be viewed on the outside of the server and the internal LEDs that can be viewed only with the main cover removed.
External Status Indicator LEDs
See the following figures and tables for information about the LEDs that are viewable on the outside of the server.
■ FIGURE B-1 shows and describes the front panel LEDs.
Front Panel LEDs
FIGURE B-1 Front Panel LEDs (X4140 shown)
Figure Legend
1 Locator LED/Locator button: White
2 Service Required LED: Amber
3 Power/OK LED: Green
4 Rear PS LED: Amber (power supply fault)
5 System Over Temperature LED: Amber
6 Top Fan LED: Amber (service action required on fan(s))
Back Panel LEDs
FIGURE B-2 Back Panel LEDs (X4140 shown)
Figure Legend
1 Power Supply LEDs:
    Power Supply OK: Green
    Power
3 Service Required LED
4 Power OK LED
Hard Drive LEDs
FIGURE B-3 Hard Drive LEDs
Figure Legend
1 Ready to remove LED: Blue (service action is allowed)
2 Fault LED: Amber (service action is required)
3 Status LED: Green (blinks when data is being transferred)
Internal Status Indicator LEDs
The server has internal status indicators on the motherboard and on the mezzanine board. For motherboard locations, see FIGURE B-4. For mezzanine board locations, see FIGURE B-5.
Note – The mezzanine board, when present, obscures part of the motherboard, including the LEDs. The Motherboard Fault LED indicates that one or more of the LEDs on the motherboard is active.
FIGURE B-5 DIMMs and LEDs on Mezzanine Board
APPENDIX C Using the ILOM Service Processor GUI to View System Information
This appendix contains information about using the Integrated Lights Out Manager (ILOM) service processor (SP) GUI to view monitoring and maintenance information for your server.
Making a Serial Connection to the SP
To make a serial connection to the SP:
1. Connect a serial cable from the RJ-45 Serial Management port on the server to a terminal device.
2. Press ENTER on the terminal device to establish a connection between that terminal device and the ILOM SP.
Note – If you connect to the serial port on the SP before it has been powered up or during its power-up sequence, you will see boot messages. The service processor eventually displays a login prompt.
Viewing ILOM SP Event Logs
Events are notifications generated in response to specific actions. The IPMI system event log (SEL) provides status information about the server’s hardware and software to the ILOM software, which displays the events in the ILOM web GUI.
To view event logs:
1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI:
a. Type the IP address of the server’s SP into your web browser. The Sun Integrated Lights Out Manager Login screen is displayed.
b.
FIGURE C-1 System Event Logs Page
3. From the drop-down list, select the category of event that you want to view in the log. You can select from the following types of events:
■ Sensor-specific events. These events relate to a specific sensor for a component, for example, a fan sensor or a power supply sensor.
■ BIOS-generated events. These events relate to error messages generated in the BIOS.
■ System management software events.
After you have selected a category of event, the Event Log table is updated with the specified events. The fields in the Event Log are described in TABLE C-1.
TABLE C-1 Event Log Fields
Field        Description
Event ID     The number of the event, in sequence from number 1.
Time Stamp   The day and time the event occurred. If the Network Time Protocol (NTP) server is enabled to set the SP time, the SP clock uses Coordinated Universal Time (UTC).
■ ILOM web GUI operation; for example, from the Maintenance tab, selecting Reset SP
■ An SP firmware upgrade
After an SP reboot, the SP clock is changed by the following events:
■ When the host is booted. The host’s BIOS unconditionally sets the SP time to that indicated by the host’s RTC. The host’s RTC is set by the following operations:
  ■ When the host’s CMOS is cleared as a result of changing the host’s RTC battery or inserting the CMOS-clear jumper on the motherboard.
2. From the System Information tab, select Components. The Replaceable Component Information page is displayed. See FIGURE C-2. FIGURE C-2 Replaceable Component Information Page 3. Select a component from the drop-down list. Information about the selected component is displayed. 4. If the problem with the server is not evident after viewing replaceable component information, continue with “Running SunVTS Diagnostic Tests” on page 7.
Viewing Sensors This section describes how to view the server temperature, voltage, and fan sensor readings. For a complete list of sensors, see Appendix D. To view sensor readings: 1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI: a. Type the IP address of the server’s SP into your web browser. The Sun Integrated Lights Out Manager Login screen is displayed. b. Type your user name and password.
FIGURE C-3 Sensor Readings Page 3. Click the Refresh button to update the sensor readings to their current status. 4. Click a sensor to display its thresholds. A display of properties and values appears. See the example in FIGURE C-4.
FIGURE C-4 Sensor Details Page 5. If the problem with the server is not evident after viewing sensor readings information, continue with “Running SunVTS Diagnostic Tests” on page 7.
APPENDIX D Error Handling This appendix contains information about how the servers process and log errors.
Note – If the error is in the lowest 1 MB of memory, the BIOS freezes after rebooting, so no DMI log is recorded.
■ An example of the error reported by the SEL through IPMI 2.
FIGURE D-1 DMI Log Screen, Uncorrectable Error
Handling of Correctable Errors
This section lists facts and considerations about how the server handles correctable errors.
■ During BIOS POST:
  ■ The BIOS polls the MCK registers.
  ■ The BIOS logs to DMI.
  ■ The BIOS logs to the SP SEL through the BMC.
  ■ The feature is turned off at OS boot time by default.
FIGURE D-2 DMI Log Screen, Correctable Error
■ If during any stage of memory testing the BIOS cannot read from or write to the DIMM, it takes the following actions:
  ■ The BIOS disables the DIMM, as indicated by the Memory Decreased message in EXAMPLE D-1.
  ■ The BIOS logs an SEL record.
  ■ The BIOS logs an event in DMI.
EXAMPLE D-1 DMI Log Screen, Correctable Error, Memory Decreased
Handling of Parity Errors (PERR) This section lists facts and considerations about how the server handles parity errors (PERR). ■ The handling of parity errors works through NMIs. ■ During BIOS POST, the NMI is logged in the DMI and the SP SEL. See the following example command and output: [root@d-mpk12-53-238 root]# ipmitool -H 129.146.53.
FIGURE D-3
Aug 5 05:15:00 on CPU 0.
Aug 5 05:15:00 on CPU 1.
Aug 5 05:15:00
Aug 5 05:15:00 enabled?
Aug 5 05:15:00 on CPU 1.
Aug 5 05:15:00
Aug 5 05:15:00 enabled?
Aug 5 05:15:00 on CPU 0.
Note – The Linux system reboots, but does not inform the BIOS of this incident.
Handling of System Errors (SERR)
This section lists facts and considerations about how the server handles system errors (SERR).
■ System error handling works through the HyperTransport Sync Flood Error mechanism on the 8111 and 8131.
■ The following events happen during BIOS POST:
  ■ POST reports any previous system errors at the bottom of the screen. See FIGURE D-4 for an example.
EvM Revision      : 04
Sensor Type       : Critical Interrupt
Sensor Number     : 00
Event Type        : Sensor-specific Discrete
Event Direction   : Assertion Event
Event Data        : 05ffff
Description       : PCI SERR
■ FIGURE D-5 shows an example DMI log screen from the BIOS Setup page with a system error.
Handling Mismatching Processors
This section lists facts and considerations about how the server handles mismatching processors.
■ The BIOS performs a complete POST.
■ The BIOS displays a report of any mismatching CPUs, as shown in the following example:
AMIBIOS(C)2003 American Megatrends, Inc.
BIOS Date: 08/10/05 14:51:11 Ver: 08.00.10
CPU : AMD Opteron(tm) Processor 254, Speed : 2.
Hardware Error Handling Summary
TABLE D-1 summarizes the most common hardware errors that you might encounter with these servers.
TABLE D-1 Hardware Error Handling Summary
(Columns: Error, Description, Handling, Logged (DMI Log or SP SEL), Fatal?)
SP failure
  Description: The SP fails to boot upon application of system power. The SP controls the system reset, so the system may power on, but will not come out of reset.
  Handling: • During power up, the SP's boot loader turns on the power LED.
TABLE D-1 Hardware Error Handling Summary (Continued)
Single-bit DRAM ECC error
  Description: With ECC enabled in the BIOS Setup, the CPU detects and corrects a single-bit error on the DIMM interface.
  Handling: The CPU corrects the error in hardware. No SP SEL interrupt or machine check is generated by the hardware. The polling is triggered every half-second by SMI timer interrupts and is done by the BIOS SMI handler.
TABLE D-1 Hardware Error Handling Summary (Continued)
PCI SERR, PERR
  Description: System or parity error on a PCI bus.
  Handling: Sync floods occur on the HyperTransport links, the machine resets itself, and error information is retained through the reset. The BIOS reports, "A Hyper Transport sync flood error occurred on last boot, press F1 to continue."
BIOS POST Microcode Error
  Description: The BIOS could not find or load the
  Handling: The BIOS displays an error message, logs the error to DMI, and boots.
TABLE D-1 Hardware Error Handling Summary (Continued)
Multiple fan failure
  Description: Fan failure is detected by reading tach signals.
  Handling: The Front Fan Fault, Service Action Required, and individual fan module LEDs are lit.
  Logged: SP SEL
  Fatal?: Fatal
Single power supply failure
  Description: Any of the AC/DC PS_VIN_GOOD or PS_PWR_OK signals is deasserted.
DC/DC power converter failure
  Description: Any POWER_GOOD signal is deasserted from the DC/DC converters.