HSZ80 Array Controller ACS Version 8.3

Maintenance and Service Guide

First Edition (December 1998)

Part Number EK-HSZ80-SV. A01/388221-001

Compaq Computer Corporation

Summary of content (336 pages)

PAGE 1
HSZ80 Array Controller ACS Version 8.3 Maintenance and Service Guide First Edition (December 1998) Part Number EK-HSZ80-SV.
PAGE 2
While Compaq Computer Corporation believes the information included in this manual is correct as of the date of publication, it is subject to change without notice. Compaq makes no representations that the interconnection of its products in the manner described in this document will not infringe existing or future patent rights, nor do the descriptions contained in this document imply the granting of licenses to make, use, or sell equipment or software in accordance with the description.
PAGE 3
JAPAN USA This equipment generates, uses, and may emit radio frequency energy. The equipment has been type tested and found to comply with the limits for a Class A digital device pursuant to Part 15 of FCC rules, which are designed to provide reasonable protection against such radio frequency interference. Operation of this equipment in a residential area may cause interference in which case the user at his own expense will be required to take whatever measures may be required to correct the interference.
PAGE 4
PAGE 5
v Contents
About this Guide
Getting Help . . . xvii
Compaq Website . . . xvii
Telephone Numbers . . . xvii
Precautions . . .
PAGE 6
vi
Preparation Procedures . . . 2–2
Establishing a Local Connection to the Controller . . . 2–2
Shutting Down the Subsystem . . . 2–5
Disabling the External Cache Batteries . . . 2–5
Restarting the Subsystem . . .
PAGE 7
vii
Installing a Cache Module in a Dual-Redundant Controller Configuration . . . 2–34
Replacing an External Cache Battery Storage Building Block . . . 2–38
Replacing an External Cache Battery Storage Building Block With Cabinet Powered On . . .
PAGE 8
viii
Deleting a Software Patch . . . 3–10
Listing Software Patches . . . 3–12
Upgrading Firmware on a Device . . . 3–14
Upgrading to a Dual-Redundant Controller Configuration . . . 3–17
Installing a New Controller, Cache Module, and ECB . . .
PAGE 9
ix
Instance Codes and Last-Failure Codes . . . 4–40
Controlling the Display of Significant Events and Failures . . . 4–40
Using VTDPY to Check for Communication Problems . . . 4–43
Checking Controller-to-Host Communications . . . 4–45
Checking Controller-to-Device Communications . . . 4–47
Checking Device Type and Location . . .
PAGE 10
x
Backup Battery Failure Event Sense Data Response . . . 5–11
Subsystem Built-In Self Test Failure Event Sense Data Response . . . 5–13
Memory System Failure Event Sense Data Response . . . 5–15
Device Services Non-Transfer Error Event Sense Data Response . . . 5–16
Disk Transfer Error Event Sense Data Response . . . 5–18
Instance Codes . . .
PAGE 11
xi
Chapter 6 Connectors, Switches, and LEDs
Controller Front Panel . . . 6–2
Operator Control Panel LEDs . . . 6–3
Power Verification and Addressing Module . . . 6–4
Environmental Monitoring Unit (EMU) . . .
PAGE 12
PAGE 13
xiii Figures
The HSZ80 Subsystem . . . 1–2
HSZ80 Array Controller . . . 1–4
Cache Module . . . 1–6
EMU . . .
PAGE 14
xiv
Template 11 - Nonvolatile Parameter Memory Component Event Sense Data Response Format . . . 5–10
Template 12 - Backup Battery Failure Event Sense Data Response Format . . . 5–12
Template 13 - Subsystem Built-In Self Test Failure Event Sense Data Response Format . . .
PAGE 15
xv Tables
The HSZ80 Subsystem . . . 1–3
HSZ80 Fibre Channel Array Controller . . . 1–5
Cache Module . . . 1–6
EMU . . .
PAGE 16
xvi
Device-Port Status Columns . . . 4–51
Unit Status Columns . . . 4–53
DILX Control Sequences . . . 4–56
Data Patterns for Phase 1: Write Test . . . 4–58
DILX Error Codes . . .
PAGE 17
xvii About this Guide This book describes the features of the HSZ80 array controller and configuration procedures for the controller and storagesets running Array Controller Software (ACS) Version 8.3Z. This book does not contain information about the operating environments to which the controller may be connected, nor does it contain detailed information about subsystem enclosures or their components. See the documentation that accompanied these peripherals for information about them.
PAGE 18
xviii About this Guide Precautions Follow these precautions when carrying out the procedures in this book. Electrostatic Discharge Precautions Static electricity collects on all nonconducting material, such as paper, cloth, and plastic. An electrostatic discharge (ESD) can easily damage a controller or other subsystem component even though you may not see or feel the discharge.
PAGE 19
xix Maintenance Port Precautions The maintenance port generates, uses, and radiates radio-frequency energy through cables that are connected to it. This energy may interfere with radio and television reception. Do not leave a cable connected to this port when you’re not communicating with the controller. Compaq HSZ80 Array Controller ACS Version 8.
PAGE 20
xx About this Guide Conventions This book uses the following typographical conventions and special notices to help you find what you’re looking for. Typographical Conventions Convention ALLCAPS Meaning Command syntax that must be entered exactly as shown and for commands discussed within text, for example: SET FAILOVER COPY=OTHER_CONTROLLER “Use the SHOW SPARESET command to show the contents of the spareset.” Monospaced Sans serif italic Screen display.
PAGE 21
xxi Special Notices This book doesn’t contain detailed descriptions of standard safety procedures. However, it does contain warnings for procedures that could cause personal injury and cautions for procedures that could damage the controller or its related components. Look for these symbols when you’re carrying out the procedures in this book: WARNING: A warning indicates the presence of a hazard that can cause personal injury if you do not observe the precautions in the text.
PAGE 22
xxii About this Guide Required Tools You will need the following tools to service the controller, cache module, external cache battery (ECB), the Power Verification and Addressing (PVA) module, the Gigabit Link Module (GLM), and the I/O module: ■ A flathead screwdriver for loosening and tightening the I/O module retaining screws. ■ An antistatic wrist strap. ■ An antistatic mat on which to place modules during servicing.
PAGE 23
xxiii Related Publications The following table lists some of the Compaq StorageWorks documents related to the use of the controller, cache module, external cache battery, graphical user interface, and the subsystem. Document Title BA370 Enclosure Rack Template (Compaq 42U Rack) Command Console Version 2.
PAGE 24
xxiv About this Guide Document Title RA8000/ESA12000 HSZ80 ACS V8.3 for OpenVMS Installation Reference Manual RA8000/ESA12000 HSZ80 ACS V8.3 for OpenVMS Quick Setup Guide RA8000/ESA12000 Storage Subsystem User’s Guide Rail Mounting Installation Card (Compaq 42U Rack) Ultra SCSI RAID Enclosure (DS-BA370 Series) User’s Guide Warranty Terms and Conditions Revision History This is a new document.
PAGE 25
1–1 Chapter 1 General Description This chapter provides the illustrated parts breakdown and a spare list for the HSZ80 array controller subsystem. See for the names of referenced spare parts.
PAGE 26
1–2 General Description System Components Exploded View
[Figure 1–1 The HSZ80 Subsystem (CXO6742A; callouts 1–16)]
PAGE 27
1–3 Table 1–1 The HSZ80 Subsystem
Item  Description                      Part Number
1     BA370 rack-mountable enclosure   401914-001
2     Cooling fan, blue                400293-001
      Cooling fan, gray                402602-001
3     Power cable kit, white           401915-001
4     I/O module, blue                 400294-001
      I/O module, gray                 401911-001
5     SCSI hub, 3 port                 401926-001
6     SCSI hub, 5 port                 401927-001
7     SCSI hub, 9 port, upgrade        401929-001 and 401927-001
      NOTE: A complete 9-port SCSI hub requires a 5-port SCSI hub
8     Cache module                     400295-001
9     HSZ80 controller
10
PAGE 28
1–4 General Description HSZ80 Array Controller
[Figure 1–2 HSZ80 Array Controller (CXO6703A; callouts 1–6)]
PAGE 29
1–5 Table 1–2 HSZ80 Fibre Channel Array Controller
Item  Description                 Part No.
1     Program card                103474-001
2     Trilink connectors          401948-001
3     Host bus cable, 1.5 meter   401941-001
      Host bus cable, 2 meter     401940-001
      Host bus cable, 10 meter    401942-001
      Host bus cable, 15 meter    401943-001
      Host bus cable, 20 meter    401944-001
4     Terminator                  401947-001
5     Jumper cable                401939-001
6     Maintenance port cable      402605-001
PAGE 30
1–6 General Description Cache Module
[Figure 1–3 Cache Module (CXO6570A; callouts 1–2)]
Table 1–3 Cache Module
Item  Description  Part No.
1
2
PAGE 31
1–7 Environmental Monitoring Unit (EMU)
[Figure 1–4 EMU (CXO6604A; callout 1)]
Table 1–4 EMU
Item  Description                        Part No.
1     EMU communication cable, 4 meter   401949-001
PAGE 32
PAGE 33
2–1 Chapter 2 Replacement Procedures This chapter describes the procedures for replacing the controller, cache module, external cache battery (ECB), power verification and addressing (PVA) module, I/O module, environmental monitoring unit (EMU), DIMMs, PCMCIA card, and a failed storageset member. Additionally, there are procedures for shutting down and restarting the subsystem. See the enclosure documentation for information about the power supplies, cooling fans, and cables.
PAGE 34
2–2 Replacement Procedures Preparation Procedures Establishing a Local Connection to the Controller You can communicate with a controller locally or remotely. Use a local connection to configure the controller for the first time. Use a remote connection to your host system for all subsequent configuration tasks. See the Quick Setup Guide that came with your platform kit for details.
PAGE 35
2–3
[Figure 2–1. PC/Terminal to Maintenance Port Connection (CXO6584A)]
PAGE 36
2–4 Replacement Procedures Table 2–1 Description of PC/Terminal to Maintenance Port Connection Location Description ➀ Maintenance port cable for a PC ➁ Maintenance Port Optional maintenance port cable for a terminal connection ➂ BC16E-xx cable assembly ➃ Ferrite bead ➄ RJ-11 adapter ➅ RJ-11 extension cable ➆ PC serial port adapter, 9 pin D-sub to 25 pin D-sub CAUTION: The cables connecting the controller and the PC (or terminal) may cause radio and television interference.
PAGE 37
2–5 Shutting Down the Subsystem Use the following steps to shut down a subsystem: 1. From a host console, stop all host activity and dismount the logical units in the subsystem. 2. Connect a PC or terminal to the maintenance port of one of the controllers in your subsystem. 3. Shut down the controllers. In single controller configurations, you only need to shut down “this controller.
PAGE 38
2–6 Replacement Procedures 1. Press the battery-disable switch located on each battery within the ECB SBB. The switch is the small button labeled SHUT OFF next to the status LED (see Figure 2–2). Press each switch for approximately five seconds. The status LED will flash once and then shut off. Make sure you perform this procedure on both ECB 1 and ECB 2, if appropriate. 2. The batteries are no longer powering the cache module.
[Figure 2–2. (CXO6164B; callouts 1–5)]
PAGE 39
2–7 NOTE: To return to normal operation, apply power to the storage subsystem. The cache battery will be enabled when the subsystem is powered on. Restarting the Subsystem Use the following steps to restart a subsystem: 1. Plug in the subsystem’s power cord, if it is not already plugged in. 2. Turn on the subsystem. The controllers automatically restart and the ECBs automatically re-enable themselves to provide backup power to the cache modules.
PAGE 40
2–8 Replacement Procedures Replacing Modules in a Single-Controller Configuration Follow the instructions in this section to replace modules in a single-controller configuration (see Figure 2–3). If you’re replacing modules in a dual-redundant controller configuration, see “Replacing Modules in a Dual-Redundant Controller Configuration,” page 2–16. To upgrade a single controller configuration to a dual redundant controller configuration, see Chapter 3, “Upgrading the Subsystem.
PAGE 41
2–9 The following sections cover procedures for replacing both the controller and cache module, replacing the controller, and replacing the cache module. CAUTION: In a single-controller configuration, you must shut down the subsystem before removing or replacing any modules. If you remove the controller or any other module without first shutting down the subsystem, data loss may occur.
PAGE 42
2–10 Replacement Procedures 3. Run FMU to obtain the last failure codes, if desired. NOTE: If you initialized a container with the SAVE_CONFIGURATION switch, you can save this controller’s current device configuration using the CONFIGURATION SAVE command. If CONFIGURATION SAVE is not used, you will have to manually configure the new controller as described in HSZ80 ACS Version 8.3 Configuration and CLI Reference Guide. 4.
PAGE 43
2–11 Installing the Controller in a Single-Controller Configuration Use the following steps to install the controller: CAUTION: ESD can easily damage a controller. Wear a snug-fitting, grounded ESD wrist strap. Make sure you align the controller in the appropriate guide rails. If you do not align the module correctly, damage to the backplane can occur. 1. Insert the new controller into its slot, and engage its retaining levers. 2. Connect the trilink connectors to the new controller.
PAGE 44
2–12 Replacement Procedures 8. Using CLCP, install any patches that you had installed on the previous controller (see Chapter 3, “Upgrading the Subsystem.”) 9. Mount the logical units on the host. If you are using a Windows NT platform, restart the server. 10. Set the subsystem date and time with the following command: SET THIS_CONTROLLER TIME=dd-mmm-yyyy:hh:mm:ss 11. Disconnect the PC or terminal from the controller’s maintenance port.
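The TIME value in step 10 follows the pattern dd-mmm-yyyy:hh:mm:ss. As a minimal sketch (not part of the controller software), the command string could be assembled on the PC before it is typed at the CLI. The function name `set_time_command` is hypothetical, and the uppercase English month abbreviations are an assumption; the guide gives only the pattern.

```python
from datetime import datetime

# Assumption: three-letter uppercase English month abbreviations fill "mmm".
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def set_time_command(when: datetime) -> str:
    """Build a SET THIS_CONTROLLER TIME= command in dd-mmm-yyyy:hh:mm:ss form."""
    return "SET THIS_CONTROLLER TIME={:02d}-{}-{:04d}:{:02d}:{:02d}:{:02d}".format(
        when.day, MONTHS[when.month - 1], when.year,
        when.hour, when.minute, when.second,
    )


if __name__ == "__main__":
    # Print the command for the PC's current local time.
    print(set_time_command(datetime.now()))
```

A fixed month table is used instead of `strftime("%b")` so that the result does not depend on the PC's locale.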
PAGE 45
2–13 Replacing a Cache Module in a Single-Controller Configuration Use the following steps in “Removing the Cache Module in a Single-Controller Configuration” and “Installing the Cache Module in a Single-Controller Configuration” to replace the cache module. Removing the Cache Module in a Single-Controller Configuration Use the following steps to remove the cache module: 1. From the host console, dismount the logical units in the subsystem. If you are using a Windows NT platform, shut down the server. 2.
PAGE 46
2–14 Replacement Procedures 7. Disengage both retaining levers, remove the cache module, and place the cache module into an antistatic bag or onto a grounded antistatic mat. NOTE: Remove the DIMMs from the cache module. They will be installed in the replacement cache module. 8. Press down on the DIMM retaining levers at either end of the DIMM you want to remove. 9. Grasp the DIMM and gently remove it from the DIMM slot. Repeat for all DIMMs.
PAGE 47
2–15 6. If not already connected, connect a PC or terminal to the controller’s maintenance port. 7. Restart the controller by pressing its reset button. 8. When the CLI prompt reappears, display details about the controller you configured. Use the following command: SHOW THIS_CONTROLLER FULL 9. Mount the logical units on the host. If you are using a Windows NT platform, restart the server. 10.
PAGE 48
2–16 Replacement Procedures Replacing Modules in a Dual-Redundant Controller Configuration Follow the instructions in this section to replace modules in a dual-redundant controller configuration (see Figure 2–4). If you’re replacing modules in a single controller configuration, see “Replacing Modules in a Single-Controller Configuration,” page 2–8.
[Figure 2–4. (CXO6291B; callouts 1–6)]
PAGE 49
2–17 The following sections cover procedures for replacing both the controller and cache module, replacing the controller, and replacing the cache module. Note the following before starting the replacement procedures: ■ The new controller’s hardware must be compatible with the functioning controller’s hardware. See the product-specific release notes that accompanied the software release for information regarding hardware compatibility.
PAGE 50
2–18 Replacement Procedures 4. Start FRUTIL with the following command: RUN FRUTIL FRUTIL displays the following: Do you intend to replace this controller’s cache battery? Y/N 5. Enter N(o). FRUTIL displays the FRUTIL Main menu: FRUTIL Main Menu: 1. Replace or remove a controller or cache module 2. Install a controller or cache module 3. Replace a PVA module 4. Replace an I/O module 5. Exit Enter choice: 1, 2, 3, 4, or 5 -> 6.
PAGE 51
2–19 8. Enter Y(es) and press return. FRUTIL displays the following: Quiescing all device ports. Please wait... Device Port 1 quiesced. Device Port 2 quiesced. Device Port 3 quiesced. Device Port 4 quiesced. Device Port 5 quiesced. Device Port 6 quiesced. All device ports quiesced. Remove the slot A [ or B] controller (the one without a blinking green LED) within 4 minutes. CAUTION: The device ports must quiesce before removing the controller.
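Because the controller must not be removed until every device port has quiesced, a captured terminal log can be checked before proceeding. The sketch below assumes the log text matches the FRUTIL messages shown above; the function names and the idea of capturing the session to a string are illustrative assumptions, not a documented FRUTIL feature.

```python
import re

EXPECTED_PORTS = 6  # the FRUTIL output above lists Device Ports 1 through 6


def quiesced_ports(output: str) -> set:
    """Return the port numbers reported as quiesced in the captured text."""
    return {int(n) for n in re.findall(r"Device Port (\d+) quiesced\.", output)}


def all_ports_quiesced(output: str, expected: int = EXPECTED_PORTS) -> bool:
    """True only if every expected port and the summary line are present."""
    every_port = quiesced_ports(output) == set(range(1, expected + 1))
    return every_port and "All device ports quiesced." in output


if __name__ == "__main__":
    log = ("Quiescing all device ports. Please wait... "
           + " ".join(f"Device Port {n} quiesced." for n in range(1, 7))
           + " All device ports quiesced.")
    print(all_ports_quiesced(log))
```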
PAGE 52
2–20 Replacement Procedures 12. Disable the ECB by pressing the battery disable switch until the status light stops blinking—about five seconds. CAUTION: The ECB must be disabled—the status light is not lit and is not blinking—before disconnecting the ECB cable from the cache module. Failure to disable the ECB could result in cache module damage. 13.
PAGE 53
2–21 16. Grasp the DIMM and gently remove it from the DIMM slot. Repeat for all DIMMs. Installing a Controller and its Cache Module in a Dual-Redundant Controller Configuration Use the following steps to install a controller and its cache module. CAUTION: ESD can easily damage a controller, cache module, or DIMM. Wear a snug-fitting, grounded ESD wrist strap. 1.
PAGE 54
2–22 Replacement Procedures 7. Enter option 2, Install a controller or cache module, from the FRUTIL Main menu. FRUTIL displays the Install Options menu: Install Options: 1. Other controller and cache module 2. Other controller module 3. Other cache module 4. Exit Enter choice: 1, 2, 3, or 4 -> 8. Enter option 1, Other controller and cache module, from the Install Options menu. FRUTIL displays the following: Insert both the slot A [or B] controller and cache module? Y/N 9.
PAGE 55
2–23 CAUTION: ESD can easily damage a controller or a cache module. Wear a snug-fitting, grounded ESD wrist strap. 10. Disable the ECB to which you’re connecting the new cache module by pressing the battery disable switch until the status light stops blinking—about five seconds. CAUTION: The ECB must be disabled—the status light is not lit and is not blinking—before connecting the ECB cable to the cache module. Failure to disable the ECB could result in ECB damage.
PAGE 56
2–24 Replacement Procedures 14. Connect the trilink connectors with host bus cables (or terminators) to the new controller. NOTE: One or two trilink connectors with host bus cables (or terminators) may be attached, depending on the configuration. 15. Press return to continue. FRUTIL will exit. If the other controller did not restart, follow these steps: a. Press and hold the other controller’s reset button. b. Insert the other controller’s program card. c. Release the reset button. 16.
PAGE 57
2–25 Replacing a Controller in a Dual-Redundant Controller Configuration Use the following steps in “Removing a Controller in a Dual-Redundant Controller Configuration” and “Installing a Controller in a Dual-Redundant Controller Configuration” to replace a controller. Removing a Controller in a Dual-Redundant Controller Configuration Use the following steps to remove a controller: 1. Connect a PC or terminal to the operational controller’s maintenance port.
PAGE 58
2–26 Replacement Procedures 6. Enter option 1, Replace or remove a controller or cache module, from the FRUTIL Main menu. FRUTIL displays the Replace or Remove Options menu: Replace or remove Options: 1. Other controller and cache module 2. Other controller module 3. Other cache module 4. Exit Enter choice: 1, 2, 3, or 4 -> 7. Enter option 2, Other controller module, from the Replace or Remove Options menu.
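The two menu selections in steps 6 and 7 can be written down as a small lookup, which is handy when preparing a service checklist. This is only a sketch: the menu text is copied from the FRUTIL displays above, while the dictionary layout and the `describe_path` helper are hypothetical and do not drive FRUTIL itself.

```python
# Menu text as shown in the FRUTIL displays above.
MAIN_MENU = {
    1: "Replace or remove a controller or cache module",
    2: "Install a controller or cache module",
    3: "Replace a PVA module",
    4: "Replace an I/O module",
    5: "Exit",
}
REPLACE_OR_REMOVE_MENU = {
    1: "Other controller and cache module",
    2: "Other controller module",
    3: "Other cache module",
    4: "Exit",
}


def describe_path(main_choice: int, sub_choice: int) -> list:
    """Return the labels for a Main menu -> Replace or Remove Options path."""
    if main_choice != 1:
        raise ValueError("only Main menu option 1 leads to the Replace or Remove Options menu")
    return [MAIN_MENU[main_choice], REPLACE_OR_REMOVE_MENU[sub_choice]]


if __name__ == "__main__":
    # Removing only the other controller: option 1, then option 2.
    for label in describe_path(1, 2):
        print(label)
```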
PAGE 59
2–27 CAUTION: The device ports must quiesce before removing the controller. Failure to allow the ports to quiesce may result in data loss. Quiescing may take several minutes. ESD can easily damage a controller. Wear a snug-fitting, grounded ESD wrist strap. NOTE: A countdown timer allows a total of two minutes to remove the controller. If you exceed two minutes, “this controller” will exit FRUTIL and resume operations. If this happens, return to step 4. 9.
PAGE 60
2–28 Replacement Procedures Installing a Controller in a Dual-Redundant Controller Configuration Use the following steps to install a controller: 1. Connect a PC or terminal to the operational controller’s maintenance port. The controller to which you’re connected is “this controller”; the controller that you’re installing is the “other controller.” 2. Start FRUTIL with the following command: RUN FRUTIL FRUTIL displays the following: Do you intend to replace this controller’s cache battery? Y/N 3.
PAGE 61
2–29 6. Enter Y(es) and press return. FRUTIL displays the following: Quiescing all device ports. Please wait... Device Port 1 quiesced. Device Port 2 quiesced. Device Port 3 quiesced. Device Port 4 quiesced. Device Port 5 quiesced. Device Port 6 quiesced. All device ports quiesced. . . . Insert the controller module, WITH its program card, in slot A [ or B] within x minutes, xx seconds. NOTE: A countdown timer allows a total of two minutes to install the controller.
PAGE 62
2–30 Replacement Procedures 8. Connect the trilink connectors with host bus cables (or terminators) to the new controller. NOTE: One or two trilink connectors with host bus cables (or terminators) may be attached, depending on the configuration. 9. Press return to continue. FRUTIL will exit. If the other controller did not restart, follow these steps: a. Press and hold the other controller’s reset button. b. Insert the other controller’s program card. c. Release the reset button. 10.
PAGE 63
2–31 Replacing a Cache Module in a Dual-Redundant Controller Configuration Use the following steps in “Removing a Cache Module in a Dual-Redundant Controller Configuration” and “Installing a Cache Module in a Dual-Redundant Controller Configuration” to replace a cache module. NOTE: The new cache module must contain the same memory configuration as the cache module it’s replacing. Removing a Cache Module in a Dual-Redundant Controller Configuration Use the following steps to remove a cache module: 1.
PAGE 64
2–32 Replacement Procedures 5. Enter option 1, Replace or remove a controller or cache module, from the FRUTIL Main menu. FRUTIL displays the Replace or Remove Options menu: Replace or remove Options: 1. Other controller and cache module 2. Other controller module 3. Other cache module 4. Exit Enter choice: 1, 2, 3, or 4 -> 6. Enter option 3, Other cache module, from the Replace or Remove Options menu.
PAGE 65
2–33 CAUTION: The device ports must quiesce before removing the cache module. Failure to allow the ports to quiesce may result in data loss. Quiescing may take several minutes. ESD can easily damage the cache module or a DIMM. Wear a snug-fitting, grounded ESD wrist strap. NOTE: A countdown timer allows a total of two minutes to remove the cache module. If you exceed two minutes, “this controller” will exit FRUTIL and resume operations. If this happens, return to step 3. 8.
PAGE 66
2–34 Replacement Procedures 11. Enter N(o) if you don’t have a replacement cache module; FRUTIL will exit. Disconnect the PC or terminal from the controller’s maintenance port. Enter Y(es) if you have a replacement cache module and want to install it now. FRUTIL displays the following: Insert the slot A [or B] cache module? Y/N NOTE: Remove the DIMMs from the cache module. They will be installed in the replacement cache module. 12.
PAGE 67
2–35 3. Enter N(o). FRUTIL displays the FRUTIL Main menu: FRUTIL Main Menu: 1. Replace or remove a controller or cache module 2. Install a controller or cache module 3. Replace a PVA module 4. Replace an I/O module 5. Exit Enter choice: 1, 2, 3, 4, or 5 -> 4. Enter option 2, Install a controller or cache module, from the FRUTIL Main menu. FRUTIL displays the Install Options menu: Install Options: 1. Other controller and cache module 2. Other controller module 3. Other cache module 4.
PAGE 68
2–36 Replacement Procedures 9. Enter Y(es) and press return. FRUTIL displays the following: Quiescing all device ports. Please wait... Device Port 1 quiesced. Device Port 2 quiesced. Device Port 3 quiesced. Device Port 4 quiesced. Device Port 5 quiesced. Device Port 6 quiesced. All device ports quiesced. . . . Perform the following steps: 1. Turn off the battery for the new cache module by pressing the battery’s shut off button for five seconds 2. Connect the battery to the new cache module. 3.
PAGE 69
2–37 NOTE: In mirrored mode, FRUTIL will initialize the mirrored portion of the new cache module, check for old data on the cache module, and then restart all device ports. After the device ports have been restarted, FRUTIL will test the cache module and the ECB. After the test completes, the device ports will quiesce and a mirror copy of the cache module data will be created on the newly installed cache module. 13. FRUTIL will restart the other controller. FRUTIL displays the following: Please wait . . .
PAGE 70
2–38 Replacement Procedures Replacing an External Cache Battery Storage Building Block The External Cache Battery (ECB) Storage Building Block (SBB) can be replaced with cabinet power on or off. An ECB SBB is shown in Figure 2–5. The single-battery configuration contains one battery and the dual-battery configuration contains two batteries.
[Figure 2–5. (CXO5713A; callouts 1–4; panel labels: STATUS, SHUT OFF, CACHE POWER)]
PAGE 71
2–39 Replacing an External Cache Battery Storage Building Block With Cabinet Powered On Use the following steps to replace the ECB SBB with the cabinet powered on: NOTE: The procedure for a dual-redundant controller configuration assumes that a single ECB SBB with a dual battery is installed and an empty slot is available for the replacement ECB SBB. If an empty slot is not available, place the new ECB SBB on the top of the enclosure.
PAGE 72
2–40 Replacement Procedures NOTE: If an empty slot is not available, place the new ECB SBB on the top of the enclosure. 5. Connect the new battery to the unused end of the Y cable attached to cache A [or B] 6. Disconnect the old battery. Do not wait for the new battery’s status light to turn solid green. 7. Press return. FRUTIL displays the following: Updating this battery’s expiration date and deep discharge history. Field Replacement Utility terminated. 8.
PAGE 73
2–41 3. Shut down the controllers. In single-controller configurations, shut down “this controller.” In dual-redundant controller configurations, shut down the “other controller” first, then shut down “this controller” with the following commands: SHUTDOWN OTHER_CONTROLLER SHUTDOWN THIS_CONTROLLER When the controllers shut down, their reset buttons and their first three LEDs are lit continuously.
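The shutdown order in step 3 depends on the configuration: the “other controller” is shut down first in a dual-redundant pair, and only “this controller” in a single-controller subsystem. A minimal sketch of that rule follows; the function name `shutdown_commands` and the boolean flag are assumptions for illustration, while the two CLI commands are taken from the step above.

```python
def shutdown_commands(dual_redundant: bool) -> list:
    """Return the CLI shutdown commands in the order the procedure gives them."""
    commands = []
    if dual_redundant:
        # In a dual-redundant pair, the other controller is shut down first.
        commands.append("SHUTDOWN OTHER_CONTROLLER")
    commands.append("SHUTDOWN THIS_CONTROLLER")
    return commands


if __name__ == "__main__":
    for line in shutdown_commands(dual_redundant=True):
        print(line)
```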
PAGE 74
2–42 Replacement Procedures 10. Type Y(es). FRUTIL displays the following: If the batteries were replaced while the cabinet was powered down, press return. Otherwise follow this procedure: WARNING: Ensure that at least one battery is connected to the Y cable at all times during this procedure. 1. Connect the new battery to the unused end of the ’Y’ cable attached to cache A [or B]. 2. Disconnect the old battery. Do not wait for the new battery’s status light to turn solid green. 3. Press return. 11.
PAGE 75
2–43 Replacing a PVA Module Use the following steps to replace a PVA module in the master enclosure (ID 0), the first expansion (ID 2), or second expansion enclosure (ID 3). The master enclosure contains the controllers and the cache modules. NOTE: This procedure is not applicable for the M1 shelf. The HSZ80 controller can support up to three enclosures: the master enclosure, the first expansion enclosure, and the second expansion enclosure.
PAGE 76
2–44 Replacement Procedures 5. Enter option 3, Replace a PVA module from the FRUTIL Main menu. FRUTIL displays the PVA Replacement menu: FRUTIL PVA Replacement Menu: 1. Master Enclosure (ID 0) 2. First Expansion Enclosure (ID 2) 3. Second Expansion Enclosure (ID 3) 4. Exit Enter Choice: 1, 2, 3, or 4 -> NOTE: The HSZ80 controller supports up to three enclosures. The FRUTIL PVA Replacement Menu has options for three enclosures regardless of how many enclosures are connected. 6.
PAGE 77
2–45 11. Press return to resume device port activity and restart the other controller. When all port activity has restarted, FRUTIL displays the following: PVA replacement complete. Please wait . . . If the other controller did not restart, press its reset button. Field Replacement Utility terminated. 12. If the other controller did not restart, press its reset button. 13.
PAGE 78
2–46 Replacement Procedures Replacing an I/O Module Figure 2–6 shows a rear view of the BA370 enclosure and the relative location of the six I/O modules (also referred to as ports). Figure 2–7 shows the six I/O modules and the location of the connectors and securing screws. Use the following steps to replace an I/O module: NOTE: This procedure is not applicable for the M1 shelf. An I/O module can be replaced in either a single-controller or a dual-redundant controller configuration using this procedure.
PAGE 79
2–47 Figure 2–7. I/O Module Locations NOTE: The controller can function with one failed I/O module. 1. Connect a PC or terminal to the controller’s maintenance port. 2. In a dual-redundant controller configuration, disable failover with the following command: SET NOFAILOVER 3. Start FRUTIL with the following command: RUN FRUTIL FRUTIL displays the following: Do you intend to replace this controller’s cache battery? Y/N 4. Enter N(o).
PAGE 80
2–48 Replacement Procedures 5. Enter option 4, Replace an I/O module, from the FRUTIL Main menu. In the following example, cabinet 0, port 5 is missing or bad.
PAGE 81
2–49 13. If the other controller did not restart, press its reset button. 14. Enable failover and re-establish the dual-redundant configuration with the following command: SET FAILOVER COPY=THIS_CONTROLLER This command copies the subsystem’s configuration from “this controller” to the “other controller.” 15. Disconnect the PC or terminal from the controller’s maintenance port.
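The CLI commands that bracket an I/O module replacement can be summarized in a short sketch. The function name and return shape are hypothetical; the commands themselves are the ones quoted in this procedure, and the FRUTIL menu interaction between them is not modeled.

```python
def io_module_replacement_commands(dual_redundant: bool) -> tuple[list[str], list[str]]:
    """Return (commands before the swap, commands after the swap)."""
    before, after = [], []
    if dual_redundant:
        # Failover is disabled first and restored at the end of the procedure.
        before.append("SET NOFAILOVER")
        after.append("SET FAILOVER COPY=THIS_CONTROLLER")
    before.append("RUN FRUTIL")
    return before, after


before, after = io_module_replacement_commands(dual_redundant=True)
print("Before:", before)
print("After:", after)
```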
PAGE 82
2–50 Replacement Procedures Replacing an EMU Use the following steps in “Removing an EMU” and “Installing an EMU” to replace the EMU. Removing an EMU 1. From a host console, stop all host activity and dismount the logical units in the subsystem. 2. Connect a PC or terminal to the maintenance port of one of the controllers in your subsystem. 3. Shut down the controllers. In single-controller configurations, you only need to shut down “this controller.
PAGE 83
2–51 Installing an EMU CAUTION: ESD can easily damage an EMU. Wear a snug-fitting, grounded ESD wrist strap. Make sure you align the EMU in the appropriate guide rails. If you do not align the EMU correctly, damage to the backplane can occur. 1. Align the EMU in the top, left-hand slot and insert it. 2. Insert the EMU into its slot until the extractor latches engage the enclosure, then engage its retaining levers to secure the EMU. 3. If there are no expansion enclosures, go to step 6.
PAGE 84
2–52 Replacement Procedures Replacing DIMMs Use the following steps in “Removing DIMMs” and “Installing DIMMs” to replace DIMMs in a cache module. The cache module may be configured as shown in Figure 2–8 and Table 2–7. Figure 2–8.
PAGE 85
2–53 CAUTION: ESD can easily damage a cache module or a DIMM. Wear a snug-fitting, grounded ESD wrist strap. Removing DIMMs Use the following steps to remove a DIMM from a cache module: 1. Remove the cache module using the steps in either “Removing the Cache Module in a Single-Controller Configuration,” page 2–13, or “Removing a Cache Module in a Dual-Redundant Controller Configuration,” page 2–31. 2. Press down on the DIMM retaining levers at either end of the DIMM you want to remove. 3.
PAGE 86
2–54 Replacement Procedures Figure 2–9.
PAGE 87
2–55 Replacing a PCMCIA Card Use the following steps to replace a PCMCIA (program) card (see Figure 2–10):
Figure 2–10. PCMCIA Card
Table 2–9 PCMCIA Card
Location Description
➀ Controller
➁ Program-card slot
➂ Program-card ejection button
➃ Program card
➄ ESD/PCMCIA card cover
CAUTION: The new PCMCIA card must have the same software version as the PCMCIA card being replaced. See Chapter 3, “Upgrading the Subsystem,” for more information.
PAGE 88
2–56 Replacement Procedures 1. From a host console, stop all host activity and dismount the logical units in the subsystem. 2. Connect a maintenance PC or terminal to the maintenance port of one of the controllers in your subsystem. 3. Shut down the controllers. In single-controller configurations, shut down “this controller.
PAGE 89
2–57 Replacing a Failed Storageset Member If a disk drive fails in a RAIDset or mirrorset, the controller automatically places it into the failedset. If the spareset contains a replacement drive that satisfies the storageset’s replacement policy, the controller automatically replaces the failed member with the replacement drive. If the spareset is empty or doesn’t contain a satisfactory drive, the controller simply “reduces” the storageset so that it can operate without one of its members.
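The controller's handling of a failed member can be pictured with the following sketch. It is an illustration of the behavior described above, not the controller's actual logic: the list-based sets, the `satisfies_policy` callback, and the return strings are all assumptions made for the example.

```python
from typing import Callable


def handle_failed_member(
    storageset: list[str],
    failed: str,
    spareset: list[str],
    failedset: list[str],
    satisfies_policy: Callable[[str], bool],
) -> str:
    """Move a failed drive to the failedset, then replace it or reduce the storageset."""
    storageset.remove(failed)
    failedset.append(failed)
    for spare in spareset:
        if satisfies_policy(spare):
            # A satisfactory spare takes the failed member's place.
            spareset.remove(spare)
            storageset.append(spare)
            return "replaced"
    # Spareset empty or no satisfactory drive: the storageset runs without one member.
    return "reduced"


raidset = ["DISK10000", "DISK20000", "DISK30000"]  # hypothetical drive names
spares = ["DISK40000"]
failed_drives: list[str] = []
print(handle_failed_member(raidset, "DISK20000", spares, failed_drives, lambda d: True))
```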
PAGE 90
2–58 Replacement Procedures Installing the New Member Use the following steps to install a new member: 1. Insert a new disk drive that satisfies the replacement policy of the reduced storageset into the PTL location of the failed disk drive. NOTE: The controller automatically initializes the new disk drive and places it into the spareset. As soon as it becomes a member of the spareset, the controller automatically uses the new disk drive to restore the reduced RAIDset or mirrorset.
PAGE 91
3–1 Chapter 3 Upgrading the Subsystem This chapter provides instructions for upgrading the controller software, installing software patches, upgrading firmware on a device, upgrading from a single-controller configuration to a dual-redundant controller configuration, and upgrading cache memory. Required Tools You will need the following tools to service the controller, cache module, and the external cache battery (ECB): ■ An antistatic wrist strap.
PAGE 92
3–2 Upgrading the Subsystem Upgrading Controller Software You can upgrade the controller’s software two ways: ■ Install a new program card (see Figure 3–1) that contains the new software. ■ Download a new software image, and use the menu-driven Code Load/Code Patch (CLCP) utility to write it onto the existing program card. You may also use this utility to install, delete, and list patches to the controller software. Figure 3–1.
PAGE 93
3–3 Installing a New Program Card Use the following steps to install a program card that contains the new software. If you’re only upgrading the software in a single-controller configuration, disregard references to the “other controller” and read the plural controllers as the singular controller. To upgrade the software by installing a new program card: 1. From the host console, dismount the storage units in the subsystem. 2. Connect a PC or terminal to the maintenance port of one of the controllers. 3.
PAGE 94
3–4 Upgrading the Subsystem 8. In a dual-redundant controller configuration, repeat steps 4 through 7 for the “other controller.” 9. Mount the storage units on the host. Downloading New Software Use the CLCP to download new software to the program card while it’s installed in the controller. Use the following steps to upgrade the software with CLCP: 1. Obtain the new software image file from a customer service representative.
PAGE 95
3–5 Figure 3–2. Location of Write-Protection Switch (switch positions: Write protected, Write) 6. Start CLCP with the following command: RUN CLCP CLCP displays the following: Select an option from the following list: Code Load & Patch local program Main Menu 0: Exit 1: Enter Code LOAD local program 2: Enter Code PATCH local program 3: Enter EMU Code LOAD Utility Enter option number (0..3) [0] ?
PAGE 96
3–6 Upgrading the Subsystem 7. Enter option 1, Enter Code LOAD local program, from the CLCP Main menu to start the Code LOAD local program. CLCP displays the following: You have selected the Code Load Utility. This utility is used to load a new software image into the program card currently inserted in the controller. Type ^Y or ^C (then RETURN) at any time to abort code load.
PAGE 97
3–7 10. Enter option 2, Use the Maintenance Terminal Port, from the menu. CLCP displays the following: Perform the following steps before continuing: * get new image file on serial line host computer * configure KERMIT with the following parameters: terminal speed 19200 baud, eight bit, no parity, 1 stop bit It will take approximately 35 to 45 minutes to perform the code load operation. WARNING: proceeding with Controller Code Load will overwrite the current Controller code image with a new image.
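As a rough cross-check of the KERMIT settings above, the raw capacity of the serial line can be estimated. This is back-of-the-envelope arithmetic, not a figure from this guide: it assumes one start bit per character and ignores KERMIT protocol overhead, so the result is only an upper bound on what the line can carry in the stated time.

```python
def bytes_per_second(baud: int, data_bits: int, parity_bits: int, stop_bits: int) -> float:
    """Raw byte rate of an asynchronous serial line (one start bit assumed)."""
    bits_per_byte = 1 + data_bits + parity_bits + stop_bits
    return baud / bits_per_byte


# 19200 baud, eight bit, no parity, 1 stop bit
rate = bytes_per_second(19200, data_bits=8, parity_bits=0, stop_bits=1)
for minutes in (35, 45):
    print(f"{minutes} minutes: at most {int(rate * minutes * 60):,} bytes")
```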
PAGE 98
3–8 Upgrading the Subsystem Using CLCP to Install, Delete, and List Software Patches Use CLCP to manage software patches. These small programming changes are placed into the controller’s non-volatile memory and become active as soon as you restart the controller. There is space for about ten patches, depending upon the size of the patches you’re installing. Keep the following points in mind while installing or deleting patches: ■ Patches are associated with specific software versions.
PAGE 99
3–9 CLCP displays the following: Select an option from the following list: Code Load & Patch local program Main Menu 0: Exit 1: Enter Code LOAD local program 2: Enter Code PATCH local program 3: Enter EMU Code LOAD utility Enter option number (0..3) [0] ? 5. Enter option 2, Enter Code PATCH local program. CLCP displays the following: You have selected the Code Patch local program. This program is used to manage software code patches.
PAGE 100
3–10 Upgrading the Subsystem 7. Enter Y(es) and follow the on-screen prompts. 8. After the patch is installed, press the controller’s reset button to restart the controller. Deleting a Software Patch Use the following steps to delete a software patch: 1. From a host console, quiesce all port activity. 2. Connect a PC or terminal to the controller’s maintenance port. 3.
PAGE 101
3–11 5. Enter option 2, Delete Patches, to delete patches. CLCP displays the following: This is the Delete Patches option. The program prompts you for the software version and patch number you wish to delete. If you select a patch for deletion that is required for another patch, all dependent patches are also selected for deletion. The program lists your deletion selections and asks if you wish to continue. Type ^Y or ^C (then RETURN) at any time to abort Code Patch.
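The dependency rule in the Delete Patches text (deleting a patch also selects every patch that depends on it) amounts to a transitive closure. The following sketch illustrates the idea; the `requires` mapping and the patch numbers are hypothetical, and nothing here reflects how CLCP stores patch data internally.

```python
def patches_to_delete(selected: str, requires: dict[str, set[str]]) -> set[str]:
    """Return the selected patch plus every patch that directly or indirectly requires it."""
    doomed = {selected}
    changed = True
    while changed:
        changed = False
        for patch, needed in requires.items():
            # A patch is pulled in as soon as anything it requires is marked for deletion.
            if patch not in doomed and needed & doomed:
                doomed.add(patch)
                changed = True
    return doomed


# Hypothetical patch numbers: patch 2 requires patch 1, patch 3 requires patch 2.
requires = {"1": set(), "2": {"1"}, "3": {"2"}, "4": set()}
print(sorted(patches_to_delete("1", requires)))
```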
PAGE 102
3–12 Upgrading the Subsystem Listing Software Patches Use the following steps to list software patches: 1. Connect a PC or terminal to the controller’s maintenance port. 2. Start CLCP with the following command: RUN CLCP CLCP displays the following: Select an option from the following list: Code Load & Patch local program Main Menu 0: Exit 1: Enter Code LOAD local program 2: Enter Code PATCH local program 3: Enter EMU Code LOAD utility Enter option number 3. (0..
PAGE 103
3–13 4. Enter option 3, List Patches, to list patches. CLCP displays the following: The following patches are currently stored in the patch area: Software Version - Patch number(s) xxxx xxxx Code Patch Main Menu 0: Exit 1: Enter a Patch 2: Delete Patches 3: List Patches Enter option number (0..3) [0] ? 5. Enter option 0, Exit.
PAGE 104
3–14 Upgrading the Subsystem Upgrading Firmware on a Device Use HSUTIL to upgrade a device with firmware located in contiguous blocks at a specific LBN on a source disk drive configured as a unit on the same controller. Upgrading firmware on a disk is a two-step process as shown in Figure 3–3. First, copy the new firmware from your host to a disk drive configured as a unit in your subsystem, then use HSUTIL to load the firmware onto the devices in the subsystem.
PAGE 105
3–15 ■ HSUTIL cannot install firmware on devices that have been configured as single disk drive units or as members of a storageset, spareset, or failedset. If you want to install firmware on a device that has previously been configured as a single disk drive, delete the unit number and storageset name associated with it. ■ During the installation, the source disk drive is not available for other subsystem operations.
PAGE 106
3–16 Upgrading the Subsystem HSUTIL displays the following: HSUTIL Main Menu: 0. Exit 1. Disk Format 2. Disk Device Code Load 3. Tape Device Code Load 4. Disaster Tolerance Backend Controller Code Load Enter function number: (0:4) [0]? 5. Enter option 2, Disk Device Code Load, from the HSUTIL menu. 6. Choose the single-disk unit as the source disk for the download. 7. Enter the starting LBN of the firmware image—usually LBN 0. 8. Enter the product ID of the device you want to upgrade.
PAGE 107
3–17 Upgrading to a Dual-Redundant Controller Configuration Use the following steps to upgrade a single-controller configuration subsystem to a dual-redundant configuration subsystem. To replace failed components, see Chapter 2, “Replacement Procedures,” for more information.
PAGE 108
3–18 Upgrading the Subsystem 3. Enter N(o). FRUTIL displays the FRUTIL Main menu: FRUTIL Main Menu: 1. Replace or remove a controller or cache module 2. Install a controller or cache module 3. Replace a PVA module 4. Replace an I/O module 5. Exit Enter choice: 1, 2, 3, 4, or 5 -> 4. Enter option 2, Install a controller or cache module, from the FRUTIL Main menu. FRUTIL displays the Install Options menu: Install Options: 1. Other controller and cache module 2. Other controller module 3.
PAGE 109
3–19 6. Enter Y(es) and press return. FRUTIL displays the following: Quiescing all device ports. Please wait... Device Port 1 quiesced. Device Port 2 quiesced. Device Port 3 quiesced. Device Port 4 quiesced. Device Port 5 quiesced. Device Port 6 quiesced. All device ports quiesced. . . . Perform the following steps: 1. Turn off the battery for the new cache module by pressing the battery’s shut off button for five seconds. 2. Connect the battery to the new cache module. 3.
PAGE 110
3–20 Upgrading the Subsystem CAUTION: The ECB must be disabled—the status light is not lit and is not blinking—before connecting the ECB cable to the cache module. Failure to disable the ECB could result in the ECB being damaged. Make sure you align the cache module and controller in the appropriate guide rails. If you do not align the modules correctly, damage to the backplane can occur. 9. Connect the ECB cable to the new cache module. 10.
PAGE 111
3–21 13. Press return to continue. If the other controller did not restart, follow these steps: a. Press and hold the other controller’s reset button. b. Insert the other controller’s program card. c. Release the reset button. 14. See HSZ80 ACS Version 8.3 Configuration and CLI Reference Guide to configure the controller. NOTE: If the controller you’ve installed was previously used in another subsystem, it will need to be purged of the controller’s old configuration (see HSZ80 ACS Version 8.
PAGE 112
3–22 Upgrading the Subsystem Upgrading Cache Memory The cache module may be configured as shown in Figure 3–4 and Table 3–2. Figure 3–4.
PAGE 113
3–23 In order to upgrade cache memory, the controller must be shut down. Use the following steps to upgrade or add DIMMs: 1. From the host console, dismount the logical units in the subsystem. If you are using a Windows NT platform, shut down the server. 2. If the controller is operating, connect a PC or terminal to the controller’s maintenance port. 3. Shut down the controllers. In single controller configurations, shut down “this controller.
PAGE 114
3–24 Upgrading the Subsystem Figure 3–5.
PAGE 115
3–25 8. If you are replacing DIMMs, press down on the DIMM retaining levers at either end of the DIMM you want to remove. 9. Grasp the DIMM and gently remove it from the DIMM slot. 10. Insert the replacement DIMM straight into the socket and ensure that the notches in the DIMM align with the tabs in the socket (see Figure 3–5). 11. In a dual-redundant controller configuration, repeat steps 4 through 10, as appropriate, for the other cache module.
PAGE 116
PAGE 117
4–1 Chapter 4 Troubleshooting This chapter provides guidelines for troubleshooting the controller, cache module, and external cache battery (ECB). It also describes the utilities and exercisers that you can use to aid in troubleshooting these components. See Chapter 5, “Event Reporting: Templates and Codes,” for a list of the event codes.
PAGE 118
4–2 Troubleshooting Running the Controller’s Diagnostic Test During start up, the controller automatically tests its device ports, host port, cache module, and value-added functions. If you’re experiencing intermittent problems with one of these components, you can run the controller’s diagnostic test in a continuous loop, rather than restarting the controller over and over again. Use the following steps to run the controller’s diagnostic test: 1. Connect a terminal to the controller’s maintenance port.
PAGE 119
4–3 This four-minute polling continues for up to 10 hours—the maximum time it should take to recharge the batteries. If the batteries have not been charged sufficiently after 10 hours, the controller declares them to be failed. Battery Hysteresis When charging a battery, write-back caching will be allowed as long as a previous down time has not drained more than 50 percent of a battery’s capacity.
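The polling limits above work out to a fixed number of polls: one poll every four minutes for up to 10 hours. The sketch below shows the arithmetic and a simple state check. The function, its input list, and the state names are hypothetical, and what counts as "charged sufficiently" is left to the caller because this excerpt does not define it.

```python
POLL_INTERVAL_MINUTES = 4
MAX_RECHARGE_HOURS = 10
MAX_POLLS = MAX_RECHARGE_HOURS * 60 // POLL_INTERVAL_MINUTES  # 600 / 4 = 150 polls


def battery_state(poll_results: list[bool]) -> str:
    """poll_results[i] is True if poll i found the battery sufficiently charged."""
    if any(poll_results[:MAX_POLLS]):
        return "charged"
    if len(poll_results) >= MAX_POLLS:
        # Ten hours of polling without a sufficient charge: declared failed.
        return "failed"
    return "charging"


print(MAX_POLLS)
print(battery_state([False] * 20 + [True]))
```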
PAGE 120
4–4 Troubleshooting Troubleshooting Checklist The following checklist provides a general procedure for diagnosing the controller and its supporting modules. If you follow this checklist, you’ll be able to identify many of the problems that occur in a typical installation. When you’ve identified the problem, use Table 4–1 to confirm your diagnosis and fix the problem. If your initial diagnosis points to several possible causes, use the tools described later in this chapter to further refine your diagnosis.
PAGE 121
4–5 If the controller has failed to the extent that it cannot support a local terminal for FMU, check the host’s error log for the instance or last-failure codes. See Chapter 5, “Event Reporting: Templates and Codes,” to interpret the event codes. 7. Check the status of the devices with the following command: SHOW DEVICES FULL Look for errors such as “misconfigured device” or “No device at this PTL.
PAGE 122
4–6 Troubleshooting Troubleshooting Table Use the troubleshooting checklist that begins on page 4–4 to find a symptom, then use this table to verify and fix the problem. Table 4–1 Troubleshooting Table Symptom Possible Cause Investigation Remedy Reset button not lit. No power to subsystem. Check power to subsystem and power supplies on controller’s shelf. Replace cord or AC input power module. Reset button lit steadily; other LEDs also lit. Ensure that all cooling fans are installed.
PAGE 123
4–7 Table 4–1 Troubleshooting Table (Continued) Symptom Possible Cause Investigation Remedy Reset button blinking; other LEDs also lit. Device in error or FAIL set on corresponding device port with other LEDs lit. SHOW device FULL Follow repair action. Cannot set failover to create dual-redundant configuration. Incorrect command syntax. See HSZ80 Array Controller ACS Version 8.3 Configuration and CLI Reference Guide for the SET FAILOVER command. Use the correct command syntax.
PAGE 124
4–8 Troubleshooting Table 4–1 Troubleshooting Table (Continued) Symptom Possible Cause Investigation Remedy Controller previously set for failover. Ensure that neither controller is configured for failover. Use the SET NOFAILOVER command on both controllers, then reset “this controller” for failover. Failed controller. If the foregoing checks fail to produce a remedy, check for OCP LED codes. Follow repair action. Node ID is all zeros. SHOW THIS_CONTROLLER to see if node ID is all zeros.
PAGE 125
4–9 Table 4–1 Troubleshooting Table (Continued) Symptom Possible Cause Investigation Remedy Nonmirrored cache; controller reports failed DIMM in cache module A or B. Improperly installed DIMM. Remove cache module and ensure that DIMM is fully seated in its slot. Reseat DIMM. Failed DIMM. If the foregoing check fails to produce a remedy, check for OCP LED codes. Replace DIMM. Mirrored cache; “this controller” reports DIMM 1 or 2 failed in cache module A or B.
PAGE 126
4–10 Troubleshooting Table 4–1 Troubleshooting Table (Continued) Symptom Mirrored cache; controller reports cache or mirrored cache has failed. Possible Cause Investigation Primary data and its mirrored copy data are not identical. SHOW THIS_CONTROLLER indicates that the cache or mirrored cache has failed. Spontaneous FMU message displays: “Primary cache declared failed - data inconsistent with mirror,” or “Mirrored cache declared failed - data inconsistent with primary.
PAGE 127
4–11 Table 4–1 Troubleshooting Table (Continued) Symptom Invalid cache. Possible Cause Mirrored-cache mode discrepancy. This may occur after you’ve installed a new controller. Its existing cache module is set for mirrored caching, but the new controller is set for unmirrored caching. (It may also occur if the new controller is set for mirrored caching but its existing cache module is not.) Investigation SHOW THIS_CONTROLLER indicates “invalid cache.
PAGE 128
4–12 Troubleshooting Table 4–1 Troubleshooting Table (Continued) Symptom Possible Cause Cache module may erroneously contain unflushed write-back data. This may occur after you’ve installed a new controller. Its existing cache module may indicate that it contains unflushed write-back data, but the new controller expects to find no data in the existing cache module. (This error may also occur if you install a new cache module for a controller that expects write-back data in the cache.
PAGE 129
4–13 Table 4–1 Troubleshooting Table (Continued) Symptom Cannot add device. Possible Cause Investigation Remedy Illegal device. See product-specific release notes that accompanied the software release for the most recent list of supported devices. Replace device. Device not properly installed in shelf. Check that SBB is fully seated. Firmly press SBB into slot. Failed device. Check for presence of device LEDs. Follow repair action in the documentation provided with the enclosure or device.
PAGE 130
4–14 Troubleshooting Table 4–1 Troubleshooting Table (Continued) Symptom Cannot configure storagesets. Can’t assign unit number to storageset. Possible Cause Investigation Remedy Incorrect command syntax. See HSZ80 Array Controller ACS Version 8.3 Configuration and CLI Reference Guide for the ADD storageset command. Reconfigure storageset with correct command syntax. Exceeded maximum number of storagesets. Use the SHOW command to count the number of storagesets configured on the controller.
PAGE 131
4–15 Table 4–1 Troubleshooting Table (Continued) Symptom Possible Cause Investigation Remedy Unit is available but not online. This is normal. Units are “available” until the host accesses them, at which point their status is changed to “online.” None. None. Host cannot see device. Broken cables or a missing, incorrect, or defective terminator. Check for broken cables or a missing, incorrect, or defective terminator. Replace broken cables or the missing, incorrect, or defective terminator.
PAGE 132
4–16 Troubleshooting Table 4–1 Troubleshooting Table (Continued) Symptom Possible Cause Investigation Remedy Host’s log file or maintenance terminal indicates that a forced error occurred when the controller was reconstructing a RAIDset or mirrorset. Unrecoverable read errors may have occurred when controller was reconstructing the storageset. Errors occur if another member fails while the controller is reconstructing the storageset.
PAGE 133
4–17 Fault-Tolerance for Write-Back Caching The cache module supports nonvolatile memory and dynamic cache policies to protect the availability of its unwritten (write-back) data. Nonvolatile Memory Except for disaster-tolerant supported mirrorsets, the controller can provide writeback caching for storage units as long as the controller’s cache memory is nonvolatile.
PAGE 134
4–18 Troubleshooting Table 4–2 Cache Policies and Cache Module Status Cache Module Status Cache A Good Multibit cache memory failure Cache B Good Good Cache Policy Unmirrored Cache Mirrored Cache Data loss: No. Data loss: No. Cache policy: Both controllers support write-back caching. Cache policy: Both controllers support write-back caching. Failover: No. Failover: No. Data loss: Forced error and loss of write-back data for which the multibit error occurred.
PAGE 135
4–19 Table 4–2 Cache Policies and Cache Module Status (Continued) Cache Module Status Cache A DIMM or cache memory controller chip failure Cache B Good Cache Policy Unmirrored Cache Mirrored Cache Data integrity: Write-back data that was not written to media when failure occurred was not recovered. Data integrity: Controller A recovers all of its write-back data from the mirrored copy on cache B. Cache policy: Controller A supports write-through caching only; controller B supports write-back caching.
PAGE 136
4–20 Troubleshooting Table 4–2 Cache Policies and Cache Module Status (Continued) Cache Module Status Cache A Cache Board Failure Cache B Good Cache Policy Unmirrored Cache Same as for DIMM failure. Mirrored Cache Data integrity: Controller A recovers all of its write-back data from the mirrored copy on cache B. Cache policy: Both controllers support write-through caching only. Controller B cannot execute mirrored writes because cache module A cannot mirror controller B’s unwritten data. Failover: No.
PAGE 137
4–21 Table 4–3 Resulting Cache Policies and ECB Status (Continued) Cache Module Status Cache A Less than 50% charged Cache B At least 50% charged Cache Policy Unmirrored Cache Mirrored Cache Data loss: No. Data loss: No. Cache policy: Controller A supports write-through caching only; controller B supports write-back caching. Cache policy: Both controllers continue to support write-back caching. Failover: No. Failover: In transparent failover, all units failover to controller B.
PAGE 138
4–22 Troubleshooting Table 4–3 Resulting Cache Policies and ECB Status (Continued) Cache Module Status Cache A Failed Cache B At least 50% charged Cache Policy Unmirrored Cache Mirrored Cache Data loss: No. Data loss: No. Cache policy: Controller A supports write-through caching only; controller B supports write-back caching. Cache policy: Both controllers continue to support write-back caching. Failover: No.
PAGE 139
4–23 Table 4–3 Resulting Cache Policies and ECB Status (Continued) Cache Module Status Cache A Failed Cache B Less than 50% charged Cache Policy Unmirrored Cache Mirrored Cache Data loss: No. Data loss: No. Cache policy: Both controllers support write-through caching only. Cache policy: Both controllers support write-through caching only. Failover: In transparent failover, all units failover to controller B and operate normally. Failover: No.
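For unmirrored cache, the rows of Table 4–3 shown here follow a simple per-controller rule: a controller keeps write-back caching only while its own ECB is at least 50% charged. The sketch below encodes that reading. It is inferred from the visible rows only, the status names are hypothetical labels, and the mirrored-cache column is deliberately not modeled.

```python
ECB_STATUSES = {"charged", "low", "failed"}  # "charged" = at least 50%, "low" = less than 50%


def unmirrored_cache_policy(ecb_a: str, ecb_b: str) -> dict[str, str]:
    """Return the caching mode of controllers A and B for unmirrored cache."""
    for status in (ecb_a, ecb_b):
        if status not in ECB_STATUSES:
            raise ValueError(f"unknown ECB status: {status}")

    def policy(status: str) -> str:
        return "write-back" if status == "charged" else "write-through"

    return {"A": policy(ecb_a), "B": policy(ecb_b)}


print(unmirrored_cache_policy("failed", "low"))
```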
PAGE 140
4–24 Troubleshooting Significant Event Reporting The controller’s fault-management software reports information about significant events that occur. These events are reported via the: ■ Maintenance terminal ■ Host error log ■ Operator control panel (OCP) Some events cause controller operation to terminate; others allow the controller to remain operable. Both cases are detailed in the following sections.
PAGE 141
4–25 NOTE: If the reset button is flashing and an LED is lit continuously, either the devices on that LED’s bus don’t match the controller’s configuration, or an error has occurred in one of the devices on that bus. Also, a single LED that is lit indicates a failure of the drive on that port. Flashing OCP Pattern Display Reporting Certain events can cause an alternating display of the OCP LEDs. These patterns are described in Table 4–4.
PAGE 142
4–26 Troubleshooting Table 4–4 Flashing OCP Patterns (Continued) Pattern OCP Code Error Repair Action ■❍●❍❍●● 13 Controller Module memory parity is not working. Replace controller. ■❍●❍●❍❍ 14 Controller Module memory controller timer has failed. Replace controller. ■❍●❍●❍● 15 The Controller Module memory controller interrupt handler has failed. Replace controller.
PAGE 143
4–27 Table 4–4 Flashing OCP Patterns (Continued) Pattern OCP Code Error Repair Action ■●●●●❍● 3D There was an unexpected maskable interrupt during initialization. Replace controller. ■●●●●●❍ 3E There was an unexpected NMI during initialization. Replace controller. ■●●●●●● 3F An invalid process ran during initialization. Replace controller. Solid OCP Pattern Display Reporting Some events cause a steady pattern to be displayed in the OCP LEDs, as described in Table 4–5.
PAGE 144
4–28 Troubleshooting Table 4–5 Solid OCP Patterns (Continued) Pattern OCP Code Error Repair Action ■●●●●❍❍ 3C NVPM write loop hang. Attempt to write data to NVPM failed. Replace controller. ■●●●❍●● 3B NVPM read loop hang. Attempt to read data from NVPM failed. Replace controller. ■●●●❍●❍ 3A An unexpected NMI occurred during Last Failure processing. Last Failure processing interrupted by a Non-Maskable Interrupt (NMI). Reset controller. ■●●●❍❍● 39 NVPM configuration inconsistent.
PAGE 145
4–29 Table 4–5 Solid OCP Patterns (Continued) Pattern OCP Code Error Repair Action ■●●❍❍●● 33 NVPM structure revision too low. NVPM structure revision number is less than the one that can be handled by the software version attempting to be executed. Verify that the program card contains the latest software version. If the error persists, replace controller. ■●●❍❍●❍ 32 Code load program card write failure. Attempt to update program card failed. Replace card.
PAGE 146
4–30 Troubleshooting Table 4–5 Solid OCP Patterns (Continued) Pattern ■●❍●●●❍ OCP Code Error 2E Multiple cabinets have the same SCSI ID. More than one cabinet has the same SCSI ID. ■●❍●●❍● 2D All master cabinet SCSI buses are not set to ID 0. Repair Action Reconfigure PVA ID to uniquely identify each cabinet in the subsystem. The cabinet with the controllers must be set to PVA ID 0; additional cabinets must use PVA IDs 2 and 3.
PAGE 147
4–31 Table 4–5 Solid OCP Patterns (Continued) Pattern ■●❍●❍●❍ OCP Code 2A Error All cabinet I/O modules are not of the same type. Cabinet I/O modules are a combination of single-ended and differential. ■●❍●❍❍● 29 EMU protocol version incompatible. The microcode in the EMU and the software in the controller are not compatible.
PAGE 148
4–32 Troubleshooting Table 4–5 Solid OCP Patterns (Continued) Pattern OCP Code Error Repair Action ■❍❍❍❍❍❍ 0 No program card detected or kill asserted by other controller. Controller unable to read program card. Ensure that program card is properly seated while resetting the controller. If the error persists, try the card with another controller; or replace the card. Otherwise, replace the controller that reported the error. ❏❍❍❍❍❍❍ 0 Catastrophic controller or power failure. Check power.
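The patterns in Tables 4–4 and 4–5 appear to follow a binary encoding: reading the six LEDs after the reset button from left to right, a lit LED (●) is a 1 and an unlit LED (❍) is a 0, and the resulting number in hexadecimal is the OCP code. The sketch below applies that reading. The encoding is an observation drawn from the table rows, not a rule stated in this guide, and `ocp_code` is a hypothetical helper name.

```python
def ocp_code(pattern: str) -> str:
    """Convert a 7-symbol OCP pattern (reset button + six LEDs) to its hex OCP code."""
    leds = pattern[1:]  # the first symbol is the reset button and is not part of the code
    if len(leds) != 6 or any(symbol not in "●❍" for symbol in leds):
        raise ValueError("expected a reset-button symbol followed by six LED symbols")
    value = 0
    for symbol in leds:
        value = (value << 1) | (1 if symbol == "●" else 0)
    return format(value, "X")


print(ocp_code("■●●●●❍●"))  # 3D in Table 4–4
print(ocp_code("■●●❍❍●●"))  # 33 in Table 4–5
```

A lookup table keyed by code can then supply the error text and repair action from the tables.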
PAGE 149
4–33 Last Failure Reporting Last Failures are displayed on the maintenance terminal using %LFL formatting. The example below details an occurrence of a Last Failure report: %LFL--HSZ> --13-JAN-1946 04:39:45 (time not set)-- Last Failure Code: 20090010 Power On Time: 0.Years, 14.Days, 19.Hours, 58.Minutes, 42.
PAGE 150
4–34 Troubleshooting Spontaneous Event Log Spontaneous event logs are displayed on the maintenance terminal using %EVL formatting, as illustrated in the following examples: %EVL--HSZ> --13-JAN-1946 04:32:47 (time not set)-- Instance Code: 0102030A (not yet reported to host) Template: 1.(01) Power On Time: 0.Years, 14.Days, 19.Hours, 58.Minutes, 43.
PAGE 151
4–35 CLI Event Reporting CLI event reports are displayed on the maintenance terminal using %CER formatting, as shown in the following example: %CER--HSZ> --13-JAN-1946 04:32:20 (time not set)-- Previous controller operation terminated with display of solid fault code, OCP Code: 3F HSZ>
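When a maintenance-terminal session is captured to a file, the three report prefixes described in these sections (%LFL, %EVL, %CER) make it easy to sort the lines by report type. The following is a minimal sketch; the function name and the labels in the dictionary are illustrative choices, and only the prefixes come from this guide.

```python
from typing import Optional

REPORT_TYPES = {
    "%LFL": "last failure report",
    "%EVL": "spontaneous event log",
    "%CER": "CLI event report",
}


def report_type(line: str) -> Optional[str]:
    """Classify a captured terminal line by its report prefix, or return None."""
    return REPORT_TYPES.get(line.strip()[:4])


sample = "%CER--HSZ> --13-JAN-1946 04:32:20 (time not set)--"
print(report_type(sample))
```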
PAGE 152
4–36 Troubleshooting Utilities and Exercisers The controller’s software includes the utilities and exercisers to assist in troubleshooting and maintaining the controller and the other modules that support its operation. Fault Management Utility The Fault Management Utility (FMU) provides a limited interface to the controller’s fault-management software.
PAGE 153
4–37 Displaying Failure Entries The controller stores the 16 most recent last-failure reports as entries in its nonvolatile memory. The occurrence of any failure event will terminate operation of the controller on which it occurred. NOTE: Memory system failures are reported via the last failure mechanism but can be displayed separately. Use the following steps to display the last-failure entries: 1. Connect a PC or a local terminal to the controller. 2. Start FMU with the following command: RUN FMU 3.
PAGE 154
4–38 Troubleshooting The following example shows a last-failure entry. The Informational Report—the lower half of the entry—contains the instance code, reporting component, and so forth that you can translate with FMU to learn more about the event. Last Failure Entry: 4. Flags: 006FF300 Template: 1.(01) Description: Last Failure Event Power On Time: 0. Years, 14. Days, 19. Hours, 51. Minutes, 31.
PAGE 155
4–39 Translating Event Codes Use the following steps to translate the event codes in the fault-management reports for spontaneous events and failures: 1. Connect a PC or a local terminal to the controller’s maintenance port. 2. Start FMU with the following command: RUN FMU 3. Show one or more of the entries with the following command: DESCRIBE code_type code# where code_type is one of those listed in Table 4–6 and code# is the alphanumeric value displayed in the entry.
PAGE 156
4–40 Troubleshooting The following example shows the FMU translation of a last-failure code. FMU>DESCRIBE LAST_FAILURE_CODE 206C0020 Last Failure Code: 206C0020 Description: Controller was forced to restart in order for new controller code image to take effect. Reporting Component: 32.(20) Description: Command Line Interpreter Reporting component’s event number: 108.(6C) Restart Type: 2.
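In the translation above, the first two hexadecimal digit pairs of the code 206C0020 match the values FMU prints for the reporting component, 32.(20), and the component's event number, 108.(6C). The sketch below extracts just those two fields. Treating this as the general layout is an assumption drawn from that single example, and the remaining 16 bits are deliberately left uninterpreted. The function name is hypothetical.

```python
def split_last_failure_code(code_hex):
    """Split an 8-digit last-failure code into component and event number.

    Assumes the top byte is the reporting component and the next byte is
    that component's event number, as the FMU example suggests.
    """
    if len(code_hex) != 8:
        raise ValueError("expected an 8-digit hexadecimal code")
    value = int(code_hex, 16)
    return {
        "reporting_component": (value >> 24) & 0xFF,
        "event_number": (value >> 16) & 0xFF,
    }
```

For authoritative translations, FMU's DESCRIBE command remains the reference; a helper like this is only useful for quick triage of many logged codes.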
PAGE 157
4–41 Table 4–7 describes various SET commands that you can enter while running FMU. These commands remain in effect only as long as the current FMU session remains active, unless you enter the PERMANENT qualifier—the last entry in Table 4–7. Table 4–7 FMU SET Commands Command SET EVENT_LOGGING SET NOEVENT_LOGGING Result enable and disable the spontaneous display of significant events to the local terminal; preceded by “%EVL.” By default, logging is enabled (SET EVENT_LOGGING).
PAGE 158
4–42 Troubleshooting Table 4–7 FMU SET Commands (Continued) Command Result SET log_type VERBOSE SET log_type NOVERBOSE enable and disable the automatic translation of event codes that are contained in event logs or last-failure logs. By default, this descriptive text is not displayed (SET log_type NOVERBOSE). See “Translating Event Codes,” page 4–39, for instructions to translate these codes manually.
PAGE 159
4–43 Table 4–7 FMU SET Commands (Continued) Command Result SET CLI_EVENT_REPORTING SET NOCLI_EVENT_REPORTING enable and disable the asynchronous errors reported at the CLI prompt (for example, “swap signals disabled” or “shelf has a bad power supply”). Preceded by “%CER.” By default, these errors are reported (SET CLI_EVENT_REPORTING). These errors are cleared with the CLEAR ERRORS_CLI command. SET FAULT_LED_LOGGING enable and disable the solid fault LED event log display on the local terminal.
PAGE 160
4–44 Troubleshooting ■ The state and I/O activity of the logical units, devices, and device ports in the subsystem Use the following steps to run VTDPY: 1. Connect a terminal to the controller. The terminal must support ANSI control sequences. 2. Set the terminal to NOWRAP mode to prevent the top line of the display from scrolling off the screen. 3. Start VTDPY with the following command: RUN VTDPY Use the key sequences and commands listed in Table 4–8 to control VTDPY.
PAGE 161
4–45 You may abbreviate the commands to the minimum number of characters necessary to identify the command. Enter a question mark (?) after a partial command to see the values that can follow the supplied command. For example, if you enter DISP ?, the utility will list CACHE, DEFAULT, and so forth. (Separate “DISP” and “?” with a space.) Upon successfully executing a command—other than HELP—VTDPY exits command mode. Pressing Return without a command also causes VTDPY to exit command mode.
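The abbreviation rule described above amounts to unique-prefix matching. The following sketch illustrates the idea with the command and value names that appear in this guide (DISPLAY, HELP, CACHE, DEFAULT, DEVICE). It is not VTDPY's actual parser, and the function names are hypothetical.

```python
def resolve(abbrev, candidates):
    """Expand an abbreviation to the single candidate it identifies."""
    key = abbrev.strip().upper()
    if not key:
        raise ValueError("empty command")
    if key in candidates:
        return key  # an exact match always wins
    matches = [name for name in candidates if name.startswith(key)]
    if len(matches) != 1:
        raise ValueError(f"{abbrev!r} is ambiguous or unknown: {matches}")
    return matches[0]


def shortest_abbreviation(command, candidates):
    """Return the shortest prefix of command that matches no other candidate."""
    for length in range(1, len(command) + 1):
        prefix = command[:length]
        if [name for name in candidates if name.startswith(prefix)] == [command]:
            return prefix
    return command
```

With the three display values above, DEFAULT and DEVICE share the prefix "DE", so each needs three characters, while CACHE is identified by "C" alone.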
PAGE 162
4–46 Troubleshooting Table 4–9 lists the heading and contents for each column of the Xfer Rate region (indicated by bold text in Figure 4–1). Table 4–9 Xfer Rate Columns Column Contents T SCSI target ID. W Transfer width: W for 16-bit; blank for 8-bit. I Initiator that negotiated synchronous communication. MHz Synchronous data rate negotiated by the initiator at the specified SCSI ID number.
PAGE 163
4–47 Checking Controller-to-Device Communications Use the VTDPY device display to see whether, and how, the controller is communicating with the devices in the subsystem (see Figure 4–2). This display contains three important regions: ■ Device map region (upper left) ■ Device status region (upper right) ■ Device-port status region (lower left) VTDPY>DISPLAY DEVICE HSZ80 S/N: 0000000000 SW: 00000-0 0.
PAGE 164
4–48 Troubleshooting Checking Device Type and Location The device map region of the device display (upper left) shows all of the devices that the controller recognizes through its device ports. Table 4–10 lists the heading and contents for each column of the device map region. Table 4–10 Device Map Columns Column Port Target Contents SCSI ports 1 through 6. SCSI targets 0 through 15. Single controllers occupy 7; dual-redundant controllers occupy 6 and 7.
PAGE 165
4–49 Table 4–11 Device Status Columns
Column  Contents
PTL     Kind of device and its port-target-lun (PTL) location:
          D = disk drive
          P = passthrough device
          ? = unknown device type
          (blank) = no device at this port/target location
A       Availability of the device:
          A = available to this controller
          a = available to other controller
          U = unavailable, but configured on “this controller”
          u = unavailable, but configured on “other controller”
          (blank) = unknown availability state
S       Spindle state of the device: ^
PAGE 166
4–50 Troubleshooting Table 4–11 Device Status Columns (Continued) Column Contents RdKB/S Average data transfer rate from the device (reads) during the last update interval. WrKB/S Average data transfer rate to the device (writes) during the last update interval. Que Maximum number of I/O requests waiting to be transferred to the device during the last update interval. Tg Maximum number of requests queued to the device during the last update interval.
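The single-character codes in the PTL and A columns of Table 4–11 are case-sensitive ("A" and "a" mean different things), which makes them easy to misread on a busy display. A small lookup sketch follows. The function name is hypothetical, and the use of a space for the "no device" and "unknown availability" entries is an assumption, because the code character for those entries is not legible in the table.

```python
# Device kind codes from the PTL column of Table 4-11.
DEVICE_KIND = {
    "D": "disk drive",
    "P": "passthrough device",
    "?": "unknown device type",
    " ": "no device at this port/target location",  # blank code assumed
}

# Availability codes from the A column; note that case matters.
AVAILABILITY = {
    "A": "available to this controller",
    "a": "available to other controller",
    "U": "unavailable, but configured on this controller",
    "u": "unavailable, but configured on other controller",
    " ": "unknown availability state",  # blank code assumed
}


def describe_device(kind, availability):
    """Translate one kind code and one availability code into text."""
    return (
        DEVICE_KIND.get(kind, "unrecognized code"),
        AVAILABILITY.get(availability, "unrecognized code"),
    )
```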
PAGE 167
4–51 Table 4–12 Device-Port Status Columns Column Contents Port SCSI device ports 1 through 6. Rq/S Average request rate for the port during the last update interval. Requests can be up to 32K and generated by host or cache activity. RdKB/S Average data transfer rate from the devices on the port (reads) during the last update interval. WrKB/S Average data transfer rate to the devices on the port (writes) during the last update interval.
PAGE 168
4–52 Troubleshooting VTDPY>DISPLAY CACHE HSZ80 S/N: CX13245768 SW: RDGMZ-0 0.
PAGE 169
4–53 Table 4–13 Unit Status Columns Column Unit A S Contents Kind of unit (and its unit number): D = disk drive or CD-ROM drive P = passthrough device ? = unknown device type Availability of the unit: a = available to other controller d = disabled for servicing, offline e = mounted for exclusive access by a user f = media format error i = inoperative m = maintenance mode for diagnostic purposes o = online. Host may access this unit through “this controller.
PAGE 170
4–54 Troubleshooting Table 4–13 Unit Status Columns (Continued) Column Contents W Write-protection state. For disk drives, a W in this column indicates that the device is hardware write-protected. This column is blank for units that comprise other kinds of devices.
PAGE 171
4–55 Table 4–13 Unit Status Columns (Continued) Column Contents BlChd Number of blocks added to the cache during the last update interval. BlHit Number of blocks hit during the last update interval. RH% Read cache-hit percentage for data transferred between the host and the unit. Disk Inline Exerciser (DILX) Checking for Disk-Drive Problems Use the disk inline exerciser (DILX) to check the data-transfer capability of disk drives.
PAGE 172
4–56 Troubleshooting 4. Enter the following command to turn off the LED: LOCATE CANCEL Testing the Read Capability of a Disk Drive Use the following steps to test the read capability of a disk drive: 1. From a host console, dismount the logical unit that contains the disk drive you want to test. 2. Connect a terminal to the maintenance port of the controller that accesses the disk drive you want to test. 3. Run DILX with the following command: RUN DILX 4.
PAGE 173
4–57 Testing the Read and Write Capabilities of a Disk Drive Run a DILX Basic Function test to test the read and write capability of a disk drive. During the Basic Function test, DILX runs the following four tests. (DILX repeats the last three tests until the time that you specify in step 6 on page 4–59 expires.) ■ Write test. Writes specific patterns of data to the disk drive (see Table 4–15). DILX does not repeat this test. ■ Random I/O test.
PAGE 174
4–58 Troubleshooting
Table 4–15 Data Patterns for Phase 1: Write Test
Pattern  Pattern in Hexadecimal Numbers
1        0000
2        8B8B
3        3333
4        3091
5        0001, 0003, 0007, 000F, 001F, 003F, 007F, 00FF, 01FF, 03FF, 07FF, 0FFF, 1FFF, 3FFF, 7FFF
6        FFFE, FFFC, FFFC, FFFC, FFE0, FFE0, FFE0, FFE0, FE00, FC00, F800, F000, F000, C000, 8000, 0000
7        0000, 0000, 0000, FFFF, FFFF, FFFF, 0000, 0000, FFFF, FFFF, 0000, FFFF, 0000, FFFF, 0000, FFFF
8        B6D9
9        5555, 5555, 5555, AAAA, AAAA, AAAA, 5555, 5555, AAAA, AAAA, 5
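Pattern 5 in Table 4–15 is a sequence of 16-bit words in which the number of low-order one bits grows by one per word. As an illustration only (this is not DILX code), the listed values can be regenerated with a shift and a subtraction:

```python
def growing_ones(count=15):
    """Return words with 1, 2, ..., count low-order bits set."""
    return [(1 << bits) - 1 for bits in range(1, count + 1)]


# Format as four-digit hexadecimal words, matching the table's notation.
pattern_5 = [f"{word:04X}" for word in growing_ones()]
```

Here pattern_5 starts with 0001, 0003, 0007 and ends with 7FFF, the same fifteen words the table lists for pattern 5.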
PAGE 175
4–59 Use the following steps to test the read and write capabilities of a specific disk drive: 1. From a host console, dismount the logical unit that contains the disk drive you want to test. 2. Connect a terminal to the maintenance port of the controller that accesses the disk drive you want to test. 3. Run DILX with the following command: RUN DILX 4. Decline the auto-configure option so that you can specify the disk drive to test.
PAGE 176
4–60 Troubleshooting 18. Perform the initial write pass. 19. Allow DILX to compare the read and write data. 20. Accept the default percentage of reads and writes that DILX compares. 21. Enter the unit number of the disk drive you want to test. For example, if you want to test D107, enter the number 107. 22. If you want to test more than one disk drive, enter the appropriate unit numbers when prompted; otherwise, enter “n” to start the test.
PAGE 177
4–61 HSUTIL Use HSUTIL to upgrade the firmware on disk drives in the subsystem and to format disk drives. See Chapter 3, “Upgrading Firmware on a Device,” page 3–14, for more information on using HSUTIL. While you are formatting disk drives or installing new firmware, HSUTIL may produce one or more of the messages in Table 4–17 (many of the self-explanatory messages have been omitted).
PAGE 178
4–62 Troubleshooting
Table 4–17 HSUTIL Messages and Inquiries (Continued)
Message: What BUFFER SIZE, (in BYTES), does the drive require (2048, 4096, 8192) [8192]?
Description: HSUTIL detects that an unsupported device has been selected as the target device and the firmware image requires multiple SCSI Write Buffer commands. You must specify the number of bytes to be sent in each Write Buffer command. The default buffer size is 8192 bytes.
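The buffer size chosen at this prompt determines how many Write Buffer commands are needed to send a firmware image. The sketch below estimates that count, assuming each command carries up to one buffer of data and the last command may be shorter. The function name and the image sizes used with it are hypothetical; HSUTIL's internal behavior is not documented here.

```python
# Buffer sizes offered by the HSUTIL prompt.
ALLOWED_BUFFER_SIZES = (2048, 4096, 8192)


def write_buffer_commands(image_size, buffer_size=8192):
    """Return how many buffers of buffer_size bytes cover image_size bytes."""
    if buffer_size not in ALLOWED_BUFFER_SIZES:
        raise ValueError(f"buffer size must be one of {ALLOWED_BUFFER_SIZES}")
    if image_size < 0:
        raise ValueError("image size cannot be negative")
    # Ceiling division using integers only.
    return -(-image_size // buffer_size)
```

For example, a hypothetical 20000-byte image would need three commands at the default 8192-byte size.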
PAGE 179
4–63 Clone Utility Use the Clone utility to duplicate the data on any unpartitioned single-disk unit, stripeset, mirrorset, or striped mirrorset. Back up the cloned data while the actual storageset remains online. When the cloning operation is done, you can back up the clones rather than the storageset or single-disk unit, which can continue to service its I/O load. When you are cloning a mirrorset, CLONE does not need to create a temporary mirrorset.
PAGE 180
4–64 Troubleshooting Change Volume Serial Number Utility NOTE: Only Compaq-authorized service personnel may use this utility. The Change Volume Serial Number (CHVSN) utility generates a new volume serial number (called VSN) for the specified device and writes it on the media. It is a way to eliminate duplicate volume serial numbers and to rename duplicates with different volume serial numbers.
PAGE 181
5–1 Chapter 5 Event Reporting: Templates and Codes This chapter describes the event codes that the fault-management software generates for spontaneous events and last-failure events. The HSZ80 controller uses various codes to report different types of events, and these codes are presented in template displays.
PAGE 182
5–2 Event Reporting: Templates and Codes Passthrough Device Reset Event Sense Data Response Events reported by passthrough devices during host/device operations are conveyed directly to the host system without intervention or interpretation by the HSZ80 controller, except that device sense data longer than 160 bytes is truncated to 160 bytes.
PAGE 183
5–3 Last Failure Event Sense Data Response Unrecoverable conditions detected by either software or hardware and certain operator-initiated conditions result in the termination of HSZ80 controller operation. In most cases, following such a termination, the controller will attempt to restart (that is, reboot) with hardware components and software data structures initialized to the states necessary to perform normal operations (see Figure 5–2).
PAGE 184
5–4 Event Reporting: Templates and Codes
off     bit 7 6 5 4 3 2 1 0
0       Unusd, Error Code
1       Unused
2       Unused, Sense Key
3–6     Unused
7       Additional Sense Length
8–11    Unused
12      Additional Sense Code (ASC)
13      Additional Sense Code Qualifier (ASCQ)
14      Unused
15–17   Unused
18–31   Reserved
32–35   Instance Code
36      Template
37      Template Flags
38–53   Reserved
54–69   Controller Board Serial Number
70–73   Controller Software Revision Level
74–75   Reserved
76      LUN Status
77–103  Reserved
10
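The byte offsets in this layout can be applied directly to a sense-data buffer captured on the host. The following is a minimal sketch, assuming the buffer is available as a Python bytes object. The bit masks for Error Code (low seven bits of byte 0) and Sense Key (low four bits of byte 2) follow the usual SCSI fixed-format convention and are an assumption here, because the bit columns of the figure are not fully legible. Multi-byte fields are returned as raw bytes, so no byte order is assumed. The function name is hypothetical.

```python
def parse_common_sense_fields(sense):
    """Extract the fields listed at offsets 0 through 76 of the layout."""
    if len(sense) < 77:
        raise ValueError("need at least 77 bytes of sense data")
    return {
        "error_code": sense[0] & 0x7F,       # mask is an assumption
        "sense_key": sense[2] & 0x0F,        # mask is an assumption
        "additional_sense_length": sense[7],
        "asc": sense[12],
        "ascq": sense[13],
        "instance_code": bytes(sense[32:36]),        # offsets 32-35, raw
        "template": sense[36],
        "template_flags": sense[37],
        "controller_serial": bytes(sense[54:70]),    # offsets 54-69, raw
        "software_revision": bytes(sense[70:74]),    # offsets 70-73, raw
        "lun_status": sense[76],
    }
```

Fields above offset 76 differ from template to template, so a helper like this only covers the part of the layout that the templates in this chapter share.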
PAGE 185
5–5 Multiple-Bus Failover Event Sense Data Response The HSZ80 SCSI Host Interconnect Services software component reports Multiple Bus Failover events via the Multiple Bus Failover Event Sense Data Response (see Figure 5–3). ■ Instance Codes (byte offset 32-35) are described in Table 5–1, “Instance Codes.” ■ ASC and ASCQ codes (byte offsets 12 and 13) are described in Table 5–7, “ASC and ASCQ Codes.”
PAGE 186
5–6 Event Reporting: Templates and Codes
off     bit 7 6 5 4 3 2 1 0
0       Unusd, Error Code
1       Unused
2       Unused, Sense Key
3–6     Unused
7       Additional Sense Length
8–11    Unused
12      Additional Sense Code (ASC)
13      Additional Sense Code Qualifier (ASCQ)
14      Unused
15–17   Unused
18–26   Reserved
27      Failed Controller Target Number
28–31   Affected LUNs
32–35   Instance Code
36      Template
37      Template Flags
38–53   Other Controller Board Serial Number
54–69   Controller Board Serial Number
70–73   Co
PAGE 187
5–7 Failover Event Sense Data Response The HSZ80 controller Failover Control software component reports errors and other conditions encountered during redundant controller communications and failover operation via the Failover Event Sense Data Response (see Figure 5–4). ■ Instance Codes (byte offset 32-35) are described in Table 5–1, “Instance Codes.” ■ ASC and ASCQ codes (byte offsets 12 and 13) are described in Table 5–7, “ASC and ASCQ Codes” on page –104.
PAGE 188
5–8 Event Reporting: Templates and Codes
off     bit 7 6 5 4 3 2 1 0
0       Unusd, Error Code
1       Unused
2       Unused, Sense Key
3–6     Unused
7       Additional Sense Length
8–11    Unused
12      Additional Sense Code (ASC)
13      Additional Sense Code Qualifier (ASCQ)
14      Unused
15–17   Unused
18–31   Reserved
32–35   Instance Code
36      Template
37      Template Flags
38–53   Reserved
54–69   Controller Board Serial Number
70–73   Controller Software Revision Level
74–75   Reserved
76      LUN Status
77–103  Reserved
10
PAGE 189
5–9 Nonvolatile Parameter Memory Component Event Sense Data Response The HSZ80 controller Executive software component reports errors detected while accessing a Nonvolatile Parameter Memory Component via the Nonvolatile Parameter Memory Component Event Sense Data Response (see Figure 5–5). ■ Instance Codes (byte offset 32-35) are described in Table 5–1, “Instance Codes.” ■ ASC and ASCQ codes (byte offsets 12 and 13) are described in Table 5–7, “ASC and ASCQ Codes.
PAGE 190
5–10 Event Reporting: Templates and Codes
off     bit 7 6 5 4 3 2 1 0
0       Unusd, Error Code
1       Unused
2       Unused, Sense Key
3–6     Unused
7       Additional Sense Length
8–11    Unused
12      Additional Sense Code (ASC)
13      Additional Sense Code Qualifier (ASCQ)
14      Unused
15–17   Unused
18–31   Reserved
32–35   Instance Code
36      Template
37      Template Flags
38–53   Reserved
54–69   Controller Board Serial Number
70–73   Controller Software Revision Level
74–75   Reserved
76      LUN Status
77–103  Reserved
1
PAGE 191
5–11 Backup Battery Failure Event Sense Data Response The HSZ80 controller Value Added Services software component reports backup battery failure conditions for the various hardware components that use a battery to maintain state during power failures via the Backup Battery Failure Event Sense Data Response (see Figure 5–6). ■ Instance Codes (byte offset 32-35) are described in Table 5–1, “Instance Codes.” ■ ASC and ASCQ codes (byte offsets 12 and 13) are described in Table 5–7, “ASC and ASCQ Codes.
PAGE 192
5–12 Event Reporting: Templates and Codes
off     bit 7 6 5 4 3 2 1 0
0       Unusd, Error Code
1       Unused
2       Unused, Sense Key
3–6     Unused
7       Additional Sense Length
8–11    Unused
12      Additional Sense Code (ASC)
13      Additional Sense Code Qualifier (ASCQ)
14      Unused
15–17   Unused
18–31   Reserved
32–35   Instance Code
36      Template
37      Template Flags
38–53   Reserved
54–69   Controller Board Serial Number
70–73   Controller Software Revision Level
74–75   Reserved
76      LUN Status
77–103  Reserved
1
PAGE 193
5–13 Subsystem Built-In Self Test Failure Event Sense Data Response The HSZ80 controller Subsystem Built-In Self Tests software component reports errors detected during test execution via the Subsystem Built-In Self Test Failure Event Sense Data Response (see Figure 5–7). ■ Instance Codes (byte offset 32-35) are described in Table 5–1, “Instance Codes.” ■ ASC and ASCQ codes (byte offsets 12 and 13) are described in Table 5–7, “ASC and ASCQ Codes.”
PAGE 194
5–14 Event Reporting: Templates and Codes
off     bit 7 6 5 4 3 2 1 0
0       Unusd, Error Code
1       Unused
2       Unused, Sense Key
3–6     Unused
7       Additional Sense Length
8–11    Unused
12      Additional Sense Code (ASC)
13      Additional Sense Code Qualifier (ASCQ)
14      Unused
15–17   Unused
18–31   Reserved
32–35   Instance Code
36      Template
37      Template Flags
38–53   Reserved
54–69   Controller Board Serial Number
70–73   Controller Software Revision Level
74–75   Reserved
76      LUN Status
77–103  Reserved
1
PAGE 195
5–15 Memory System Failure Event Sense Data Response The HSZ80 controller Memory Controller Event Analyzer software component and the Cache Manager, part of the Value Added software component, report the occurrence of memory errors via the Memory System Failure Event Sense Data Response (see Figure 5–8). ■ Instance Codes (byte offset 32-35) are described in Table 5–1, “Instance Codes.” ■ ASC and ASCQ codes (byte offsets 12 and 13) are described in Table 5–7, “ASC and ASCQ Codes.
PAGE 196
5–16 Event Reporting: Templates and Codes Device Services Non-Transfer Error Event Sense Data Response The HSZ80 controller Device Services software component reports errors detected while performing non-transfer work related to disk (including CD-ROM and optical memory) device operations via the Device Services Non-Transfer Event Sense Data Response (see Figure 5–9). ■ Instance Codes (byte offset 32-35) are described in Table 5–1, “Instance Codes.
PAGE 197
5–17
off     bit 7 6 5 4 3 2 1 0
0       Unusd, Error Code
1       Unused
2       Unused, Sense Key
3–6     Unused
7       Additional Sense Length
8–11    Unused
12      Additional Sense Code (ASC)
13      Additional Sense Code Qualifier (ASCQ)
14      Unused
15–17   Unused
18–31   Reserved
32–35   Instance Code
36      Template
37      Template Flags
38–53   Reserved
54–69   Controller Board Serial Number
70–73   Controller Software Revision Level
74–75   Reserved
76      LUN Status
77–103  Reserved
104     Associated Port
105     Associated T
PAGE 198
5–18 Event Reporting: Templates and Codes Disk Transfer Error Event Sense Data Response The HSZ80 controller Device Services and Value Added Services software components report errors detected while performing work related to disk (including CD-ROM and optical memory) device transfer operations via the Disk Transfer Error Event Sense Data Response (see Figure 5–10). ■ Instance Codes (byte offset 32-35) are described in Table 5–1, “Instance Codes.
PAGE 199
5–19 off bit 7 6 5 0–17 4 3 2 18–19 Reserved 20 Total Number of Errors 21 Total Retry Count 22–25 ASC/ASCQ Stack 26–28 Device Locator 29–31 Reserved 32–35 Instance Code 36 Template 37 Template Flags 38 Reserved 39 Command Opcode 40 Sense Data Qualifier 41–50 Original CDB 51 Host ID 52–53 Reserved 54–69 Controller Board Serial Number 70–73 Controller Software Revision Level 74–75 Reserved 76 LUN Status 77–78 Reserved 79-82 Device Firmware Revision Level 83–
PAGE 200
5–20 Event Reporting: Templates and Codes Instance Codes An Instance Code is a number that uniquely identifies an event being reported. Instance Code Structure Figure 5–11 shows the structure of an instance code. If you understand its structure, you will be able to translate it, bypassing the fault management utility (FMU).
PAGE 201
5–21 NOTE: The offset values enclosed in braces ({}) apply only to the passthrough device reset event sense data response format (see Figure 5–1). The nonbraced offset values apply only to the logical device event sense data response formats shown in the templates that begin on page 5–104. NR Threshold Located at byte offset {8}32, the NR Threshold is the notification/recovery threshold assigned to the event.
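The two offsets given for the NR Threshold can be captured in a small helper: offset 8 for the passthrough device reset format (the braced value) and offset 32 for the logical device formats. This is only a sketch over a raw sense-data buffer. The function and parameter names are hypothetical, and the meaning of the returned threshold value is not interpreted.

```python
def nr_threshold(sense, passthrough_reset=False):
    """Return the NR Threshold byte from a sense-data buffer.

    passthrough_reset selects the braced offset {8}; otherwise offset 32.
    """
    offset = 8 if passthrough_reset else 32
    if len(sense) <= offset:
        raise ValueError("sense data too short for the selected format")
    return sense[offset]
```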
PAGE 202
5–22 Event Reporting: Templates and Codes Table 5–1 Instance Codes Instance Code Description Template 01010302 An unrecoverable hardware-detected fault occurred. 01 0102030A An unrecoverable software inconsistency was detected or an intentional restart or shutdown of controller operation was requested. 01 01032002 Nonvolatile parameter memory component EDC check failed; content of the component reset to default settings.
PAGE 203
5–23 Table 5–1 Instance Codes (Continued) Instance Code Description Template 02090064 A data compare error was detected during the execution of a compare modified READ or WRITE command. 51 020B2201 Failed read test of a write-back metadata page residing in cache. Dirty write-back cached data exists and cannot be flushed to media. The dirty data is lost. The Memory Address field contains the starting physical address of the CACHEA0 memory.
PAGE 204
5–24 Event Reporting: Templates and Codes Table 5–1 Instance Codes (Continued) Instance Code Description Template 021A0064 Disk Bad Block Replacement attempt completed for a write of controller metadata to a location outside the user data area of the disk. Note that due to the way Bad Block Replacement is performed on SCSI disk drives, information on the actual replacement blocks is not available to the controller and is therefore not included in the event report.
PAGE 205
5–25 Table 5–1 Instance Codes (Continued) Instance Code Description Template 02383A01 The CACHEB0 Memory Controller, which resides on the other cache module, failed testing performed by the Cache Diagnostics. This is the mirrored cache Memory Controller. The Memory Address field contains the starting physical address of the CACHEB0 memory.
PAGE 206
5–26 Event Reporting: Templates and Codes Table 5–1 Instance Codes (Continued) Instance Code Description Template 02422464 Cache failover attempt failed because the other cache was illegally configured with DIMMs. Note that in this instance the Memory Address, Byte Count, FX Chip register, Memory Controller register, and Diagnostic register fields are undefined. 14 02492401 The write cache module which is the mirror for the primary cache is unexpectedly not present (missing).
PAGE 207
5–27 Table 5–1 Instance Codes (Continued) Instance Code Description Template 0252000A The last block of data returned contains a forced error. A forced error occurs when a disk block is successfully reassigned, but the data in that block is lost. Re-writing the disk block will clear the forced error condition. The Information field of the Device Sense Data contains the block number of the first block in error.
PAGE 208
5–28 Event Reporting: Templates and Codes Table 5–1 Instance Codes (Continued) Instance Code Description Template 025A000A The command failed because the unit became inoperative prior to command completion. The Information field of the Device Sense Data contains the block number of the first block in error. 51 025B000A The command failed because the unit became unknown to the controller prior to command completion.
PAGE 209
5–29 Table 5–1 Instance Codes (Continued) Instance Code Description Template 02613801 Memory diagnostics performed during controller initialization detected that the DIMM in location 1 failed on the cache module. Note that in this instance the Byte Count field is undefined. 14 02623801 Memory diagnostics performed during controller initialization detected that the DIMM in location 2 failed on the cache module. Note that in this instance the Byte Count field is undefined.
PAGE 210
5–30 Event Reporting: Templates and Codes Table 5–1 Instance Codes (Continued) Instance Code Description Template 02695401 The device specified in the Device Locator field failed to be added to the RAIDset associated with the logical unit. The failed device has been moved to the Failedset. 51 026A5001 The RAIDset associated with the logical unit has gone inoperative. 51 026B0064 The RAIDset associated with the logical unit has transitioned from Normal state to Reconstructing state.
PAGE 211
5–31 Table 5–1 Instance Codes (Continued) Instance Code Description Template 02745A0A The device specified in the Device Locator field had a read error. Attempts to repair the error with data from another mirrorset member failed due to lack of alternate error-free data source. 51 02755601 The device specified in the Device Locator field had a read error. Attempts to repair the error with data from another mirrorset member failed due to a write error on the original device.
PAGE 212
5–32 Event Reporting: Templates and Codes Table 5–1 Instance Codes (Continued) Instance Code Description Template 027C2201 The CACHEB0 and CACHEB1 Memory Controllers failed Cache Diagnostics testing performed on the other cache during a cache failover attempt. The Memory Address field contains the starting physical address of the CACHEB0 memory. 14 027D5B01 The Mirrorset associated with the logical unit has gone inoperative due to a disaster tolerance failsafe locked condition.
PAGE 213
5–33 Table 5–1 Instance Codes (Continued) Instance Code Description Template 02872301 The CACHE backup battery has exceeded the maximum number of deep discharges. Battery capacity may be below specified values. The Memory Address field contains the starting physical address of the CACHEA0 memory. 12 02882301 The CACHE backup battery covering the mirror cache has exceeded the maximum number of deep discharges. Battery capacity may be below specified values.
PAGE 214
5–34 Event Reporting: Templates and Codes Table 5–1 Instance Codes (Continued) Instance Code Description Template 03052002 Device port SCSI chip reported gross error during disk operation. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined. 41 03062002 Non-SCSI bus parity error during disk operation.
PAGE 215
5–35 Table 5–1 Instance Codes (Continued) Instance Code Description Template 03144002 Drive reported recovered error without transferring all data. 51 03154002 Data returned from drive is invalid. 51 03164002 Request Sense command to drive failed. 51 03170064 Illegal command for pass through mode. 51 03180064 Data transfer request error. 51 03194002 Premature completion of a drive command. 51 031A4002 Command timeout. 51 031B0101 Watchdog timer timeout.
PAGE 216
5–36 Event Reporting: Templates and Codes Table 5–1 Instance Codes (Continued) Instance Code Description 03324002 SCSI bus selection timeout. 03330002 Device power on reset. 03344002 Target assertion of REQ after WAIT DISCONNECT. 03354002 During device initialization a Test Unit Ready command or a Read Capacity command to the device failed. 03364002 During device initialization the device reported a deferred error.
PAGE 217
5–37 Table 5–1 Instance Codes (Continued) Instance Code Description Template 03BE0701 The EMU for the cabinet indicated by the Associated Port field has powered down the cabinet because there are fewer than four working power supplies present. Note that in this instance the Associated Target, Associated Additional Sense Code, and Associated Additional Sense Code Qualifier fields are undefined.
PAGE 218
5–38 Event Reporting: Templates and Codes Table 5–1 Instance Codes (Continued) Instance Code Description Template 03CC0101 An error code was reported which was unknown to the Fault Management software. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined. 41 03CD2002 Device port SCSI chip reported gross error during operation to a device which is unknown to the controller.
PAGE 219
5–39 Table 5–1 Instance Codes (Continued) Instance Code Description Template 03D24402 SCSI bus errors during device operation. The device type is unknown to the controller. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined. 41 03D3450A During device initialization, the device reported the SCSI Sense Key NO SENSE.
PAGE 220
5–40 Event Reporting: Templates and Codes Table 5–1 Instance Codes (Continued) Instance Code Description Template 03D8450A During device initialization, the device reported the SCSI Sense Key ILLEGAL REQUEST. Indicates that there was an illegal parameter in the command descriptor block or in the additional parameters supplied as data for some commands (FORMAT UNIT, SEARCH DATA, etc.).
PAGE 221
5–41 Table 5–1 Instance Codes (Continued) Instance Code Description Template 03DE450A During device initialization, the device reported the SCSI Sense Key ABORTED COMMAND. This indicates the target aborted the command. The initiator may be able to recover by trying the command again. 41 03DF450A During device initialization, the device reported the SCSI Sense Key EQUAL. This indicates a SEARCH DATA command has satisfied an equal comparison.
PAGE 222
5–42 Event Reporting: Templates and Codes Table 5–1 Instance Codes (Continued) Instance Code Description Template 03F10502 The SWAP interrupt from the device port indicated by the Associated Port field cannot be cleared. All SWAP interrupts from all ports will be disabled until corrective action is taken. When SWAP interrupts are disabled, neither controller front panel button presses nor removal/insertion of devices is detected by the controller.
PAGE 223
5–43 Table 5–1 Instance Codes (Continued) Instance Code Description Template 03F60402 The controller shelf is reporting a problem. This could mean one or both of the following: ■ If the shelf is using dual power supplies, one power supply has failed. ■ One of the shelf cooling fans has failed. Note that in this instance the Associated Target, Associated Additional Sense Code, and Associated Additional Sense Code Qualifier fields are undefined. 41
PAGE 224
5–44 Event Reporting: Templates and Codes Table 5–1 Instance Codes (Continued) Instance Code Description Template 03FC0F01 The EMU-detected power supply fault is now fixed. Note that in this instance the Associated Target, Associated Additional Sense Code, and Associated Additional Sense Code Qualifier fields are undefined. 41 03FD0F01 The EMU-detected bad-fan fault is now fixed.
PAGE 225
5–45 Table 5–1 Instance Codes (Continued) Instance Code Description Template 07050064 Failover Control received a Last Gasp message from the other controller. The other controller is expected to restart itself within a given time period. If it does not, it will be held reset with the “Kill” line. 05 07060C01 Failover Control detected that both controllers are acting as SCSI ID 6. Since IDs are determined by hardware, it is unknown which controller is the real SCSI ID 6.
PAGE 226
5–46 Event Reporting: Templates and Codes Table 5–1 Instance Codes (Continued) Instance Code Description Template 0C203E02 The Quadrant 0 Memory Controller (CACHEA0) detected a Data Parity error. 14 0C213E02 The Quadrant 1 Memory Controller (CACHEA1) detected a Data Parity error. 14 0C223E02 The Quadrant 2 Memory Controller (CACHEB0) detected a Data Parity error. 14 0C233E02 The Quadrant 3 Memory Controller (CACHEB1) detected a Data Parity error.
PAGE 227
5–47 Table 5–1 Instance Codes (Continued) Instance Code Description Template 82042002 A spurious interrupt was detected during the execution of a Subsystem Built-In Self Test. 13 82052002 An unrecoverable error was detected during execution of the HOST PORT Subsystem Test. The system will not be able to communicate with the host. 13 82062002 An unrecoverable error was detected during execution of the UART/DUART Subsystem Test. This will cause the console to be unusable.
PAGE 228
5–48 Event Reporting: Templates and Codes Last Failure Codes A Last Failure Code is a number that uniquely describes an unrecoverable condition. It is found at byte offsets 104 to 107 and appears only in Figure 5–2, “Template 01 - Last Failure Event Sense Data Response Format” on page 5–4, and Figure 5–4, “Template 05 - Failover Event Sense Data Response Format” on page 5–8. Last Failure Code Structure Figure 5–13 shows the structure of a Last Failure Code.
PAGE 229
5–49 NOTE: Do not confuse the Last Failure Code with the Instance Code (see page 5–20). They appear at different byte offsets and convey different information. HW This hardware/software flag is located at byte offset 104, bit 7. If this flag is equal to 1, the unrecoverable condition is due to a hardware-detected fault. If this flag is equal to 0, the unrecoverable condition is due to an inconsistency with the software, or an intentional restart or shutdown of the controller was requested.
PAGE 230
5–50 Event Reporting: Templates and Codes Repair Action The Repair Action found at byte offset 105 indicates the recommended repair action code assigned to the event. This value is used during Symptom-Directed Diagnosis procedures to determine what notification/recovery action should be taken. For more details, see “Recommended Repair Action Codes,” page 5–95.
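The field descriptions above can be sketched as a small decoder. This is a minimal illustration, not controller firmware: the function name `decode_last_failure_code` is hypothetical, and it assumes the code is printed most-significant byte first, so that the last printed byte corresponds to byte offset 104 and the one before it to byte offset 105. The positions of the component identifier, error number, and parameter count are assumptions inferred from the examples in Table 5–3 (for instance, 01082004 lists four parameters) and should be checked against Figure 5–13.

```python
def decode_last_failure_code(code_hex: str) -> dict:
    """Split an 8-digit Last Failure Code such as '01082004' into fields."""
    if len(code_hex) != 8:
        raise ValueError("a Last Failure Code is written as 8 hex digits")
    value = int(code_hex, 16)
    low_byte = value & 0xFF  # byte offset 104 (assumed to be the last printed byte)
    return {
        # Assumption: the first printed byte is the component identifier.
        "component_id": (value >> 24) & 0xFF,
        # Assumption: the second printed byte is the error number.
        "error_number": (value >> 16) & 0xFF,
        # Byte offset 105: recommended repair action code.
        "repair_action": (value >> 8) & 0xFF,
        # Byte offset 104, bit 7: 1 means a hardware-detected fault.
        "hw_fault": bool(low_byte & 0x80),
        # Assumption: the low four bits give the number of parameters.
        "parameter_count": low_byte & 0x0F,
    }


if __name__ == "__main__":
    fields = decode_last_failure_code("01902086")
    print(f"repair action {fields['repair_action']:02X}, "
          f"hardware fault: {fields['hw_fault']}")
```

Under these assumptions, the repair action value can be looked up in “Recommended Repair Action Codes,” page 5–95.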
PAGE 231
5–51 Table 5–3 Last Failure Codes (Continued) Code Description 01082004 The core diagnostics reported a fault. ■ Last Failure Parameter[0] contains the error code value (same as blinking OCP LEDs error code). ■ Last Failure Parameter[1] contains the address of the fault. ■ Last Failure Parameter[2] contains the actual data value. ■ Last Failure Parameter[3] contains the expected data value. 01090105 An NMI occurred during EXEC$BUGCHECK processing.
PAGE 232
5–52 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 010F0110 All structures contained in the System Information Page and the Last Failure entries have been reset to their default settings as the result of certain controller manufacturing configuration activities. If this event is reported at any other time, follow the recommended repair action associated with this Last Failure code.
PAGE 233
5–53 Table 5–3 Last Failure Codes (Continued) Code 01170108 01180105 011B0108 Description The I960 reported a machine fault (parity error) while an NMI was being processed. ■ Last Failure Parameter [0] contains the RESERVED value. ■ Last Failure Parameter [1] contains the access type value. ■ Last Failure Parameter [2] contains the access address value. ■ Last Failure Parameter [3] contains the number of faults value. ■ Last Failure Parameter [4] contains the PC value.
PAGE 234
5–54 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 011C0011 Controller execution terminated via display of solid fault code in OCP LEDs. Note that upon receipt of this Last Failure in a last gasp message the other controller in a dual controller configuration will inhibit assertion of the KILL line. ■ Last Failure Parameter [0] contains the OCP LED solid fault code value. 011D0100 Relocated zero (for example, C0000000) entered via call or branch.
PAGE 235
5–55 Table 5–3 Last Failure Codes (Continued) Code 01902086 01910084 01920186 Description The PCI bus on the controller will not allow a master to initiate a transfer. Unable to provide further diagnosis of the problem. ■ Last Failure Parameter [0] contains the value of read diagnostic register 0. ■ Last Failure Parameter [1] contains the value of read diagnostic register 1. ■ Last Failure Parameter [2] contains the value of read diagnostic register 2.
PAGE 236
5–56 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code 01932588 01942088 Description An error has occurred on the CDAL. ■ Last Failure Parameter [0] contains the value of read diagnostic register 0. ■ Last Failure Parameter [1] contains the value of read diagnostic register 1. ■ Last Failure Parameter [2] contains the value of write diagnostic register 0. ■ Last Failure Parameter [3] contains the value of write diagnostic register 1.
PAGE 237
5–57 Table 5–3 Last Failure Codes (Continued) Code 01950188 01960186 01970188 Description An error has occurred that caused the FX to be reset, when not permissible. ■ Last Failure Parameter [0] contains the value of read diagnostic register 0. ■ Last Failure Parameter [1] contains the value of read diagnostic register 1. ■ Last Failure Parameter [2] contains the value of write diagnostic register 0. ■ Last Failure Parameter [3] contains the value of write diagnostic register 1.
PAGE 238
5–58 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code 01982087 01992088 Description The Ibus encountered a parity error. ■ Last Failure Parameter [0] contains the value of read diagnostic register 0. ■ Last Failure Parameter [1] contains the value of read diagnostic register 1. ■ Last Failure Parameter [2] contains the value of read diagnostic register 2. ■ Last Failure Parameter [3] contains the value of write diagnostic register 0.
PAGE 239
5–59 Table 5–3 Last Failure Codes (Continued) Code Description 020C0100 A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when populating the miscellaneous DWD stack. 02100100 A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when creating the device services state table. 02170100 Unable to allocate memory for the Free Node Array. 021D0100 Unable to allocate memory for the Free Buffer Array. 021F0100 Unable to allocate memory for WARPs and RMDs.
PAGE 240
5–60 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code 02530102 02560102 02570102 025A0102 02620102 02690102 027B0102 Description An invalid status was returned from CACHE$LOOKUP_LOCK(). ■ Last Failure Parameter[0] contains the DD address. ■ Last Failure Parameter[1] contains the invalid status. An invalid status was returned from CACHE$LOOKUP_LOCK(). ■ Last Failure Parameter[0] contains the DD address. ■ Last Failure Parameter[1] contains the invalid status.
PAGE 241
5–61 Table 5–3 Last Failure Codes (Continued) Code Description 02800100 Unable to allocate memory for a Failover Control Block. 02840100 Unable to allocate memory for the XNode Array. 02860100 Unable to allocate memory for the Fault Management Event Information Packet used by the Cache Manager in generating error logs to the host. 02880100 Invalid FOC Message in cmfoc_snd_cmd. 028A0100 Invalid return status from DIAG$CACHE_MEMORY_TEST.
PAGE 242
5–62 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 02A20100 Pubs not one when transportable 02A30100 No available data buffers. If the cache module exists then this is true after testing the whole cache. Otherwise there were no buffers allocated from BUFFER memory on the controller module. 02A40100 A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when allocating VAXDs.
PAGE 243
5–63 Table 5–3 Last Failure Codes (Continued) Code Description 02B00102 An invalid status was returned from VA$XFER() in an erase operation. ■ Last Failure Parameter [0] contains the DD address. ■ Last Failure Parameter [1] contains the invalid status. 02B10100 A mirrorset read operation was received and the round robin selection algorithm found no normal members in the mirrorset. Internal inconsistency. 02B20102 An invalid status was returned from CACHE$LOCK_READ during a mirror copy operation.
PAGE 244
5–64 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 02C90100 Illegal call made to CACHE$PURGE_META when the storageset was not quiesced. 02CA0100 Illegal call made to VA$RAID5_META_READ when another read (of metadata) is already in progress on the same strip. 02CB0000 A restore of the configuration has been done. This cleans up and restarts with the new configuration.
PAGE 245
5–65 Table 5–3 Last Failure Codes (Continued) Code Description 02E11016 While attempting to restore saved configuration information, data for two unrelated controllers was found. The restore code is unable to determine which disk contains the correct information. The Port/Target/LUN information for the two disks is contained in the parameter list. Remove the disk containing the incorrect information, reboot the controller, and issue the SET THIS_CONTROLLER INITIAL_CONFIGURATION command.
PAGE 246
5–66 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 02EE0102 A CLD is already allocated when it should be free. ■ Last Failure Parameter [0] contains the requesting entity. ■ Last Failure Parameter [1] contains the CLD index. 02EF0102 A CLD is free when it should be allocated. ■ Last Failure Parameter [0] contains the requesting entity. ■ Last Failure Parameter [1] contains the CLD index.
PAGE 247
5–67 Table 5–3 Last Failure Codes (Continued) Code Description 02F54083 The device saved configuration information selected for the restore process is from an unsupported controller type. Remove the device with the unsupported information and retry the operation. ■ Last Failure Parameter [0] contains the disk port. ■ Last Failure Parameter [1] contains the disk target. ■ Last Failure Parameter [2] contains the disk LUN. 02F60103 An invalid modification to the no_interlock VSI flag was attempted.
PAGE 248
5–68 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 02F90100 A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when allocating structures for read ahead caching. 02FA0100 A read ahead caching data structure (RADD) is inconsistent. 02FB2084 A processor interrupt was generated by the controller’s XOR engine (FX), indicating an unrecoverable error condition. ■ Last Failure Parameter [0] contains the FX Control and Status Register (CSR).
PAGE 249
5–69 Table 5–3 Last Failure Codes (Continued) Code Description 03040101 Invalid SCSI CDROM device opcode in misc command DWD. ■ Last Failure Parameter [0] contains the SCSI command opcode. 03060101 Invalid SCSI device type in PUB. ■ Last Failure Parameter [0] contains the SCSI device type. 03070101 Invalid CDB Group Code detected during create of misc cmd DWD. ■ Last Failure Parameter [0] contains the SCSI command opcode. 03080101 Invalid SCSI OPTICAL MEMORY device opcode in misc command DWD.
PAGE 250
5–70 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 03290100 The required Event Information Packet (EIP) or Device Work Descriptor (DWD) were not supplied to the Device Services error logging code. 032B0100 A Device Work Descriptor (DWD) was supplied with a NULL Physical Unit Block (PUB) pointer. 03320101 An invalid code was passed to the error recovery thread in the error_stat field of the PCB.
PAGE 251
5–71 Table 5–3 Last Failure Codes (Continued) Code 03350188 03370108 Description The TEA (bus fault) signal was asserted into a device port. ■ Last Failure Parameter [0] contains the PCB port_ptr value. ■ Last Failure Parameter [1] contains the PCB copy of the device port TEMP register. ■ Last Failure Parameter [2] contains the PCB copy of the device port DBC register. ■ Last Failure Parameter [3] contains the PCB copy of the device port DNAD register.
PAGE 252
5–72 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 03380188 A device port’s DSTAT register contains multiple asserted bits, or an invalidly asserted bit, or both. 03390108 033C0101 ■ Last Failure Parameter [0] contains the PCB port_ptr value. ■ Last Failure Parameter [1] contains the PCB copy of the device port TEMP register. ■ Last Failure Parameter [2] contains the PCB copy of the device port DBC register.
PAGE 253
5–73 Table 5–3 Last Failure Codes (Continued) Code 033E0108 033F0108 03410101 Description An attempt was made to restart a device port at the SDP DBD. ■ Last Failure Parameter [0] contains the PCB port_ptr value. ■ Last Failure Parameter [1] contains the PCB copy of the device port TEMP register. ■ Last Failure Parameter [2] contains the PCB copy of the device port DBC register. ■ Last Failure Parameter [3] contains the PCB copy of the device port DNAD register.
PAGE 254
5–74 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 03450188 A Master Data Parity Error was detected by a port. ■ Last Failure Parameter [0] contains the PCB port_ptr value. ■ Last Failure Parameter [1] contains the PCB copies of the device port DCMD/DBC registers. ■ Last Failure Parameter [2] contains the PCB copy of the device port DNAD register. ■ Last Failure Parameter [3] contains the PCB copy of the device port DSP register.
PAGE 255
5–75 Table 5–3 Last Failure Codes (Continued) Code Description 035B0100 Insufficient DWD resources available for SCSI message passthrough. 03640100 Processing run_switch disabled for LOGDISK associated with the other controller. 03650100 Processing pub unblock for LOGDISK associated with the other controller.
PAGE 256
5–76 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code 03A08093 03A28193 Description A configuration or hardware error was reported by the EMU. ■ Last Failure Parameter [0] contains the solid OCP pattern which identifies the type of problem encountered. ■ Last Failure Parameter [1] contains the cabinet ID reporting the problem. ■ Last Failure Parameter [2] contains the SCSI Port number where the problem exists (if port-specific).
PAGE 257
5–77 Table 5–3 Last Failure Codes (Continued) Code 04020102 04030102 04040103 Description The requester’s error table index passed to FM$REPORT_EVENT is larger than the maximum allowed for this requester. ■ Last Failure Parameter[0] contains the instance code value. ■ Last Failure Parameter[1] contains the requester error table index value. The USB index supplied in the Event Information Packet (EIP) is larger than the maximum number of USBs.
PAGE 258
5–78 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 04110101 Unexpected instance code found during fmu_memerr_report processing. ■ Last Failure Parameter[0] contains the unexpected instance code value. 04120101 CLIB$SDD_FAO call failed. ■ Last Failure Parameter[0] contains the failure status code value. 04140103 The template value found in the eip is not supported by the Fault Manager.
PAGE 259
5–79 Table 5–3 Last Failure Codes (Continued) Code Description 07030100 Unable to start the Failover Control Timer before main loop. 07040100 Unable to restart the Failover Control Timer. 07050100 Unable to allocate flush buffer. 07060100 Unable to allocate active receive fcb. 07070100 The other controller killed this, but could not assert the kill line because nindy on or in debug. So it killed this now. 07080000 The other controller crashed, so this one must crash too.
PAGE 260
5–80 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 080D0100 Could not get enough memory to build a FCB to reply to a request from the other controller. 080E0101 An out-of-range receiver ID was received by the NVFOC communication utility (master send to slave send ACK). Last Failure Parameter[0] contains the bad id value. 080F0101 An out-of-range receiver ID was received by the NVFOC communication utility (received by master).
PAGE 261
5–81 Table 5–3 Last Failure Codes (Continued) Code Description 08200000 Expected restart so the write_instance may recover from a configuration mismatch. 08210100 Unable to allocate memory to setup NVFOC lock/unlock notification routines. 09010100 Unable to acquire memory to initialize the FLM structures. 09640101 Work that was not FLM work was found on the FLM queue. Bad format is detected or the formatted string overflows the output buffer.
PAGE 262
5–82 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 0A040100 ILF$CACHE_READY DWD overrun. 0A050100 ILF$CACHE_READY DWD underrun. 0A060100 ILF$CACHE_READY found buffer marked for other controller. 0A070100 CACHE$FIND_LOG_BUFFERS returned continuation handle > 0. 0A080100 Not processing a bugcheck. 0A090100 No active DWD. 0A0A0100 Current entry pointer is not properly aligned. 0A0B0100 Next entry pointer is not properly aligned.
PAGE 263
5–83 Table 5–3 Last Failure Codes (Continued) Code Description 0A1D0102 ILF$LOG_ENTRY page guard check failed. ■ Last Failure Parameter [0] contains the DWD address value. ■ Last Failure Parameter [1] contains the buffer address value. 0A1E0102 ILF$LOG_ENTRY page guard check failed. ■ Last Failure Parameter [0] contains the DWD address value. ■ Last Failure Parameter [1] contains the buffer address value. 0A1F0100 ilf_rebind_cache_buffs_to_DWDs found duplicate buffer for current DWD.
PAGE 264
5–84 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 0A320102 ILF$LOG_ENTRY, page guard check failed. ■ Last Failure Parameter [0] contains the DWD address value. ■ Last Failure Parameter [1] contains the buffer address value. 0A330100 ilf_output_error, message_keeper_array full. 0A340101 ilf_output_error, no memory for message display. 0A350100 DWD failed validation.
PAGE 265
5–85 Table 5–3 Last Failure Codes (Continued) Code 12010103 12020103 12030103 12040103 12050103 12060102 Description Two values found equal. ■ Last Failure Parameter [0] contains the ASSUME instance address. ■ Last Failure Parameter [1] contains first variable value. ■ Last Failure Parameter [2] contains second variable value. First value found bigger or equal. ■ Last Failure Parameter [0] contains the ASSUME instance address. ■ Last Failure Parameter [1] contains first variable value.
PAGE 266
5–86 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code 12070102 12080102 12090102 Description vsi_ptr->allocated_this not set. ■ Last Failure Parameter [0] contains the ASSUME instance address. ■ Last Failure Parameter [1] contains nv_index value. vsi_ptr->cs_interlocked not set. ■ Last Failure Parameter [0] contains the ASSUME instance address. ■ Last Failure Parameter [1] contains nv_index value. Unhandled switch case.
PAGE 267
5–87 Table 5–3 Last Failure Codes (Continued) Code Description 200E0101 While traversing the structure of a unit, a config_info node was discovered with an unrecognized structure type. ■ Last Failure Parameter[0] contains the structure type number that was unrecognized. 200F0101 A config_info node was discovered with an unrecognized structure type. ■ Last Failure Parameter[0] contains the structure type number that was unrecognized. 20100101
PAGE 268
5–88 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 201F0101 CLI$DEALLOCATE_ALL_STRUCT() was called by a process which it does not support. ■ Last Failure Parameter [0] contains pscb address. 20200100 CLI$ALLOCATE_STRUCT() could not obtain memory for a new nvfoc_rw_remote_nvmem structure. 20220020 This controller requested this subsystem to poweroff. 20230000 A restart of both controllers is required when exiting multibus failover.
PAGE 269
5–89 Table 5–3 Last Failure Codes (Continued) Code Description 431A0100 Unable to allocate necessary timer memory in HPP_int(). 43210101 HPP detected unknown error indicated by HPT. ■ Last Failure Parameter [0] contains the error value. 43220100 Unable to obtain Free CSR in HPP(). 43230101 During processing to maintain consistency of the data for Persistent Reserve SCSI commands, an internal inconsistency was detected.
PAGE 270
5–90 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 44156904 Interrupt from SCSI host port chip indicated interrupt with an unexpected reason (pass value). 44166904 44176904 44186904 ■ Last Failure Parameter [0] contains ISTAT Register. ■ Last Failure Parameter [1] contains DSTAT Register. ■ Last Failure Parameter [2] contains Pass Value (DSPS). ■ Last Failure Parameter [3] contains Chip Register Base.
PAGE 271
5–91 Table 5–3 Last Failure Codes (Continued) Code Description 44196904 Interrupt from SCSI host port chip indicated HTH condition at unexpected script location. ■ Last Failure Parameter [0] contains ISTAT Register. ■ Last Failure Parameter [1] contains DSTAT Register. ■ Last Failure Parameter [2] contains Script PC (DSP). ■ Last Failure Parameter [3] contains Chip Register Base. 441A6900 Unable to locate the IDENTIFY msg in HTB. 441C6900 Encountered an unknown MESSAGE OUT message.
PAGE 272
5–92 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 64020100 A DD is already in use by a RCVDIAG command—cannot get two RCV_DIAGs without sending the data for the first. 80010100 An HTB was not available to issue an I/O when it should have been. 80030100 DILX tried to release a facility that wasn’t reserved by DILX. 80040100 DILX tried to change the unit state from MAINTENANCE_MODE to NORMAL but was rejected because of insufficient resources.
PAGE 273
5–93 Table 5–3 Last Failure Codes (Continued) Code Description 83020100 An unsupported message type or terminal request was received by the CONFIG virtual terminal code from the CLI. 83030100 Not all alter_device requests from the CONFIG utility completed within the timeout interval. 84010100 An unsupported message type or terminal request was received by the CLONE virtual terminal code from the CLI. 85010100 HSUTIL tried to release a facility that wasn’t reserved by HSUTIL.
PAGE 274
5–94 Event Reporting: Templates and Codes Table 5–3 Last Failure Codes (Continued) Code Description 8A040080 New cache module failed diagnostics. The controller has been reset to clear the error. 8A050080 Could not initialize new cache module. The controller has been reset to clear the error. 8B000186 A single bit error was found by software scrubbing. ■ Last Failure Parameter [0] contains the address of the first single bit ecc error found.
PAGE 275
5–95 Recommended Repair Action Codes Recommended Repair Action Codes are embedded in Instance and Last Failure codes. Refer to “Instance Codes,” page 5–20, and “Last Failure Codes,” page 5–48, for a more detailed description of the relationship between these codes. Table 5–4 contains the repair action codes assigned to each significant event in the system. Table 5–4 Recommended Repair Action Codes Code Description 00 No action necessary.
PAGE 276
5–96 Event Reporting: Templates and Codes Table 5–4 Recommended Repair Action Codes (Continued) Code Description 09 Determine power failure cause. 0A Determine which SBB has a failed connector and replace it. 0B The other controller in a dual-redundant configuration has been reset with the “Kill” line by the controller that reported the event.
PAGE 277
5–97 Table 5–4 Recommended Repair Action Codes (Continued) Code Description 22 Replace the indicated cache module or the appropriate memory DIMMs on the indicated cache module. 23 Replace the indicated write cache battery. CAUTION: BATTERY REPLACEMENT MAY CAUSE INJURY. 24 Check for the following invalid write cache configurations: ■ If the wrong write cache module is installed, replace with the matching module or clear the invalid cache error via the CLI. Refer to HSZ80 ACS Version 8.
PAGE 278
5–98 Event Reporting: Templates and Codes Table 5–4 Recommended Repair Action Codes (Continued) Code 3D Description Either the primary cache or the mirrored cache has inconsistent data. Check for the following conditions to determine appropriate means to restore mirrored copies.
PAGE 279
5–99 Table 5–4 Recommended Repair Action Codes (Continued) Code 51 52 Description The mirrorset is inoperative for one or more of the following reasons: ■ The last NORMAL member has malfunctioned. Perform repair actions 55 and 59. ■ The last NORMAL member is missing. Perform repair action 58. ■ The members have been moved around and the consistency checks show mismatched members. Perform repair action 58.
PAGE 280
5–100 Event Reporting: Templates and Codes Table 5–4 Recommended Repair Action Codes (Continued) Code Description 5B The mirrorset is inoperative due to a disaster tolerance failsafe locked condition, as a result of the loss of all local or remote NORMAL/NORMALIZING members while ERROR_MODE=FAILSAFE was enabled. To clear the failsafe locked condition, enter the CLI command SET unit-number ERROR_MODE=NORMAL.
PAGE 281
5–101 Component Identifier Codes Component Identifier Codes are embedded in Instance and Last Failure codes. Refer to “Instance Codes,” page 5-20, and “Last Failure Codes,” page 5-48, for a more detailed description of the relationship between these codes. Table 5–5 lists the component identifier codes.
PAGE 282
5–102 Event Reporting: Templates and Codes Table 5–5 Component Identifier Codes (Continued) Code Description 80 Disk Inline Exercise (DILX) 82 Subsystem Built-In Self Tests (BIST) 83 Device Configuration Utilities (CONFIG) 84 Clone Unit Utility (CLONE) 85 Format and Device Code Load Utility (HSUTIL) 86 Code Load/Code Patch Utility (CLCP) 8A Field Replacement Utility (FRUTIL) 8B Periodic Diagnostics (PDIAG)
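As an illustration of how the identifiers in Table 5–5 relate to the codes in Table 5–3, the following sketch looks up the component for a given code. It is a hedged example only: the names `COMPONENT_IDS` and `component_name` are hypothetical, the dictionary holds just the entries visible on this page of the table, and it assumes the component identifier is the first printed byte of the code (consistent with entries such as 80010100 for DILX and 8A040080 for FRUTIL).

```python
# Entries copied from the visible part of Table 5-5; earlier rows are not included.
COMPONENT_IDS = {
    0x80: "Disk Inline Exercise (DILX)",
    0x82: "Subsystem Built-In Self Tests (BIST)",
    0x83: "Device Configuration Utilities (CONFIG)",
    0x84: "Clone Unit Utility (CLONE)",
    0x85: "Format and Device Code Load Utility (HSUTIL)",
    0x86: "Code Load/Code Patch Utility (CLCP)",
    0x8A: "Field Replacement Utility (FRUTIL)",
    0x8B: "Periodic Diagnostics (PDIAG)",
}


def component_name(code_hex: str) -> str:
    """Return the component name for an 8-digit Instance or Last Failure code.

    Assumption: the component identifier is the first printed byte.
    """
    component_id = int(code_hex[:2], 16)
    return COMPONENT_IDS.get(component_id, f"unlisted component {component_id:02X}")


if __name__ == "__main__":
    print(component_name("8A040080"))
```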
PAGE 283
5–103 Event Threshold Codes Table 5–6 lists the classifications for event notification and recovery threshold values. Table 5–6 Event Notification/Recovery Threshold Classifications Threshold Value Classification Description 01 IMMEDIATE 02 HARD Failure of a component that affects controller performance or precludes access to a device connected to the controller is indicated. 0A SOFT An unexpected condition detected by a controller firmware component (e.g.
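The threshold classifications can be applied to an Instance Code with a short lookup. This is a sketch under stated assumptions: the names `THRESHOLDS` and `threshold_class` are hypothetical, only the three values visible in Table 5–6 are included, and it assumes the threshold value is the last printed byte of the Instance Code (as in 03DE450A from Table 5–1). The Instance Code structure itself is described on page 5–20 and should be treated as the authority.

```python
# Only the classifications visible in Table 5-6 are listed here.
THRESHOLDS = {
    0x01: "IMMEDIATE",
    0x02: "HARD",
    0x0A: "SOFT",
}


def threshold_class(instance_code_hex: str) -> str:
    """Classify an 8-digit Instance Code by its notification/recovery threshold.

    Assumption: the threshold value is the last printed byte of the code.
    """
    threshold = int(instance_code_hex, 16) & 0xFF
    return THRESHOLDS.get(threshold, f"unlisted threshold {threshold:02X}")


if __name__ == "__main__":
    for code in ("03DE450A", "03F10502", "03FC0F01"):
        print(code, threshold_class(code))
```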
PAGE 284
5–104 Event Reporting: Templates and Codes ASC/ASCQ Codes Table 5–7 lists HSZ80-specific SCSI ASC and ASCQ codes. These codes are Template-specific and appear at byte offsets 12 and 13. NOTE: Additional codes that are common to all SCSI devices can be found in the SCSI specification. Table 5–7 ASC and ASCQ Codes ASC Code ASCQ Code Description 04 80 Logical unit is disaster tolerant failsafe locked (inoperative).
PAGE 285
5–105 Table 5–7 ASC and ASCQ Codes (Continued) ASC Code ASCQ Code Description A0 01 Nonvolatile parameter memory component event report. A0 02 Backup battery failure event report. A0 03 Subsystem built-in self test failure event report. A0 04 Memory system failure event report. A0 05 Failover event report. A0 07 RAID membership event report. A0 08 Multiple Bus failover event. A0 09 Multiple Bus failback event. A0 0A Disaster Tolerance failsafe error mode can now be enabled.
PAGE 286
5–106 Event Reporting: Templates and Codes Table 5–7 ASC and ASCQ Codes (Continued) ASC Code ASCQ Code Description B0 00 Command timeout. B0 01 Watchdog timer timeout. D0 01 Disconnect timeout. D0 02 Chip command timeout. D0 03 Byte transfer timeout. D1 00 Bus errors. D1 02 Unexpected bus phase. D1 03 Disconnect expected. D1 04 ID Message not sent. D1 05 Synchronous negotiation error. D1 07 Unexpected disconnect. D1 08 Unexpected message.
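Because the ASC and ASCQ values appear at byte offsets 12 and 13 of the sense data, a lookup can be sketched as follows. This is an illustrative example, not a Compaq utility: the names `ASC_ASCQ` and `describe_asc_ascq` are hypothetical, and the dictionary contains only the rows visible on this page of Table 5–7.

```python
from typing import Optional

# (ASC, ASCQ) pairs copied from the visible rows of Table 5-7.
ASC_ASCQ = {
    (0xB0, 0x00): "Command timeout.",
    (0xB0, 0x01): "Watchdog timer timeout.",
    (0xD0, 0x01): "Disconnect timeout.",
    (0xD0, 0x02): "Chip command timeout.",
    (0xD0, 0x03): "Byte transfer timeout.",
    (0xD1, 0x00): "Bus errors.",
    (0xD1, 0x02): "Unexpected bus phase.",
    (0xD1, 0x03): "Disconnect expected.",
    (0xD1, 0x04): "ID Message not sent.",
    (0xD1, 0x05): "Synchronous negotiation error.",
    (0xD1, 0x07): "Unexpected disconnect.",
    (0xD1, 0x08): "Unexpected message.",
}


def describe_asc_ascq(sense_data: bytes) -> Optional[str]:
    """Return the Table 5-7 description for the ASC/ASCQ pair, if listed."""
    if len(sense_data) < 14:
        raise ValueError("sense data must include byte offsets 12 and 13")
    asc, ascq = sense_data[12], sense_data[13]
    return ASC_ASCQ.get((asc, ascq))


if __name__ == "__main__":
    # Hypothetical sense buffer: zeros except for ASC=D1, ASCQ=07.
    sample = bytes(12) + bytes([0xD1, 0x07])
    print(describe_asc_ascq(sample))
```

Pairs that are not in the dictionary return `None`; those may be codes from other pages of Table 5–7 or codes common to all SCSI devices, which are defined in the SCSI specification.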
PAGE 287
6–1 Chapter 6 Connectors, Switches, and LEDs This chapter provides connector, switch, and LED information for the HSZ80 Array Controller. Compaq HSG80 Array Controller ACS Version 8.
PAGE 288
6–2 Connectors, Switches, and LEDs Controller Front Panel Figure 6–1 (drawing CXO6586A, callouts 1 through 6).
PAGE 289
6–3 Operator Control Panel LEDs Figure 6–2. Operator Control Panel Switches and LEDs (drawing CXO6216B) Table 6–2 Operator Control Panel Switches and LEDs Location Description ➀ Controller reset button ➁ Port buttons/LEDs (1 through 6)
PAGE 290
6–4 Connectors, Switches, and LEDs Power Verification and Addressing Module Figure 6–3 (drawing CXO5821A, callouts 1 through 3).
PAGE 291
6–5 Environmental Monitoring Unit (EMU) Figure 6–4. EMU Connectors, Switches, and LEDs (drawing CXO5774A) Table 6–4 EMU Connectors, Switches, and LEDs Location Description ➀ EMU communications connector (labeled IIC) ➁ System fault LED and alarm control switch ➂ Temperature fault LED ➃ Power status LED ➄ Maintenance terminal connector ➅ Blower fault LEDs (8 LEDs) ➆ EMU communications connector (labeled IIC)
PAGE 292
PAGE 293
7–1 Chapter 7 Controller Specifications This chapter contains physical, electrical, and environmental specifications for the HSZ80 array controller.
PAGE 294
7–2 Controller Specifications Physical and Electrical Specifications for the Controller Table 7–1 lists the physical and electrical specifications for the controller and cache modules. Table 7–1 Controller Specifications Hardware Length Width HSZ80 Array Controller module 12.5 inches 8.75 inches 23.27 W Write-back Cache, 512 MB 12.5 inches 7.75 inches 2.48 W (Battery charging) Power 8.72 W Current at +5 V Current at +12 V 6.
PAGE 295
7–3 Environmental Specifications The HSZ80 array controller is intended for installation in a Class A computer room environment. The optimum environmental specifications are listed in Table 7–2; the maximum operating environmental specifications are listed in Table 7–3; and the maximum nonoperating environmental specifications are listed in Table 7–4. These are the same as for other Compaq storage devices.
PAGE 296
7–4 Controller Specifications Table 7–3 Maximum Operating Environmental Specifications Condition Specification Temperature +10° to +40°C (+50° to +104°F) Derate 1.8°C for each 1000 m (1.
PAGE 297
A–1 Appendix A Spare Part Number Cross Reference This appendix contains the spare part number cross reference list for the COMPAQ spare part numbers and the DIGITAL spare part numbers.
PAGE 298
A–2 Spare Part Number Cross Reference System Components Exploded View Figure A–1 (drawing CXO6742A, callouts 1 through 16).
PAGE 299
A–3 Table A–1 The HSZ80 Subsystem
Item  Description  COMPAQ Part Number  DIGITAL Part Number
1  BA370 rack-mountable enclosure  401914-001  DS-BA370-MA
2  Cooling fan, blue  400293-001  FC-BA35X-MK
2  Cooling fan, gray  402602-001  FC-BA35X-ML
3  Power cable kit, white  401915-001  17-03718-09
4  I/O module, blue  400294-001  FC-BA35X-MN
4  I/O module, gray  401911-001  70-32856-S2
5  SCSI hub, 3 port  401926-001  FC-DWZZH-03
6  SCSI hub, 5 port  401927-001  FC-DWZZH-05
7  SCSI hub, 9 port NOTE: A complete 9-
PAGE 300
A–4 Spare Part Number Cross Reference HSZ80 Array Controller Figure A–2 (drawing CXO6703A, callouts 1 through 6).
PAGE 301
A–5 Table A–2 HSZ80 Array Controller Item Description COMPAQ Part Number DIGITAL Part Number 1 Program card 103474-001 BG-RFNXA-BA 2 Trilink connector 401948-001 12-44100-01 3 Host bus cable, 1.
PAGE 302
A–6 Spare Part Number Cross Reference Cache Module Figure A–3 (drawing CXO6570A, callouts 1 and 2).
PAGE 303
A–7 Environmental Monitoring Unit (EMU) Figure A–4. EMU (drawing CXO6604A) Table A–4 EMU Item Description COMPAQ Part Number DIGITAL Part Number 1 EMU communication cable, 4 meter 401949-001 17-03194-04
PAGE 304
PAGE 305
GL–1 Glossary This glossary defines terms pertaining to the HSG80 Fibre Channel array controller. It is not a comprehensive glossary of computer terms. 8B/10B A type of byte encoding and decoding to reduce errors in data transmission patented by the IBM Corporation. This process of encoding and decoding data for transmission has been adopted by ANSI. adapter A device that converts the protocol and hardware interface of one bus type into another without changing the function of the bus.
PAGE 306
GL–2 Glossary BBR See bad block replacement. BIST See built-in self-test. bit A single binary digit having a value of either 0 or 1. A bit is the smallest unit of data a computer can process. block Also called a sector. The smallest collection of consecutive bytes addressable on a disk drive. In integrated storage elements, a block contains 512 bytes of data, error codes, flags, and the block’s address header.
PAGE 307
GL–3 CLCP An abbreviation for code-load code-patch utility. CLI See command line interpreter. coax See coaxial cable. coaxial cable A two-conductor wire in which one conductor completely wraps the other with the two separated by insulation. cold swap A method of device replacement that requires the entire subsystem to be turned off before the device can be replaced. See also hot swap and warm swap. command line interpreter The configuration interface to operate the controller software.
PAGE 308
GL–4 Glossary DAEMON Pronounced “demon.” A program usually associated with UNIX systems that performs a utility (housekeeping or maintenance) function without being requested or even known of by the user. A daemon is a diagnostic and execution monitor. data center cabinet A generic reference to large DIGITAL subsystem cabinets, such as the SW600-series and 800-series cabinets in which StorageWorks components can be mounted.
PAGE 309
GL–5 dual-redundant configuration A controller configuration consisting of two active controllers operating as a single controller. If one controller fails, the other controller assumes control of the failing controller’s devices. dual-simplex A communications protocol that allows simultaneous transmission in both directions in a link, usually with no flow control. DUART Dual universal asynchronous receiver and transmitter.
PAGE 310
GL–6 Glossary failover The process that takes place when one controller in a dual-redundant configuration assumes the workload of a failed companion controller. Failover continues until the failed controller is repaired or replaced. FCC Federal Communications Commission. The federal agency responsible for establishing standards and approving electronic devices within the United States.
PAGE 311
GL–7 full duplex (n) A communications system in which there is a capability for 2-way transmission and acceptance between two sites at the same time. full duplex (adj) Pertaining to a communications method in which data can be transmitted and received at the same time. FWD SCSI A fast, wide, differential SCSI bus with a maximum 16-bit data transfer rate of 20 MB/s. See also SCSI and FD SCSI. giga A prefix indicating a billion (10⁹) units, as in gigabaud or gigabyte.
PAGE 312
GL–8 Glossary hot spots A portion of a disk drive frequently accessed by the host. Because the data being accessed is concentrated in one area, rather than spread across an array of disks providing parallel access, I/O performance is significantly reduced. See also hot disks. hot swap A method of device replacement that allows normal I/O activity on a device’s bus to remain active during device removal and insertion.
PAGE 313
GL–9 IPI Intelligent Peripheral Interface. An ANSI standard for controlling peripheral devices by a host computer. IPI-3 Disk Intelligent Peripheral Interface Level 3 for Disk IPI-3 Tape Intelligent Peripheral Interface Level 3 for Tape JBOD Just a bunch of disks. A term used to describe a group of single-device logical units. kernel The most privileged processor access mode. LBN Logical Block Number. LED Light Emitting Diode.
PAGE 314
GL–10 Glossary logon Also called login. A procedure whereby a participant, either a person or network connection, is identified as being an authorized network participant. LRU Least recently used. A cache term used to describe the block replacement policy for read cache. Mbps Approximately one million (10⁶) bits per second, that is, megabits per second. MBps Approximately one million (10⁶) bytes per second, that is, megabytes per second.
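The LRU replacement policy defined above can be sketched in a few lines. This is only an illustration of the general technique, not the controller's actual cache code; the class `LRUReadCache`, its `capacity` parameter, and the `fetch` callback are hypothetical names introduced for the example.

```python
from collections import OrderedDict


class LRUReadCache:
    """Toy read cache that evicts the least recently used block (illustrative only)."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._blocks = OrderedDict()  # oldest entry first, newest entry last

    def read(self, block_number, fetch):
        """Return the block, calling fetch(block_number) on a cache miss."""
        if block_number in self._blocks:
            # Cache hit: mark the block as most recently used.
            self._blocks.move_to_end(block_number)
            return self._blocks[block_number]
        data = fetch(block_number)
        self._blocks[block_number] = data
        if len(self._blocks) > self.capacity:
            # Evict the least recently used block.
            self._blocks.popitem(last=False)
        return data

    def cached_blocks(self):
        """Block numbers currently cached, least recently used first."""
        return list(self._blocks)
```

With a capacity of two, reading blocks 1, 2, 1, and 3 in that order evicts block 2, because block 1 was touched more recently.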
PAGE 315
GL–11 nominal membership The desired number of mirrorset members when the mirrorset is fully populated with active devices. If a member is removed from a mirrorset, the actual number of members may fall below the “nominal” membership. node In data communications, the point at which one or more functional units connect transmission lines. nonredundant controller configuration (1) A single controller configuration. (2) A controller configuration that does not include a second controller.
PAGE 316
GL–12 Glossary parity A method of checking if binary numbers or characters are correct by counting the ONE bits. In odd parity, the total number of ONE bits must be odd; in even parity, the total number of ONE bits must be even. parity bit A binary digit added to a group of bits that checks to see if errors exist in the transmission. parity check A method of detecting errors when data is sent over a communications line. With even parity, the number of ones in a set of binary data should be even.
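A minimal sketch of the parity definitions above: it computes the parity bit for a group of bits and then checks a received group against it. The function names are hypothetical, and the data is modeled as a non-negative Python integer purely for convenience.

```python
def count_ones(value: int) -> int:
    """Count the ONE bits in a non-negative integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return bin(value).count("1")


def parity_bit(data: int, even: bool = True) -> int:
    """Return the bit that makes the total number of ONE bits even (or odd)."""
    ones_is_odd = count_ones(data) % 2
    return ones_is_odd if even else 1 - ones_is_odd


def parity_ok(data: int, bit: int, even: bool = True) -> bool:
    """Check received data plus its parity bit against the chosen parity."""
    total = count_ones(data) + bit
    return total % 2 == (0 if even else 1)
```

For example, `0b1011` contains three ONE bits, so the even-parity bit is 1 and the odd-parity bit is 0. A single flipped bit changes the count of ONE bits by one and makes `parity_ok` return `False`; two flipped bits cancel out and go undetected.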
PAGE 317
GL–13 port (1) In general terms, a logical channel in a communications system. (2) The hardware and software used to connect a host controller to a communications bus, such as a SCSI bus or serial bus. Regarding the controller, the port is (1) the logical route for data in and out of a controller that can contain one or more channels, all of which contain the same type of data. (2) The hardware and software that connects a controller to a SCSI device.
PAGE 318
GL–14 Glossary RAID level 1 A RAID storageset of two or more physical disks that maintains a complete and independent copy of the entire virtual disk’s data. This type of storageset has the advantage of being highly reliable and extremely tolerant of device failure. RAID level 1 storagesets are sometimes referred to as mirrorsets.
PAGE 319
GL–15 redundancy The provision of multiple interchangeable components to perform a single function in order to cope with failures and errors. A RAIDset is considered to be redundant when user data is recorded directly to one member and all of the other members include associated parity information. regeneration (1) The process of calculating missing data from redundant data.
PAGE 320
GL–16 Glossary SCSI bus signal converter Sometimes referred to as an adapter. (1) A device used to interface between the subsystem and a peripheral device unable to be mounted directly into the SBB shelf of the subsystem. (2) A device used to connect a differential SCSI bus to a single-ended SCSI bus. (3) A device used to extend the length of a differential or single-ended SCSI bus. See also I/O module.
PAGE 321
GL–17 single-ended SCSI bus An electrical connection where one wire carries the signal and another wire or shield is connected to electrical ground. Each signal’s logic level is determined by the voltage of a single wire in relation to ground. This is in contrast to a differential connection where the second wire carries an inverted signal. spareset A collection of disk drives made ready by the controller to replace failed members of a storageset. storage array An integrated set of storage devices.
PAGE 322
GL–18 Glossary stripeset See RAID level 0. stripe size The stripe capacity as determined by n–1 times the chunksize, where n is the number of RAIDset members. striping The technique used to divide data into segments, also called chunks. The segments are striped, or distributed, across members of the stripeset. This technique helps to spread frequently accessed data across the array of physical devices to prevent hot spots and hot disks.
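The stripe size formula above works out as follows. This is a small sketch only: the function name is hypothetical, and the unit of `chunksize` is whatever unit the chunksize was configured in (the glossary entry does not say).

```python
def stripe_size(members: int, chunksize: int) -> int:
    """Stripe capacity as (n - 1) * chunksize, per the glossary definition.

    members   -- n, the number of RAIDset members
    chunksize -- size of one chunk, in any consistent unit
    """
    if members < 2:
        raise ValueError("the formula needs at least two members")
    if chunksize < 1:
        raise ValueError("chunksize must be positive")
    return (members - 1) * chunksize


if __name__ == "__main__":
    # Six members with a chunksize of 256 units: (6 - 1) * 256
    print(stripe_size(6, 256))
```

One plausible reading of the n–1 factor is that one chunk's worth of each stripe holds parity rather than user data, but the glossary entry itself does not state the reason.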
PAGE 323
GL–19 ULP process A function executing within a Fibre Channel node which conforms to the Upper Layer Protocol (ULP) requirements when interacting with other ULP processes. Ultra-SCSI bus A wide, Fast-20 SCSI bus. unit A container made accessible to a host. A unit may be created from a single disk drive or tape drive. A unit may also be created from a more complex container such as a RAIDset. The controller supports a maximum of eight units on each target. See also target and target ID number.
PAGE 324
GL–20 Glossary write-back caching A cache management method used to decrease the subsystem’s response time to write requests by allowing the controller to declare the write operation “complete” as soon as the data reaches its cache memory. The controller performs the slower operation of writing the data to the disk drives at a later time. write-through caching A cache management method used to decrease the subsystem’s response time to a read.
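The write-back behavior defined above can be modeled with a toy class: a write is reported complete as soon as the data is in cache memory, and the slower disk write happens later. This is a simplified sketch, not the controller's implementation; `WriteBackCache`, its `disk` dictionary, and the `flush` method are hypothetical stand-ins.

```python
class WriteBackCache:
    """Toy model of write-back caching (illustrative only)."""

    def __init__(self):
        self.disk = {}    # stands in for the disk drives
        self._dirty = {}  # cached blocks not yet written to disk

    def write(self, block, data):
        """Store data in cache and report completion immediately."""
        self._dirty[block] = data
        return "complete"

    def read(self, block):
        """Prefer the cached copy, which may be newer than the disk copy."""
        if block in self._dirty:
            return self._dirty[block]
        return self.disk.get(block)

    def flush(self):
        """Perform the deferred disk writes; return how many blocks were written."""
        flushed = len(self._dirty)
        self.disk.update(self._dirty)
        self._dirty.clear()
        return flushed
```

Between `write` and `flush`, the only copy of the new data lives in the cache. In this model, that window is the period during which losing the cache contents would lose the written data.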
PAGE 325
I–1 Index A AC input module part number, 1–3, A–3 Adding DIMMs, 3–22 Adding cache memory, 3–22 Adding DIMMs, 3–22 Array Controller.
PAGE 326
I–2 Index device_type, 4–39 event codes, 4–39 event threshold codes, 5–103 instance, 4–39, 5–22 to 5–47 last_failure, 4–39 last-failure, 5–50 to 5–93 repair action, 5–95 to 5–100 repair_action, 4–39 structure of events and last-failures, 4–40 translating, 4–39 types of, 4–39 Component codes, 4–39 Component identifier codes, 5–101 CONFIG utility general description, 4–60 Configuration map of devices in subsystem, 4–48 upgrading to dual-redundant controller, 3–17 Configuration utility.
PAGE 327
I–3 Cooling fan part number, 1–3, A–3 D DAEMON tests, 4–2 Data duplicating with the Clone utility, 4–63 Data center cabinet ECB Y cable, 1–6, A–6 Data patterns for DILX write test, 4–58 Deleting patches, 3–8, 3–10 software patches, 3–8, 3–10 Describing event codes, 4–39 Device ports checking status, 4–50 Device statistics utility.
PAGE 328
I–4 Index Dual-redundant controller configuration installing cache module, 2–34 controller, 2–28 controller and its cache module, 2–21 DIMMs, 2–53 removing cache module, 2–31 controller, 2–25 controller and its cache module, 2–17 DIMMs, 2–53 replacing cache module, 2–31 controller, 2–25 controller and its cache module, 2–17 DIMMs, 2–52 ECB, 2–38 ECB with cabinet powered off, 2–40 ECB with cabinet powered on, 2–39 I/O module, 2–46 PCMCIA card, 2–55 replacing modules, 2–16 upgrading from single controller,
PAGE 329
I–5 H no controller termination, 4–33 CLI event reporting, 4–35 spontaneous event log, 4–34 Exercising drives and units, 4–55 F Fault remedy table, 4–6 Fault-tolerance for write-back caching general description, 4–17 nonvolatile memory, 4–17 Field Replacement utility.
PAGE 330
I–6 Index controller and its cache module dual-redundant controller configuration, 2–21 controller, cache module, and ECB, 3–17 DIMMs, 2–53 dual-redundant controller configuration, 2–53 single-controller configuration, 2–53 dual-redundant controller configuration cache module, 2–34 controller, 2–28 controller and its cache module, 2–21 DIMMs, 2–53 mirrorset member, 2–58 patches, 3–8 PCMCIA card, new, 3–3 RAIDset member, 2–58 single-controller configuration cache module, 2–14 controller, 2–11 DIMMs, 2–53 s
PAGE 331
I–7 N Nonvolatile memory fault-tolerance for write-back caching, 4–17 Note, defined, xxi P Part numbers AC input module, 1–3, A–3 BA370 rack-mountable enclosure, 1–3, A–3 cache module, 1–3, A–3 cooling fan, 1–3, A–3 dual-battery ECB, 1–3, A–3 ECB, 1–3, A–3 ECB Y cable BA370 enclosure, 1–6, A–6 data center cabinet, 1–6, A–6 EMU, 1–3, A–3 I/O module, 1–3, A–3 power supply, 1–3, A–3 PVA module, 1–3, A–3 single-battery ECB, 1–3, A–3 Patches deleting, 3–10 installing, 3–8 listing, 3–12 listing, installing, del
PAGE 332
I–8 Index controller and its cache module dual-redundant controller configuration, 2–17 DIMMs, 2–53 dual-redundant controller configuration, 2–53 single-controller configuration, 2–53 dual-redundant controller configuration cache module, 2–31 controller, 2–25 controller and its cache module, 2–17 DIMMs, 2–53 failed mirrorset member, 2–57 failed RAIDset member, 2–57 single-controller configuration cache module, 2–13 controller, 2–9 DIMMs, 2–53 Repair action codes list, 5–95 to 5–100 Repair-action codes log
PAGE 333
I–9 storageset member, 2–57 Required tools, xxii, 2–1, 3–1 Restart_type codes, 4–39 Restarting the subsystem, 2–7 Revision history, xxiv Running controller self-test, 4–2 DAEMON tests, 4–2 DILX, 4–55 FMU, 4–37 VTDPY, 4–43 S SCSI command operations, 4–39 Self-test, 4–2 Setting display characteristics for FMU, 4–40 Shutting down the subsystem, 2–5 disabling the ECBs, 2–5 enabling the ECBs, 2–5 Significant event reporting, 4–24 Single-battery ECB part number, 1–3, A–3 Single-controller configuration installi
PAGE 334
I–10 Index Storagesets adding devices with the CONFIG utility, 4–60 duplicating data with the Clone utility, 4–63 generating a new volume serial number with the CHVSN utility, 4–64 renaming the volume serial number with the CHVSN utility, 4–64 Structure of event codes, 4–40 Subsystem restarting, 2–7 shutting down, 2–5 upgrading, 3–1 Symptoms, 4–6 renaming the volume serial number with the CHVSN utility, 4–64 replacing a failed controller with FRUTIL, 4–63 replacing cache modules with FRUTIL, 4–63 replaci
PAGE 335
I–11 using CLCP, 3–8 deleting patches, 3–10 deleting software patches, 3–10 installing patches, 3–8 installing software patches, 3–8 listing patches, 3–12 listing software patches, 3–12 Utilities and exercisers CHVSN utility, 4–64 CLCP utility, 4–62 Clone utility, 4–63 CONFIG utility, 4–60 DSTAT, 4–64 FRUTIL, 4–63 HSUTIL, 4–61 V Verbose logging, 4–42 Virtual terminal display.
PAGE 336