ProLiant Clusters for SCO UnixWare 7 U/300 Quick Install Guide for the Compaq ProLiant ML370
First Edition (January 2001)
Part Number 221540-001
Compaq Computer Corporation
Compaq Confidential – Need to Know Required
Writer: Rachel Williams
Project: Compaq ProLiant Clusters for SCO UnixWare 7 U/300 Quick Install Guide for the Compaq ProLiant ML370
Comments:
Part Number: 221540-001
File Name: a-frnt.
Notice
© 2001 Compaq Computer Corporation
Compaq, the Compaq logo, NonStop, ProLiant, SmartStart, Compaq Insight Manager, ServerNet, and ROMPaq are registered in the U.S. Patent and Trademark Office. Microsoft, MS-DOS, Windows, and Windows NT are trademarks of Microsoft Corporation in the United States and other countries. Intel and Pentium are trademarks of Intel Corporation in the United States and other countries. UNIX is a trademark of The Open Group in the United States and other countries.
Contents
About This Guide
  Text Conventions
  Symbols in Text
  Symbols on Equipment
  Getting Help
Setting Up Cluster Hardware
  Setting Up the External Storage Hardware
  Cabling the Components
  Using Labeling Standards
Managing Clusters
  Compaq ProLiant Cluster Management Software for SCO UnixWare 7 NonStop Clusters
  Compaq Insight Manager Support
  Compaq Insight Manager XE Support
  NonStop Clusters Verification Utility
About This Guide
Use the Compaq ProLiant Clusters for the SCO UnixWare 7 U/300 Quick Install Guide for the Compaq ProLiant ML370 as step-by-step instructions for installation and as a reference for cluster operation and troubleshooting.

Text Conventions
The following conventions distinguish elements of text:
Keys, Buttons: Keys and buttons appear in boldface. A plus sign (+) between two keys indicates that they should be pressed simultaneously.

Symbols in Text
These symbols may be found in the text of this guide. They have the following meanings.
WARNING: Text set off in this manner indicates that failure to follow directions in the warning can result in bodily harm or loss of life.
CAUTION: Text set off in this manner indicates that failure to follow directions could result in damage to equipment or loss of information.
This symbol, on an RJ-45 receptacle, indicates a network interface connection.
WARNING: To reduce the risk of electric shock, fire, or damage to the equipment, do not plug telephone or telecommunications connectors into this receptacle.
This symbol indicates the presence of a hot surface or hot component. If this surface is contacted, the potential for injury exists.
WARNING: To reduce the risk of injury from a hot component, allow the surface to cool before touching.

Getting Help
If you have a problem and have exhausted the information in this guide, you can obtain further information and other help in the following locations.

Compaq Technical Support
In North America, call the Compaq Technical Support Phone Center at 1-800-OK-COMPAQ. This service is available 24 hours a day, 7 days a week. For continuous quality improvement, calls may be recorded or monitored.

Compaq Authorized Reseller
For the name of your nearest Compaq authorized reseller:
■ In the United States, call 1-800-345-1518.
■ In Canada, call 1-800-263-5868.
■ Elsewhere, see the Compaq website for locations and telephone numbers.

Chapter 1
Clustering Overview
A Compaq ProLiant™ Cluster for UnixWare 7 is a collection of servers, storage, and software that allows independent storage and servers to act as a single system. The cluster presents a single-system image to clients. It also protects against hardware, operating system, middleware, and application failures and provides configuration options for load balancing.

Compaq ProLiant Clusters for SCO UnixWare 7 U/300
The Compaq ProLiant Clusters for SCO UnixWare 7 U/300 Quick Install Cluster Kit (U/300 kit) for the ProLiant ML370 server supports specific hardware components, enabling the cluster software to be installed in about an hour.
■ For clusters using ServerNet™ I interconnect:
  - One ServerNet I PCI adapter installed into slot 1 of each server
  - Two ServerNet I cables

Storage Components
The U/300 kit for the ProLiant ML370 server supports the following storage hardware components:
■ One RA4100 storage subsystem, including one Compaq StorageWorks RAID Array 4000 (RA4000) primary array controller
■ One RA4000 redundant array controller
■ Two GBIC-SWs, one in each controller
■ Two 9.

Cluster Integrity Serial Cable
The Cluster Integrity (CI) serial cable listed with the server components is required for the U/300 Quick Install cluster for the ProLiant ML370 server. This cable prevents the condition in which more than one node in a cluster operates as the root node at the same time.
An Ethernet cluster interconnect uses the embedded NIC in each server connected by one Ethernet crossover cable as shown in Figure 1-2.
[Figure 1-2. Example of hardware components of the Ethernet cluster interconnect configuration. Labels shown: Node 1, Node 2, Ethernet Crossover Cable, CI Serial Cable, RA4100.]

LAN Connection
Clusters using Ethernet interconnect require an NC3123 NIC installed into slot 1 of each node before cluster software installation so that the cluster can access a public network.

Software Components
Software components of the U/300 kit for the ProLiant ML370 server include:
■ SCO UnixWare Release 7.1.1 Compact Media Kit
■ SCO UnixWare 7 NonStop Clusters Media Kit Version 7.1.
NOTE: SCO UnixWare 7 (with Mirroring Option or Online Data Manager) and UnixWare 7 NonStop Clusters software licenses must be purchased through your SCO reseller or distributor. To locate a convenient SCO reseller or distributor to purchase licenses, see the SCO website at http://www.sco.com

Quick Install CDs for the ProLiant ML370 Server
The Quick Install CDs for the ProLiant ML370 server provide rapid and simplified cluster installation.
■ Compaq Insight Manager™ Agents
  These agents provide system information to the Compaq Insight Manager, which is available on the Management CD that comes with the ProLiant servers.

Compaq ServerNet Verification Utility (SVU)
The Compaq ServerNet Verification Utility (SVU) verifies proper installation and cabling of the Compaq ServerNet I interconnect before a UnixWare software installation.

Compaq Management CD
The Compaq Management CD shipped with ProLiant servers contains software for managing Compaq clusters. The Compaq Insight Manager is included on the CD along with Compaq Management Agents and Tools for Servers for SCO UnixWare 7 NonStop Cluster. The Quick Install process automatically installs the agents and tools.
■ Compaq Insight Manager
  Compaq Insight Manager is an easy-to-use Microsoft Win32 software utility for collecting server and cluster information.

Overview of Cluster Assembly and Software Installation Steps
Use the following general steps to set up your cluster hardware, initialize the hardware, and install the software. The specific procedures are found in the sections noted in these steps:
1. Set up the cluster hardware.
4. Upgrade controller firmware. Firmware provides an interface between hardware and software. It is important to use the latest firmware for full hardware functionality. Controller firmware is upgraded using a diskette created as part of server configuration. Refer to “Updating Controller Firmware” in Chapter 3.
5. Verify ServerNet I connections.

Other References
For more information about the RA4100 storage subsystem or RA4000 redundant array controller, refer to the following guides, either as included with your hardware or as found at the Compaq Support website at http://www.compaq.
■ Use the following URL to access SCOhelp remotely when the cluster is attached to the public network: http://clustername:457
  Substitute the name of your cluster or its CVIP address for clustername. The browser displays the main SCOhelp list of topics.
■ Use the man command to access manual pages from any command line by entering man and the name of the command, file, or routine about which you want information.
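
The SCOhelp address described above can be assembled as in the following Python sketch. It is illustrative only and is not part of the Quick Install software; the function name scohelp_url and the sample cluster name mycluster are hypothetical.

```python
def scohelp_url(cluster: str, port: int = 457) -> str:
    """Build the SCOhelp URL from a cluster name or CVIP address.

    Port 457 is the port given in this guide; everything else is a
    hypothetical helper for illustration.
    """
    host = cluster.strip()
    if not host:
        raise ValueError("a cluster name or CVIP address is required")
    return f"http://{host}:{port}"


# "mycluster" is a placeholder; substitute your cluster name or CVIP address.
print(scohelp_url("mycluster"))  # http://mycluster:457
```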

Chapter 2
Setting Up Cluster Hardware
Setting up a cluster includes setting up, cabling, and verifying hardware components. Use the following sections to set up the Compaq ProLiant Clusters for SCO UnixWare 7 U/300 for the Compaq ProLiant ML370 Quick Install Cluster:
■ Assembling the Rack
■ Setting Up the Cluster Nodes
■ Setting Up the External Storage Hardware
■ Cabling the Components
For specific information about individual components, see the documentation that comes with the component.

Assembling the Rack
In clusters that use racks, rack assembly requires careful attention to avoid problems. Evaluate the site where the cluster is to be installed by checking the path and setup area.

Stacking Components
Keep in mind the following considerations while stacking components in a rack:
■ Put the UPSs in the bottom of the rack.
■ Assemble other components into the rack from the bottom up.
■ Put the heaviest equipment per U of height in the bottom of the rack whenever possible.
■ Install non-flat-panel monitors toward the top of the rack.
■ Install components that require better cooling capacity toward the top of the rack.

Transporting Racks
Before transporting a filled rack, read the documentation that comes with the rack to determine the safety measures to take for successful transportation. Never transport a rack without first reviewing the documentation. Develop standard procedures for securing rack equipment depending on the rack and its components.

Setting Up the Cluster Nodes
Setting up the cluster nodes includes:
■ Installing the 64-bit Fibre Channel Host Bus Adapter (HBA) into slot 3 of each node and Gigabit Interface Converters Shortwave (GBIC-SW) into each adapter
■ Installing one 9.
3. Do not install or update drivers. The Quick Install procedures install the Fibre Channel drivers.

Installing Internal Disk Drives
One 9.1-GB disk drive is required per node. The Quick Install automatically configures each internal drive with a 9.1-GB partition, even if the disk drive is larger than 9.1 GB.

Setting Up the External Storage Hardware
IMPORTANT: The RA4100 is shipped with a single RAID controller. Each RA4100 array used in Compaq ProLiant Clusters for SCO UnixWare 7 requires an additional, redundant controller.
NOTE: The Quick Install automatically configures an RA4100 drive with a RAID 1 9.1-GB UnixWare partition, even if the disk drive is larger than 9.1 GB. This partition cannot be modified, and other UnixWare disk drive partitions cannot be added to this disk drive.
3. In each array, install one redundant controller into the lower slot (rack-mount) or into the left slot (tower as viewed from the back) according to the following steps:
   a. Disconnect the power from the storage subsystem.
   b. Remove the cover from the second controller slot.
   c. Rotate the board 180 degrees from the position of the top controller.
   d. Insert the RA4000 redundant controller.
   e.

Cabling the ServerNet I Interconnect
ServerNet I adapters include X and Y connections for redundancy. Figure 2-1 shows the ServerNet I adapter connections.
[Figure 2-1. ServerNet I PCI adapter connections. Labels shown: Port X Connector, Port Y Connector, PCI Bus Connector.]
IMPORTANT: Cable each X connection to its X counterpart and each Y connection to its Y counterpart. Do not cable X connections to Y connections.
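
The X-to-X and Y-to-Y rule above can be expressed as a simple check. The following Python sketch is illustrative only; the function name cross_cabled and the tuple layout (port on node 1, port on node 2) are hypothetical and are not part of any Compaq utility.

```python
from typing import List, Tuple

# Each cable is recorded as (port used on node 1, port used on node 2).
Cable = Tuple[str, str]


def cross_cabled(cables: List[Cable]) -> List[Cable]:
    """Return every cable whose two ends are on different fabrics."""
    return [c for c in cables if c[0].upper() != c[1].upper()]


correct = [("X", "X"), ("Y", "Y")]
crossed = [("X", "Y"), ("Y", "X")]

print(cross_cabled(correct))  # [] means no cross-cabling was recorded
print(cross_cabled(crossed))  # both cables are reported
```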
The ServerNet I cables directly connect the ServerNet I adapter in node 1 to the ServerNet I adapter in node 2, as shown in Figure 2-2.
[Figure 2-2. Example of cabling the cluster interconnect of a cluster that uses ServerNet I. Labels shown: Node 1, Node 2, X and Y Dedicated ServerNet I Cables, CI Serial Cable, To Public Network.]
NOTE: Cabling for the external storage is intentionally not shown.
Use the cabling suggestions illustrated in Figure 2-3 to label the ServerNet I cables.
[Figure 2-3. ServerNet I cable labeling suggestion. Labels shown: X/Y Fabric Identifier, Node Identifier.]
Node Number    ServerNet I Switch Port Number    Cable Tie Color
1              0                                 Pink
2              1                                 Orange
X ServerNet I cables are identified with white cable ties. Red ties are used only during shipment and are to be removed during onsite installation.
To cable the ServerNet I interconnect, follow these steps:
1.

Cabling the Public LAN Connection
For interconnects using ServerNet I, connect the public LAN Ethernet cable to the embedded NIC of the servers. See Figure 2-2 earlier in this chapter.
For interconnects using Ethernet, connect the public LAN Ethernet cable to the NC3123 NIC in slot 1 of the servers. See Figure 2-4.
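
The choice of public LAN port depends only on the interconnect type, which the following Python sketch summarizes as a lookup table. It is illustrative only; the dictionary and function names are hypothetical, and the two entries restate the cabling rules given above.

```python
# Which NIC carries the public LAN for each interconnect type,
# as stated in this section. The key names are hypothetical labels.
PUBLIC_LAN_NIC = {
    "servernet": "embedded NIC",
    "ethernet": "NC3123 NIC in slot 1",
}


def public_lan_nic(interconnect: str) -> str:
    """Return the NIC to which the public LAN Ethernet cable connects."""
    key = interconnect.strip().lower()
    if key not in PUBLIC_LAN_NIC:
        raise ValueError(f"unknown interconnect type: {interconnect!r}")
    return PUBLIC_LAN_NIC[key]


print(public_lan_nic("ServerNet"))  # embedded NIC
print(public_lan_nic("Ethernet"))   # NC3123 NIC in slot 1
```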

Cabling the CI Serial Cable
IMPORTANT: The CI serial cable is required.
To cable the CI serial cable, connect one end of the CI serial cable to serial port connector B in node 1. Connect the other end of the CI serial cable to serial port connector B in node 2. Figure 2-2 illustrates the proper cabling for clusters that use ServerNet I interconnect. Figure 2-4 illustrates the proper cabling for clusters that use Ethernet interconnect.

Fibre Channel Cable Precautions
Keep the following precautions in mind when installing, handling, moving, connecting, and disconnecting Fibre Channel cables:
■ Affix cable labels carefully, without over-tightening, to avoid breaking the glass fibers within the cables.
■ Do not bend the Fibre Channel cable into an arc tighter than the minimum allowable bend radius specified by the cable manufacturer.

Cabling the Keyboard, Monitor, and Mouse
To cable the keyboard, monitor, and mouse, refer to the documentation that comes with these devices.

UPS Power Management Cabling
Compaq ProLiant Clusters for SCO UnixWare 7 support serial data connections from UPS units to ProLiant server nodes in the cluster. This feature provides the cluster with soft shutdown capability when an AC power outage lasts until the UPS batteries approach the end of their holdup period.

Chapter 3
Installing Cluster Software
Installing the SCO UnixWare 7 NonStop Clusters software on a ProLiant ML370 cluster with the Compaq ProLiant Clusters for SCO UnixWare 7 Quick Install CDs for the Compaq ProLiant ML370 server includes several tasks.

Understanding Preinstallation Tasks and Considerations
Before you begin the software installation, assemble the hardware for the cluster, fill out the Quick Install planning worksheets in Appendix B of this guide, and have four formatted diskettes on hand. Read through this chapter to become familiar with the installation procedures as you fill out the worksheets.

Configuring the Servers with SmartStart
Before cluster installation on each node, you must erase any existing configuration and configure each server using the SmartStart CD that comes with the ProLiant ML370 server. You must also set two hardware configuration items on each server. Start with the server that you plan to use as node 1.
5. If you used the Server Profile Diskette, remove it. Power down the RA4100 if you are erasing the configuration on node 1. When prompted, power down, and then power up only the server.
IMPORTANT: Do not turn the RA4100 back on at this time.
Continue with the following procedure for server configuration. Begin with step 2 because you have erased a previous configuration.
14. Verify that the following items are disabled:
  - Software Error Recovery
  - Standby Recovery Server
  - UPS Shutdown
  Use the arrow keys to select the options and the Enter key to modify them as necessary.
15. Page down to Embedded-Compaq Integrated Dual Channel Wide Ultra2 SCSI Controller (Port2).
  - Select Controller Order, and then press Enter.
  - Select First and press F10. The Configuration Changes screen displays.
  - Press Enter to accept the changes.
16.
   c. Page down to Options ROMPaq, select it, and then click Next. Although the onscreen instructions indicate that you need 10 diskettes, this procedure creates only a single diskette. A screen for creating the first diskette displays.
   d. Click Skip. The Firmware Upgrade diskette for the RA4000 Controller displays.
   e. Insert the formatted diskette into the disk drive, and then click OK.
   f.

Updating Controller Firmware
Controller firmware must be updated on both nodes. Use the following procedure to upgrade the controller firmware:
1. Turn on the RA4100 and wait about 90 seconds for the RA4000 controllers to complete their POSTs.
2. Insert the firmware upgrade diskette into the drive. (You made this diskette in the preceding procedure.)
3. Boot the node from the diskette, and then follow the prompts on the screen until the firmware is updated.
4.
3. From the options presented to you, select the following:
  - Select your particular server from the list presented to you.
  - Select the appropriate model or All Models.
  - Select SCO UnixWare 7 from the list of operating systems.
4. Select the Softpaq for ServerNet Verification Utilities. At the download page, follow the directions for downloading the Softpaq and creating diskettes.

Verifying Node-to-Node Communication
Node-to-node communication tests include a link test for the cables and a loopback test for the adapters. Use the following steps to verify node-to-node communication on a directly connected ServerNet I two-node cluster:
1. Insert a ServerNet Utility Disk into node 1 and node 2, and then reboot the nodes. Wait for the DOS prompt on the nodes.
2. Type spaf 1 2 at the DOS prompt on node 1, and then press Enter. A title screen displays.
3.

Installing the Cluster Using Quick Install
Before beginning the software installation, be sure to have the Quick Install planning worksheets on hand and the following items available:
■ Cluster name and Cluster Virtual IP (CVIP) address
■ Node 1 hostname and IP address for the public network
■ Node 2 hostname and IP address for the public network
■ Netmask for the public network
■ For clusters

Installing Node 1
Before beginning the installation, select the set of Quick Install CDs for your cluster configuration. Choose the CDs for either the ServerNet I cluster interconnect or Ethernet cluster interconnect.
NOTE: To save time, you can install both nodes together. Be sure node 1 has rebooted before rebooting node 2. Insert the CDs into the servers, power up the servers, and follow the procedures for each node at the same time.
   b. Date, Time, and Time Zone
      Modify the current date, time, and time zone as necessary. Only U.S. time zones are available during Quick Install. Time zone information can be changed after Quick Install by using the UnixWare SCOadmin system administration tools. For more information, see the “Understanding Preinstallation Tasks and Considerations” section in this chapter.
   c.
6. Enter the node 1 UnixWare license, the node 2 UnixWare license, and the NonStop Clusters license. To complete this step, you must have either UnixWare licenses that include the mirroring license or an add-on license for either ODM or mirroring. After you exit the license manager, the node continues booting.
NOTE: Node 2 cannot join the cluster until licensing information has been entered on node 1.
   a. Accept the default addresses and netmask or enter the Ethernet interconnect address for node 1, the Ethernet interconnect address for node 2, and the netmask.
IMPORTANT: If you supply information here, this information must match the information that you supplied for the first node. See step 4e of "Installing Node 1" earlier in this chapter.
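
Before typing the interconnect values on the second node, the two worksheet entries can be compared as in the following Python sketch. It is illustrative only and is not part of Quick Install; the field names and all addresses are hypothetical placeholders, not the Quick Install defaults.

```python
import ipaddress

# Hypothetical worksheet field names for the three values entered in this step.
FIELDS = ("node1_interconnect", "node2_interconnect", "netmask")


def mismatched_fields(entered_on_node1: dict, entered_on_node2: dict) -> list:
    """Return the names of fields whose values differ between the two entries."""
    mismatches = []
    for field in FIELDS:
        # Parsing normalizes the text and rejects malformed addresses.
        first = ipaddress.ip_address(entered_on_node1[field])
        second = ipaddress.ip_address(entered_on_node2[field])
        if first != second:
            mismatches.append(field)
    return mismatches


# Placeholder values for illustration only.
node1_entry = {
    "node1_interconnect": "10.1.0.1",
    "node2_interconnect": "10.1.0.2",
    "netmask": "255.255.255.0",
}
node2_entry = dict(node1_entry, netmask="255.255.0.0")

print(mismatched_fields(node1_entry, node1_entry))  # []
print(mismatched_fields(node1_entry, node2_entry))  # ['netmask']
```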

Additional Cluster Setup Tasks
After you have installed your cluster, you can configure the nameserver for the domain name of the cluster or modify the Quick Install default settings outlined in Table 3-1.
NOTE: Information about configuring nameservers using the SCOadmin system management tool can be found in the SCOhelp online documentation set. You can use the SCOhelp search tool to locate your information.

Viewing UnixWare and NonStop Clusters Documentation
After the cluster is installed, you can view SCO UnixWare 7 NonStop Clusters documentation. The main documentation system is called SCOhelp and contains information that can answer many administrative questions. Additionally, you can access manual pages using the man(1M) command.

Chapter 4
Managing Clusters
Compaq and SCO both provide a variety of software to simplify the management of ProLiant Clusters for SCO UnixWare 7. SCO cluster management software includes:
■ Clusterized SCOadmin
■ Event Processing Subsystem
■ SCO UnixWare 7 NonStop Clusters Management Suite
■ Clusterized and cluster-specific command line utilities
Compaq provides management capabilities customized for use with ProLiant Clusters for SCO UnixWare 7.

SCO UnixWare 7 NonStop Clusters Management Software
The single-system image of the Compaq ProLiant Cluster makes managing a cluster similar to managing a single-node, noncluster UnixWare 7 system. The standard SCO documentation is useful for performing the management tasks.

Clusterized SCOadmin
SCOadmin is the SCO UnixWare 7 system administration tool. You can access this tool from the UnixWare desktop by clicking the tree icon in the toolbar. You can also access the tool by entering scoadmin on a command line. The SCOadmin software provided with SCO UnixWare 7 NonStop Clusters has been clusterized for use in a NonStop Clusters environment. Help information is available from each SCOadmin screen.

Event Processing Subsystem
The Event Processing Subsystem (EPS) is installed during cluster installation. Use the EPS to configure actions and notifications based on system messages (syslogd). See the SCO UnixWare 7 NonStop Clusters System Administrator’s Guide for more information.

Keepalive Manager
The SCO UnixWare 7 NonStop Clusters Keepalive Manager provides a graphical user interface to monitor the status of applications currently being managed by the Keepalive subsystem. Applications are placed under Keepalive control through use of the spawndaemon command. See the SCO UnixWare 7 NonStop Clusters System Administrator’s Guide in the NonStop Clusters Documentation topic in SCOhelp for more information on the Keepalive subsystem.
■ clusternode_shutdown: Shuts down a specified node
■ nodedown: Halts the specified node without processing
■ dbms_guard: Runs the Data Base Management System guard
■ ncms: Runs the NonStop Cluster Management Subsystem
■ ctsm: Runs the Cluster Time Sync Monitor
The SCO UnixWare commands that are clusterized in SCO UnixWare 7 NonStop Clusters include:
■ netstat, inetd, netcfg, rpcbind
■ fuser, df

Compaq ProLiant Cluster Management Software for SCO UnixWare 7 NonStop Clusters
Compaq provides the following cluster management capabilities customized for use with Compaq ProLiant Cluster for SCO UnixWare 7 NonStop Clusters. These capabilities are available on the Compaq Management CD shipped with ProLiant servers.

Clusterized Compaq Management Agents
Compaq Management Agents running on a SCO UnixWare 7 NonStop Clusters system support the same client-server interface as a single-server SCO UnixWare 7 system. The client-server interface for Compaq Insight Manager is SNMP-based, which allows ProLiant servers and clusters to be managed by other network management client software.
The Quick Install procedure automatically installs the support needed for Compaq Insight Manager XE. On the Management CD, the package that provides this support is nscccm, which is part of the Compaq Management Agents and Tools for Servers for SCO UnixWare 7 NonStop Clusters portion of the CD. For additional information, refer to the Compaq Insight Manager XE User Guide included on the Management CD.

Configuring SCO UnixWare 7 NonStop Clusters for UPS-Initiated Shutdown
The UPS-initiated shutdown is configured by modifying the OS_SHUTDOWN, UPS_LOG_FILE, and UPS_SERIAL_PORT parameters within the /opt/compaq/etc/nscupsd.cfg configuration file. The OS_SHUTDOWN parameter specifies the battery backup power remaining when a cluster-wide shutdown is initiated.
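
A quick way to confirm that all three parameters are present after editing is sketched below in Python. It is illustrative only and is not a Compaq tool. It assumes that the file holds one NAME=value setting per line and that lines beginning with # are comments; this guide does not document the full file syntax. All sample values are hypothetical placeholders, not recommended settings.

```python
# The three parameters named in this section.
REQUIRED = ("OS_SHUTDOWN", "UPS_LOG_FILE", "UPS_SERIAL_PORT")


def parse_cfg(text: str) -> dict:
    """Parse NAME=value lines (assumed syntax); skip blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def missing_parameters(settings: dict) -> list:
    """Return the required parameter names that are not set."""
    return [name for name in REQUIRED if name not in settings]


# Hypothetical file contents for illustration only.
sample = """\
# placeholder values, not recommendations
OS_SHUTDOWN=10
UPS_LOG_FILE=/tmp/example-ups.log
"""

print(missing_parameters(parse_cfg(sample)))  # ['UPS_SERIAL_PORT']
```

To check a live system, the text of /opt/compaq/etc/nscupsd.cfg could be read and passed to parse_cfg in place of the sample string.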

Two-Node Cluster with a Single Power Supply in Each Node
When using a two-node cluster with two UPSs, as shown in Figure 4-1, configure the UPSs so that the cluster shuts down only if both UPSs are low on power. The loss of a single physical UPS results in the loss of one of the nodes but not the loss of the cluster. In this configuration, both UPSs are combined into a single logical UPS, which results in a UPS_SERIAL_PORT configuration of:
UPS_SERIAL_PORT=/dev/tty00.1:/dev/tty00.
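
The shutdown rule for a logical UPS made up of two physical UPSs can be stated as a single condition. The following Python sketch is illustrative only and does not reproduce the UPS daemon's actual logic; the function name is hypothetical.

```python
from typing import List


def cluster_should_shut_down(ups_is_low: List[bool]) -> bool:
    """Return True only when every physical UPS in the logical UPS is low.

    ups_is_low holds one flag per physical UPS (True means low on power).
    An empty list never triggers a shutdown.
    """
    return len(ups_is_low) > 0 and all(ups_is_low)


print(cluster_should_shut_down([True, False]))  # False: one UPS still has power
print(cluster_should_shut_down([True, True]))   # True: both UPSs are low
```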

Chapter 5
Troubleshooting
Carefully follow the detailed instructions provided in this guide to avoid unnecessary problems.

Installation Problems
This section addresses problems relating to installation of SCO UnixWare 7 or SCO UnixWare 7 NonStop Clusters.

Table 5-1 Solving Installation Problems
Problem: Server unit does not power up
Possible Cause: Power cord or power source
Action: Check all power cords to ensure that they are fully inserted into the power supply plug and the outlet.
Problem: Error messages regarding the Cluster Integrity (CI) serial cable display
Possible Cause: The CI serial cable is not properly installed
Action: Install the CI serial cable between node 1 and node 2 using the serial port connector B on each node.

Quick Install Error Messages
This section addresses errors relating to Quick Install installation.

Table 5-2 Quick Install Error Messages
Error Message: No disks found
Possible Cause: No internal disk drive
Action: Add a 9.1-GB or larger disk drive and configure the system with SmartStart. See Chapter 2, “Setting Up Cluster Hardware,” and Chapter 3, “Installing Cluster Software,” of this guide.

Node-to-Node Communication Problems
This section addresses problems relating to node-to-node communication.

Table 5-3 Solving Node-to-Node Communication Problems
Problem: New node does not join the cluster
Possible Cause: Ethernet crossover cable is not correctly cabled or is defective
Action: Verify that the Ethernet crossover cable is connected as described in Chapter 2 of this guide.
Possible Cause: Embedded NIC is not functioning correctly
Action: Verify that the embedded NIC is correctly configured.
Problem: Existing node does not rejoin the cluster
Possible Cause: Node hardware failure
Action: Disconnect the node from the cluster. Diagnose and repair hardware failures as a stand-alone ProLiant server.
Problem: Alternating root node panics (RA4100 system)
Possible Cause: RA4100 storage subsystems or hubs are not powered up
Action: Apply power to the hubs and storage subsystems.
Possible Cause: Ethernet connection failed (and the CI serial cable is not used)
Action: Power down the cluster. Check the Ethernet crossover cable to confirm that the cable is properly connected and is not crimped or compromised in any way.
Problem: Alternating root node panics
Possible Cause: ServerNet I cross-cabled in two-node cluster (and the CI serial cable is not used)
Action: Power down the cluster. Verify that ServerNet I is cabled between cluster nodes (X to X and Y to Y) as described in Chapter 2 of this guide. Correct the cabling and boot the cluster.
Table 5-3 Solving Node-to-Node Communication Problems continued Problem Possible Cause Action Bad packets or ServerNet I barrier errors reported SPA is defective Cluster Membership Service (CLMS) master (the active root node) is unable to communicate with a node during startup or normal operation. If a node does not join the cluster, verify that the SPA is functioning on that node using the SVU as described in Chapter 3 of this guide.
Shared Storage Problems

This section addresses problems that can be encountered in clusters using the Compaq StorageWorks RAID Array 4100 storage system. It does not address problems specific to the RA4100 storage system itself. For those issues, see the user guide for the RA4100 and the Fibre Channel troubleshooting guide.
Table 5-4 Solving Shared Storage Problems continued

Problem: “Unable to initialize FC loop” error message displays
Possible Cause: Failed or disconnected FC-AL (hub, adapter, or controller)
Action: Diagnose and isolate the problem using the information contained in the user guide for the RA4100 and the Fibre Channel troubleshooting guide. Replace any defective component.
Client-to-Cluster Connectivity Problems

This section addresses problems relating to client-to-cluster connectivity.

Table 5-5 Solving Client-to-Cluster Connectivity Problems

Problem: Clients cannot communicate with a node (or nodes) over Ethernet
Possible Cause: Improper name resolution
Action: Verify that the /etc/resolv.conf file within the cluster indicates the correct domain name servers.
Table 5-5 Solving Client-to-Cluster Connectivity Problems continued

Problem: CVIP is not accessible after a node failure
Possible Cause: Cluster virtual interface has no available public network interfaces on the same subnet
Action: Configure the cluster so that at least two public network interface NIC boards on two different nodes have IP addresses on the same subnet as the CVIP address.
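
The two checks in Table 5-5 (which name servers /etc/resolv.conf lists, and whether public NICs on at least two nodes share the CVIP subnet) can be sketched in a few lines of Python. This is only an illustration, not a tool supplied with the cluster software: the function names, the NIC inventory format, and the sample addresses are hypothetical, and the text of /etc/resolv.conf is assumed to have been copied off a node for offline review.

```python
import ipaddress


def parse_nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines of resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop anything after '#' so commented-out entries are ignored.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def nodes_on_cvip_subnet(cvip, netmask, public_nics):
    """Return the sorted names of nodes with a public NIC on the CVIP subnet.

    public_nics is a hypothetical inventory: a list of (node_name, ip) pairs.
    """
    subnet = ipaddress.ip_network(f"{cvip}/{netmask}", strict=False)
    return sorted({node for node, ip in public_nics
                   if ipaddress.ip_address(ip) in subnet})


def cvip_can_fail_over(cvip, netmask, public_nics):
    """True when at least two different nodes have a NIC on the CVIP subnet."""
    return len(nodes_on_cvip_subnet(cvip, netmask, public_nics)) >= 2


if __name__ == "__main__":
    # Sample values only; substitute the addresses from your own worksheets.
    text = "domain example.com\nnameserver 192.0.2.53\n"
    print(parse_nameservers(text))
    nics = [("node1", "192.0.2.1"), ("node2", "192.0.2.2")]
    print(cvip_can_fail_over("192.0.2.100", "255.255.255.0", nics))
```

Comparing the list returned by parse_nameservers against the site's known name servers covers the first row of the table; a False result from cvip_can_fail_over corresponds to the condition described in the second row.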
Cluster Resource Problems

This section addresses problems relating to cluster resources.

Table 5-6 Solving Cluster Resource Problems

Problem: Device is not seen on all nodes in a cluster
Possible Cause: Mismatched kernels
Action: Ensure that all nodes are in the cluster, and then reboot node 2.
ServerNet I Messages

Use this section to interpret and respond to the following types of messages:
■ ServerNet I SAN Error Messages
■ ServerNet I Notice Messages
■ ServerNet I Warning Messages
■ ServerNet I Panic Messages
■ ServerNet I Continuation and Informative Messages

For information about ServerNet, see the NonStop Clusters for the SCO UnixWare 7 System Administrator’s Guide located in the SCOhelp online documentation set.
Table 5-7 lists the text strings for severity, explains what the text strings mean, and references the tables containing the message details.
ServerNet I Notice Messages

This section addresses ServerNet I Notice Messages.

Table 5-9 ServerNet I Notice Messages

Messages: Barrier failed on path:n snetID:0xF0nnn curpath:n
Description: These messages display when a new node attempts to join a cluster. The message indicates whether the new node is able to communicate with the target node over the given path (X/Y). If a path is cabled, a success message is expected.
Table 5-9 ServerNet I Notice Messages continued

Messages: Link exception condition on path n has been resolved. Re-enabling path n
Description: Indicates that a link exception condition on a path is resolved and that link exception detection and processing is re-enabled for that path. The path becomes available for ServerNet I communications within the next minute.
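
When reviewing a saved console log for the barrier notice listed in Table 5-9, a small filter can tally failures per ServerNet ID and path. The sketch below is an illustration under stated assumptions, not part of the product: it assumes each n in the path fields is a short alphanumeric token, that the ServerNet ID is printed as a hexadecimal literal, and that the log is available as plain text lines. The function names and sample values are hypothetical.

```python
import re
from collections import Counter

# Pattern for the notice text shown in Table 5-9:
#   Barrier failed on path:n snetID:0xF0nnn curpath:n
# Assumption: path fields are word characters, snetID is a hex literal.
BARRIER_FAILED = re.compile(
    r"Barrier failed on path:\s*(?P<path>\w+)\s+"
    r"snetID:\s*(?P<snet_id>0x[0-9A-Fa-f]+)\s+"
    r"curpath:\s*(?P<curpath>\w+)"
)


def parse_barrier_failure(line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = BARRIER_FAILED.search(line)
    return match.groupdict() if match else None


def count_failures_by_path(lines):
    """Count barrier failures, keyed by (snet_id, path)."""
    counts = Counter()
    for line in lines:
        fields = parse_barrier_failure(line)
        if fields:
            counts[(fields["snet_id"], fields["path"])] += 1
    return dict(counts)


if __name__ == "__main__":
    # Sample lines with made-up values, for illustration only.
    log = [
        "Barrier failed on path:0 snetID:0xF0123 curpath:1",
        "some unrelated console output",
    ]
    print(count_failures_by_path(log))
```

Repeated failures on one path for the same ServerNet ID point toward the cabling checks for that path described in Chapter 2 of this guide.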
ServerNet I Warning Messages

The warning messages are listed in Table 5-10. Messages are listed in alphabetical order except where a series of messages associated with a single-fault condition are grouped together. These groups are alphabetized under the first message in the series. If you cannot find a particular message, look toward the end of the table where multiple messages having the same description are grouped together and are not in alphabetical order.
Table 5-10 ServerNet I Warning Messages continued

Messages: (0xF0nnn) Multiple link exceptions detected on path n
Description: This series of messages indicates that a burst of link exceptions was detected on a ServerNet I path. Link exception reporting must be enabled (see the spam –l on command) for these messages to be displayed.
User Action: Check the cabling at the local node on the indicated path.
Table 5-10 ServerNet I Warning Messages continued

Messages: 0xF0nnn: rcvd spurious packet acknowledge, src=0xnnnnnnnn
Description: Indicates that an unexpected packet acknowledgment arrived. Usually this message can be linked with a [SNET] timeout message. The acknowledgment from the packet that was timed out arrived late.
User Action: None. Watch for additional [SNET] timeout messages.
Table 5-10 ServerNet I Warning Messages continued

Messages: (0xF0nnn) exception queue error
Description: These messages indicate hardware error conditions were detected during interrupt processing. Queue overruns and transmitter/receiver overflows indicate a potential loss of a response due to buffer space exhaustion.
User Action: None.
Table 5-11 ServerNet I Panic Messages

Messages: avt_init: unable to allocate virtual mem for AVT
Description: Indicates a shortage of memory on the local node
User Action: Check memory utilization and distribution in the kernel tunables. If possible, take a crash dump for analysis by product support personnel.

Messages: (0xF0nnn) internal SAIL logic error detected
Description: An internal problem with the SAIL ASIC was detected. The SPA has failed.
User Action: Run offline diagnostics.
Table 5-11 ServerNet I Panic Messages continued

Messages: ship_PCI_initialize: Found n ServerNet I PCI adapters— currently only one ServerNet I PCI adapter supported
Description: Indicates that during the discovery and initialization of the SPA, more than one SPA was found
User Action: Ensure that only one SPA 1.5 revision E is installed in the local node.
Table 5-11 ServerNet I Panic Messages continued

Messages: ship_init: Unsupported revision of the SAIL ASIC detected CIN=0xnnnnnnnn
Description: Indicates that an SPA was found, but the SAIL ASIC on it is not a recognized revision. The driver recognizes revisions A and B of the SAIL ASIC; however, B is the only revision supported by Compaq ProLiant Clusters for SCO UnixWare 7.
User Action: Replace the SPA with a version containing revision B of SAIL ASIC (SPA 1.5 revision E).
Table 5-11 ServerNet I Panic Messages continued

Messages: avt_define_q: invalid interrupt queue size: nnnn
Description: These messages are all SPAD (software) errors.
User Action: If possible, take a crash dump for analysis by product support personnel. Reboot the node into the cluster.
ServerNet I Continuation and Informative Messages

The ServerNet I continuation and informative messages are listed in alphabetical order in Table 5-12.

Table 5-12 ServerNet I Continuation and Informative Messages

Messages: AVT entry 0xnnnnnnnn @ 0xnnnnnnnn: I/O Address 0xnnnnnnnn, Type = Data
AVT entry 0xnnnnnnnn @ 0xnnnnnnnn: I/O Address = 0xnnnnnnnn, Type = Interrupt
Description: These are two separate cases of continuation messages.
Table 5-12 ServerNet I Continuation and Informative Messages continued

Messages: Dump of Exception Packet @ 0xnnnnnnnn
Description: This continuation message is followed by additional information from the packet in question, which was not expected.
Appendix A Software Versions

Software versions provided by the Quick Install CDs for the SCO UnixWare 7 NonStop Clusters include:
■ SCO UnixWare 7.1.1
■ SCO UnixWare 7 NonStop Clusters 7.1.1+IP, PTF nsc1011c, PTF nsc1013a
■ Compaq EFS 7.38a
■ Compaq Management Agents 4.90
■ System partition created from the Compaq SmartStart and Support Software CD 4.90

Additional software and versions needed include:
■ Compaq SmartStart and Support Software CD 4.
Appendix B Quick Install Planning Worksheets

The following worksheets help you to gather and organize the information that you need for the SCO UnixWare 7 NonStop Clusters quick install procedures described in Chapter 3, “Installing Cluster Software,” for the Compaq ProLiant ML370 server. Fill these worksheets out before you begin the software installation and use the data where needed in the procedures.
Table B-1 Quick Install Data continued Screen Field Your Information e Node 1 hostname for the cluster interconnect node1-ic Not used for ServerNet I cluster Node 1 IP address for the cluster interconnect 10.1.0.1 Node 2 hostname for the cluster interconnect node2-ic Node 2 IP address for the cluster interconnect 10.1.0.2 Netmask 255.255.255.
Table B-2 SCO UnixWare License Worksheet

Field (record your information for each):
Node 1 license number
Node 1 license code
Node 1 license data (if necessary)
NonStop Cluster Two-Node License
Node 2 license number
Node 2 license code
Node 2 license data (if necessary)
Glossary

CI Serial Cable
See Cluster Integrity Serial Cable

CLMS
See Cluster Membership Service

Cluster Integrity Serial Cable
The Cluster Integrity (CI) serial cable is a serial cable that connects to a serial port on each node in a two-node cluster. The cable prevents split-brain, a condition that results in both nodes in a two-node cluster trying to operate as the root node.
CVIP
See Cluster Virtual IP

Desktop Management Interface
Desktop Management Interface (DMI) is an industry framework for managing and keeping track of hardware and software components in a system of personal computers from a central location.

DMI
See Desktop Management Interface

Ethernet Crossover Cable
The Ethernet crossover cable provides the node-to-node communication data path for the cluster.
PCI
See Peripheral Component Interconnect

Peripheral Component Interconnect
Peripheral Component Interconnect (PCI) is an interconnection bus system that provides high-speed operation.

SAIL
See ServerNet Advanced Interface Logic

SAN
See Storage Area Network

ServerNet Advanced Interface Logic
ServerNet Advanced Interface Logic (SAIL) converts software requests into ServerNet operations.
SPA
See ServerNet PCI Adapter

Split-Brain
Split-brain is a condition that results in both nodes in a two-node cluster trying to operate as the root node. The use of the CI serial cable, which is included in this cluster kit, eliminates the possibility of split-brain.
Index A ACU (Array Configuration Utility), defined 1-8 additional information 1-12 agents clusterized 4-8 SNMP 4-4 application software cluster-aware 1-11 Compaq white papers 1-11 resources 1-11 Array Configuration Utility See ACU availability, cluster 1-1 B battery backup, UPS-initiated shutdown 4-10 C cables CI serial 1-4, 2-13 Ethernet crossover 1-3, 2-12 Fibre Channel precautions 2-14 keyboard 2-15 labeling 2-8 monitor 2-15 mouse 2-15 public LAN Ethernet 2-12 ServerNet I interconnect 2-9 UPS 2-15 ca
CDs Compaq Management 1-9 SmartStart 1-8 checklists, installation B-1 CI (Cluster Integrity), serial cable 1-4, 2-13 client-to-cluster connectivity, problems 5-12 cluster additional setup tasks 3-15 availability 1-1 benefits 1-1 communication 1-3 documentation 1-12, 3-16, 4-2 hardware components 1-2 interconnect 1-3 investment protection 1-1 manageability 1-1 management 4-1 ML370 configuration 1-4 operati
EPS (Event Processing Subsystem), defined 4-4 erasing the configuration, procedure 3-3 error messages Quick Install 5-4 ServerNet I SAN 5-15 severity 5-16 Ethernet crossover cable 1-3, 2-12 interconnect 1-3 Event Processing Subsystem See EPS exclamation point symbol viii external storage components 2-7 F FFIU (Fibre Channel Fault Isolation Utility), defined 1-8 Fibre Channel cables 2-14 Fibre Channel Fault Isolation Utility See FFIU file sets, configuration 4-5 firmware troubleshooting 5-3 updating
information loss, preventing, caution 3-3 information, additional 1-12 informative messages, ServerNet I 5-27 installation checklists B-1 general steps 1-10 problems 5-2 software 3-1, 3-2 installation considerations default Quick Install settings 3-2 internal disk drive 3-2 installing clusters 3-10 GBIC-SW 2-5 HBA 2-5 internal disk drives 2-6 public LAN NIC 2-6 redundant controller 2-8 ServerNet I, interc
N NCMS (NonStop Cluster Management Suite) 4-4 network access 1-5 nodes availability reports 4-5 configuring 3-3 installing software 3-11, 3-13 node-to-node communication, problems 5-5 NonStop Cluster Management Suite See NCMS NonStop Clusters documentation, clusters 1-12, 3-16, 4-2 NonStop Clusters Verification Utility See NSCVU notice messages, ServerNet I 5-17 NSCVU (NonStop Clusters Verification Utility) 4-9 defined 3-14 verifying, clusters 1-7 O obtaining licenses 3-2 onnode commands 4-6 operati
S SAM Viewer (System Availability Monitor Viewer), defined 4-5 scalability, cluster 1-1 SCO clusterized commands 4-5 software 1-6 SCO UnixWare, commands 4-6 SCOadmin 4-3 screwdriver symbol viii server, hardware components 1-2 ServerNet I cable labeling suggestion, illustrated 2-11 connections, verifying 3-7 continuation and informative messages 5-27 interconnect cabling 2-9 installing 2-6 local adapter, v
T tables Quick Install Data B-1 Quick Install Default Settings 3-2 Quick Install Error Messages 5-4 SCO UnixWare License Worksheet B-3 ServerNet I Continuation and Informative Messages 5-27 ServerNet I Message Severity 5-16 ServerNet I Message Variables 5-16 ServerNet I Notice Messages 5-17 ServerNet I Panic Messages 5-23 ServerNet I Warning Messages 5-19 Solving Client-to-Cluster Connectivity Problems 5-12 Solving Cluster Resource Problems 5-14 Solving Installation Problems 5-2 Solving Node-to-Node
V verifying clusters 1-7 local ServerNet I adapter 3-8 node-to-node communication, ServerNet I 3-9 SAN 1-8 ServerNet I connections 3-7 versions, software A-1 W warning messages, ServerNet I 5-19 warnings defined viii electric shock viii, ix hazardous energy circuits viii hot surface ix multiple sources of power ix weight ix website, Compaq x weight, warning ix white papers, cluster-related 1-11 workshe