HP XC System Software Release Notes Version 4.0
© Copyright 2009 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
Table of Contents
1 New and Changed Features.........................................................................................7
1.1 Base Distribution and Kernel...................................................................................7
1.2 New Hardware Support............................................................................................7
1.3 Upgrade Paths
5.5.4 HP ProLiant DL145 G3 Node Imaging Fails When Graphics Cards Are Present.....22
6 Software Upgrades....................................................................................................25
6.1 Upgrade Requires a Minimum of 2 GB Free Space in the root Partition................25
6.2 Remove Older Versions of hp-OpenIPMI...............................................................25
Index

List of Tables
1-1 Supported Upgrade Paths........................................................................................7
1-2 SFS Client Packages..............................................................................................10
1-3 Hardware Supported for Use with SFS..................................................................10
2-1 Updated SFS Server/client Combinations
1 New and Changed Features
This chapter describes the new and changed features delivered in HP XC System Software Version 4.0.

1.1 Base Distribution and Kernel
The following table lists information about the base distribution and kernel for this release as compared to the last HP XC release.

HP XC Version 4.0                               HP XC Version 3.2.1
Enterprise Linux 5 Update 1                     Enterprise Linux 4 Update 5
HP XC kernel version 2.6.18-53.1.21.el5.6hp     HP XC kernel version 2.6.9-55.9hp.4sp
1.5 Changes to Internal Node Numbering When Double Density Server Blades Are Present
Double density server blades, such as the HP ProLiant BL2x220c, contain two separate nodes per server blade. As a result, an HP BladeSystem c7000 enclosure can hold a maximum of 32 nodes, compared to a maximum of 16 single density server blades; a c3000 enclosure can hold a maximum of 16 nodes, compared to a maximum of 8 single density server blades.
The shared storage must contain an LVM volume with an ext3 file system for the dump partition, which is mounted locally on one of the nodes and served to the rest of the nodes in the HP XC configuration.

The following enhancements were made to the improved availability feature:
• Improved availability features were upgraded to use Serviceguard Version 11.18 to take advantage of the latest available Serviceguard features.
• Power management features have been enhanced.
LSF-HPC has become LSF with HPC extensions.

1.12 Changes to Simple Linux Utility for Resource Management
A new version of the Simple Linux Utility for Resource Management (SLURM), Version 1.2.25, has been integrated with HP XC. Various changes have been made to SLURM, including the renaming of the SLURM resource limits, the renaming of the JobAcctLoc parameter to JobAcctLogfile, and a change to the JobAcctFrequency parameter.
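As an illustration of the renamed parameter, a slurm.conf fragment might now read as follows (the file path and interval value are arbitrary examples for illustration, not recommended settings):

```
# slurm.conf fragment (illustrative values)
JobAcctLogfile=/var/log/slurm/jobacct.log   # formerly JobAcctLoc
JobAcctFrequency=30                         # accounting sampling interval; behavior changed in this release
```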
1.14 Deprecated Features
The following features have been deprecated in this release:
• Support for the following servers has been dropped:
  — ProLiant DL140 G1 and G2
  — ProLiant DL145 G1
  — ProLiant DL360 G1, G2, and G3
  — ProLiant DL380 G1, G2, and G3
  — ProLiant DL580 G1, G2, and G3
  — Integrity rx2600
• Support for the Voltaire IB Host InfiniBand software stack has been dropped.
• The netdump utility, which was a supported crash dump tool, is no longer supported for use by the underlying Linux base OS.
2 Important Release Information
This chapter contains information that is important to know for this release.

2.1 Firmware Versions
The HP XC System Software is tested against specific minimum firmware versions. Follow the instructions in the accompanying hardware documentation to ensure that all hardware components are installed with the latest firmware version. The master firmware tables for this release are available at the following website:
http://www.docs.hp.com/en/linuxhpc
Table 2-1 Updated SFS Server/client Combinations

HP XC Version                          SFS Client Version   SFS Server Version   Supported Interconnect
Version 4.0                            SFS G3.0             SFS G3.0             InfiniBand (IB) only
Version 4.0                            SFS G3.0             SFS G3.1             IB only
Version 4.0                            Lustre 1.6.5         SFS 2.3              All interconnects
Version 4.0 with July 2009 patch kit   SFS G3.2             SFS G3.2             IB only

The HP XC System Software Installation Guide provides specific instructions for downloading patches.
3 Hardware Preparation
Hardware preparation tasks are documented in the HP XC Hardware Preparation Guide. This chapter contains information that was not included in that document at the time of publication.

3.1 FakeRAID Controllers Are Not Supported
The HP XC kernel does not boot on ProLiant DL160 G5 nodes configured with FakeRAID. Hardware RAID setups created with the "HP Embedded G5 SATA RAID Controller" found in some DL160 G5 servers are not supported in HP XC.
4 Software Installation on the Head Node
This chapter contains notes that apply to the HP XC System Software Kickstart installation session.

4.1 Manual Installation Required for NC510F Driver
The unm_nic driver is provided with the HP XC software distribution; however, it does not load correctly. If your system has an NC510F 10 GB Ethernet card, run the following commands to load the driver:
# depmod -a
# modprobe -v unm_nic
Then, edit the /etc/modprobe.
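The file name in the last sentence above is truncated in the source; Chapter 5 of these notes edits /etc/modprobe.conf, so the intended edit is most likely a driver alias line in that file. An illustrative fragment follows (the interface name eth2 is an assumption; use the interface assigned to the NC510F card on your system):

```
# /etc/modprobe.conf fragment (illustrative)
alias eth2 unm_nic
```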
signature: NOKEY, key ID fc69d1a2 Preparing...
5 System Discovery, Configuration, and Imaging
This chapter contains information about configuring the system. Notes that describe additional configuration tasks are mandatory and have been organized chronologically; perform these tasks in the sequence presented in this chapter. The HP XC system configuration procedure is documented in the HP XC System Software Installation Guide.
alias eth0 tg3
alias eth1 tg3
alias eth2 e1000
alias eth3 e1000
3. Save your changes and exit the text editor.
4. Use the text editor of your choice to edit the /etc/sysconfig/network-scripts/ifcfg-eth[0,1,2,3] files, and remove the HWADDR line from each file if it is present.
5. If you made changes, save your changes and exit each file.
6. Reload the modules:
   # modprobe tg3
   # modprobe e1000
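The HWADDR removal described above can also be scripted with sed instead of a manual edit; this is an illustrative sketch, not part of the documented procedure, so review the backups it leaves before rebooting:

```shell
#!/bin/sh
# Remove the HWADDR line from each ifcfg-ethN file, keeping a backup.
# The glob matches the four files named in the procedure; nothing is
# changed on systems where the files do not exist.
for f in /etc/sysconfig/network-scripts/ifcfg-eth[0-3]; do
    [ -f "$f" ] || continue
    cp "$f" "$f.bak"            # backup before editing
    sed -i '/^HWADDR/d' "$f"    # drop the HWADDR= line
done
```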
Therefore, for Serviceguard clusters that do not include the head node, HP recommends that you define a quorum server for quorum instead of using a lock LUN.

5.3.2 HP Scalable File Share Mount Problems with Mixed HCAs
NOTE: This note applies only to SFS Version 2.3.

A Scalable File Share (SFS) share might not mount properly if the head node and compute nodes have different types of HCA cards; for example, when the head node has a memfull HCA and the compute nodes have memfree HCAs (including ConnectX HCAs).
5.5 Notes that Apply to Imaging
The notes in this section apply to propagating the golden image to all nodes, which is accomplished when you invoke the startsys command.

5.5.1 Do Not Use File Overrides to Customize inittab and grub.conf Files
HP does not recommend using file overrides to customize the /etc/inittab and /boot/grub/grub.conf files on the specified client nodes because doing so might not work reliably. Instead, HP recommends that you use a custom post-install script to customize these files.
1. Issue the appropriate startsys command and specify one of the DL145 G3 nodes with a graphics card in the [nodelist] option of the startsys command.
2. When power to the node is turned on, use the cluster console to connect to the node and force it to PXE boot by pressing the F12 key at the appropriate time during the BIOS start up.
3. When the node is successfully imaged, repeat this process for the remaining nodes containing graphics cards.
6 Software Upgrades
This chapter contains notes about upgrading the HP XC System Software from a previous release to this release. Installation release notes described in Chapter 4 (page 17) and system configuration release notes described in Chapter 5 (page 19) also apply when you upgrade the HP XC System Software from a previous release to this release. Therefore, when performing an upgrade, make sure you also read and follow the instructions in those chapters.
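Section 6.1 (listed in the table of contents) requires a minimum of 2 GB of free space in the root partition for the upgrade. That requirement can be checked ahead of time with df; the script below is an illustrative sketch, not part of the documented procedure:

```shell
#!/bin/sh
# Warn if the root partition has less than 2 GB (2097152 KB) free,
# the minimum required for the upgrade.
avail_kb=$(df -Pk / | awk 'NR==2 {print $4}')
if [ "$avail_kb" -lt 2097152 ]; then
    echo "WARNING: only ${avail_kb} KB free in /; at least 2 GB is required"
else
    echo "OK: ${avail_kb} KB free in /"
fi
```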
7 System Administration, Management, and Monitoring
This chapter contains notes about system administration, management, and monitoring.

7.1 LDAP over SSL Is Not Supported
LDAP over SSL to provide Transport Layer Security services is not supported in XC Version 4.0.

7.2 Issues with Kernel Dumps
Although kexec/kdump can be configured to save kernel dumps to the local disks or across the network to a remote node, only the network dump method is supported in HP XC.
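For the supported network dump method, the dump target is named in /etc/kdump.conf; on the RHEL 5 base this release uses, a network (NFS) target takes the form sketched below. The host name and export path are illustrative placeholders, so consult the kdump documentation for your configuration:

```
# /etc/kdump.conf fragment (illustrative)
net kdumphost.example.com:/var/crash
```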
If you do not stop the Nagios service before you run reset_db, the cluster_config utility displays the following warnings:

Warnings:
Executing C50nagios gconfigure
Use of uninitialized value in hash element at /opt/hptc/perl/lib/XCCli.pm line 629.
Use of uninitialized value in concatenation (.) or string at /opt/hptc/perl/lib/XCCli.pm line 645.
7.8 Serviceguard Monitors Only Bonded InfiniBand Networks
Serviceguard is configured on HP XC to monitor the administration and interconnect networks in the cluster. However, Serviceguard cannot successfully monitor an InfiniBand network unless it is bonded. If you require Serviceguard for Linux to monitor the InfiniBand network on your cluster, it must be bonded.
8 Load Sharing Facility and Job Management
This chapter addresses the following topics:
• Load Sharing Facility (page 31)
• Job Management (page 31)

8.1 Load Sharing Facility
This section contains notes about LSF with SLURM on HP XC and standard LSF.

8.1.1 Some Commands Hang When LSF Is Down
When LSF is down, commands like df and lsof might hang. The hangs occur because, after a job runs, the sbatchd daemon automatically mounts the /net/lsfhost.localdomain/hptc_cluster directory.
9 Cluster Platform 3000
At the time of publication, no release notes are specific to Cluster Platform 3000 systems.
10 Cluster Platform 4000
The notes in this chapter apply to Cluster Platform 4000 systems.

10.1 Some ProLiant DL145 G3 Nodes Hang at 23:59 GMT
HP has observed that on a daily basis some DL145 G3 nodes can hang or freeze at a specific time of day (usually at 23:59 GMT). If you observe this behavior, contact the XC Support Team at xc_support@hp.com to obtain a fix for the problem.
11 Cluster Platform 6000
This chapter contains information that applies only to Cluster Platform 6000 systems.

11.1 Finding the IP Address of a Management Processor
Because the IP addresses for management processors (MPs) are set statically for this release, you must set the IP address for the MP manually when a node is replaced. To find the IP address, look up the entry for the MP in the /etc/dhcpd.conf file. The MP naming convention for the node is cp-node_name.
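Because the MP host names follow the cp-node_name convention described above, the lookup can be scripted with grep. The stanza layout assumed here (host ... { ... fixed-address ...; }) is the common dhcpd.conf form; this is an illustrative sketch, so adjust the pattern if your file differs:

```shell
#!/bin/sh
# Print the dhcpd.conf host entry for a node's management processor.
# Usage: mp_entry <node_name> [dhcpd.conf path]
mp_entry() {
    conf=${2:-/etc/dhcpd.conf}
    # Show the "host cp-<node>" stanza, including its fixed-address line.
    grep -A4 "host cp-$1" "$conf"
}
```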
12 Interconnects
This chapter contains information that applies to the supported interconnect types:
• InfiniBand Interconnect (page 39)
• Myrinet Interconnect (page 40)
• QsNetII Interconnect (page 40)

12.1 Installing the OpenSM Subnet Manager
For XC clusters with a single enclosure (a maximum of 32 nodes), follow these steps to install and run OpenSM for subnet management:
1. Obtain the OpenSM RPM from the OFED stack supplied on the XC Version 4.
• During OVP processing, which uses ib_prodmode_mon:

Testing infiniband_status ...
Searching /etc/hosts for Infiniband switches
Getting readings from IR0N00
No Infiniband switches were detected to be the master.
Please verify that the switches are accessible, configured and their system time is synchronized with this system.
No errors found

12.3 Myrinet Interconnect
The following release notes are specific to the Myrinet interconnect.
. . . In the previous example, the switch_modules table in the qsnet database is populated with QR0N03 even though the QR0N03 module is not physically present. This problem has been reported to Quadrics, Ltd.
13 Documentation
This chapter describes known issues and omissions in the HP XC System Software Documentation Set and HP XC manpages.

13.1 HP XC System Software Administration Guide
The notes in this section apply to the HP XC System Software Administration Guide. In Section 11.3.2, "Using File Overrides to the Golden Image", the command in step 6 is incorrect. The correct command is:
# ln -sf n8_override.master.0 n8.sh
Index

B
base operating system, 7

C
clear_counters command, 40
cluster_config utility, 20
  warning during golden image creation, 22
could not count licenses message, 21
CP3000 system, 33
CP4000 system, 35
CP6000 system, 37

D
disabled services, enabling, 29
discover command, options for controlling node numbering scheme, 8
discover utility, 20
discover.
documentation for, 11

K
kernel version, 7
Kickstart installation, 17

S
SIGUSR2 signal, 40
sysctl message on console, 28
system administration notes, 27
system configuration, 19
system management notes, 27

U
updateimage command, warning during golden image creation, 22
upgrade, 25
  supported upgrade paths, 7
upgrade installation, 25
upgrade path, 7

W
website, ITRC, 13