Microsoft Windows HPC Server 2008 Installation Guide HP Part Number: A-HPCS08-1B Published: June 2009
© Copyright 2009 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. Voltaire and GridStack are trademarks of Voltaire Inc.
About This Document

This manual describes the implementation of the Microsoft® Windows® HPC Server 2008 (HPCS) operating system on HP Cluster Platform models, including HP Cluster Platform Express.

Intended Audience

This document is intended for the person who installs, administers, and troubleshoots servers and storage systems. Certain operations described in this document, if performed incorrectly, might cause system crashes and loss of data.
...      The preceding element can be repeated an arbitrary number of times.
\        Indicates the continuation of a code example.
|        Separates items in a list of choices.

WARNING  A warning calls attention to important information that, if not understood or followed, will result in personal injury or nonrecoverable system problems.

CAUTION  A caution calls attention to important information that, if not understood or followed, will result in data loss, data corruption, or damage to hardware or software.
Chapter 9   Troubleshooting.
Chapter 10  Technical support.

Documentation Updates

Documentation updates (if applicable) are provided at http://docs.hp.com. Use the release date of a document to determine if you have the latest version.

HP Encourages Your Comments

HP encourages your comments concerning this document. HP is committed to providing documentation that meets your needs. Send any errors found, suggestions for improvement, or compliments to: http://docs.hp.com/en/feedback.
1 What's in This Version

1.1 About This Product

Windows HPC Server 2008 (HPCS) provides a high-performance computing platform that supports an integrated software stack, which includes the operating system, job scheduler, and management tools.

1.2 Benefits and Features

HPCS provides enhanced monitoring, system health, and reporting with built-in diagnostic tools to create and manage groups of resources and failover options.
2 Supported Configurations

2.1 Configuration Constraints

The configuration of an HP Cluster Platform is flexible, so that it can support several different operating environments. To support HPCS, a cluster must have the following specific characteristics:
• The cluster consists of a single head node and a number of compute nodes.
• Only servers (not workstations) are supported as the head node or compute nodes.
• The head node requires an optical drive.
Figure 2-1 shows a head node on a public and private network with compute nodes on the private network only. This topology is referred to as the Two-Network Topology and is supported by HP Cluster Platform and HP Cluster Platform Express. See Figure 2-6 (page 15).

Figure 2-1 Topology 1

Figure 2-2 shows all nodes on a public and private network.
Figure 2-4 Topology 4

Figure 2-5 shows all nodes on a public network only.

Figure 2-5 Topology 5

Only two of the five topologies previously shown map to HP Cluster Platform topologies. Figure 2-6 shows the simplest topology, based on in-band (shared) use of the HP ProCurve switch as the MPI fabric. The same network provides routing for cluster administrative and job management traffic. A second network is provided through the site LAN connection.
5 6 7 8 9
Connection to the server management interface, a dedicated hardware interface that listens for, and executes, commands received through the Ethernet network. This card is also known as the management processor (MP) or integrated lights-out (iLO) card, depending on the model of server. Where supported by the operating environment, this connection is used for server hardware management functions, such as power-off or boot.
12  InfiniBand MPI network, if present
13  Gigabit Ethernet system interconnect, if present
14  Gigabit Ethernet switch for cluster administrative network, if present
15  InfiniBand system interconnect, if present

2.4 Host Channel Adapters and Firmware

Every node that is part of the high-speed InfiniBand fabric requires a host channel adapter (HCA) installed in its PCI bus. The HCA is installed in a specific PCI slot on an unshared bus of the appropriate speed, ensuring system performance.
3 Cabling Your HP Cluster Platform for HPCS

For HPCS to install and operate correctly, the GigE network must be cabled properly. The head node must have two connections: one NIC connected to an enterprise (public) network, and the other NIC connected to an isolated (private) network infrastructure. All compute nodes must have the same NIC (NIC1 or NIC2) connected to the same isolated (private) network, which includes the head node's private NIC.
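As a rough sanity check after cabling, you can confirm from the head node that one NIC carries a public address and the other a private one. The following console sketch is illustrative only; the private address 192.168.0.10 is a hypothetical compute-node address, and the exact output depends on your configuration.

```text
REM Run on the head node. List every adapter and its addresses; one NIC
REM should show an enterprise (public) address, the other a private one.
ipconfig /all

REM Confirm the private NIC can reach a compute node on the isolated
REM network (192.168.0.10 is a placeholder address).
ping 192.168.0.10
```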
7. Perform any hardware post-installation verification procedures to ensure that all cabling is functioning. Gigabit Ethernet clusters have no specific post-installation verification procedures, other than those documented in the HP ProCurve hardware documentation shipped with your cluster. 8. Proceed to install or configure the HPCS operating environment, as described in Chapter 4 (page 21).
4 Installing HPCS on a Compute Cluster

This chapter provides instructions for installing HPCS on various cluster configurations.

4.1 Cluster Installation Overview

You must perform several basic steps to complete the installation of your cluster. However, because of the various HPCS purchase options from HP, these steps might vary slightly for each scenario.

NOTE: To install HPCS, you must be logged in as a user with domain administrative privileges.
6. The bottom section contains options to join a Domain or a Workgroup. The Workgroup option is currently selected. Select Domain and join the cluster to a domain.

NOTE: If you do not have a domain to join, promote the head node to a domain controller and proceed to step 11. For instructions, see Appendix A (page 67).

7. Enter the domain name and a valid domain username and password. The computer then joins the domain.
8. Select No when asked to reboot.
9.
14. To start the HPCS installation, click Next.
15. Accept the license terms agreement and click Next.
16. Select Create a new HPC cluster by creating a head node, and click Next.
17. Select the default Create a new database, and click Next.
18. Select the default locations for the HPC database files, and click Next.
19. The next screen prompts you to apply Microsoft updates.

IMPORTANT: Apply Microsoft updates to the cluster. Verify that your node can retrieve the desired updates before selecting Yes. If you do not select Yes to updates at this time, you must run Windows Update at a later time.
22. After HPC Pack 2008 is installed, the cluster's preconfigured information appears in the command window. Allow this process to complete; it takes an average of 10 to 15 minutes.
23. After the preconfiguration information is loaded, the HPC Console displays automatically.
24. In the To-do List window, Configure your network should have a green check by it. If it is not checked, the preconfigured network settings are incorrect. Select Configure your network and make corrections. For additional information, see step 10 in “Installing HPCS from CD”.
25. Select Provide installation credentials. These are the domain credentials used to join nodes to the domain.
If any of the four configuration steps are not checked, select the step and complete the configuration. For more information on cluster configuration, see “Installing HPCS from CD” (page 30), or refer to the Microsoft HPC Pack 2008 documentation.
27. The head node configuration is complete. You can now install the compute nodes. Select Node Management. The cluster compute nodes are preloaded and listed here.
The cluster compute nodes should be listed in a provisioning state. If they are in an unknown state:
• Select Add nodes on the right.
• Select Deploy compute nodes from bare metal using an Operating System Image, and click Next.
• Select Select all, and click Deploy.
• Select Respond only to PXE requests that come from existing compute nodes, and click Finish.
28. Power on the compute nodes. The compute nodes start to provision.
After a node has completed provisioning, it is displayed as offline. Right-click the node or nodes, and select Bring Online.
If a node fails to provision, choose either option:
• Check the provisioning log for the node. This log displays error messages.
• Bring up the compute node console and watch the node PXE boot, contact the head node, download the image, and start the OS installation.
If either of these steps fails, check the error messages to resolve the issue and re-provision the failed node. Your cluster is ready for use. For information on troubleshooting, see Chapter 9 (page 63).
4. Join the head node to an existing enterprise domain. If you cannot join an existing domain, promote the head node to a domain controller. For information on promoting your server to a domain controller, see Appendix A (page 67).
5. Install HPC Pack 2008.
   a. Run setup.exe.
   b. Accept the terms, and select Create a new HPC cluster by creating a head node.
   c. Create a new database instance, and accept the defaults.
6. If you have a Windows Server Update Service (WSUS) set up for your Active Directory, or your servers do not need a proxy server to directly access Microsoft websites, you can save some time by having updates installed. If you do not, click I don't want to use Microsoft Update.
7. In Install Required Components, click Install. The installation starts.
8. After the cluster is installed, start the console by clicking Finish.
9. The To-do List window appears. You must complete each of the four configuration steps. Begin with Configure your network.
10. Select the desired network topology for your cluster. For HP Cluster Platform configured hardware, select Compute nodes isolated on a private network. For clusters with InfiniBand, select Compute nodes isolated on private and application networks.
Click Next.
a. Select the appropriate Ethernet network adapter for the public connection. Click Next.
b. Select the appropriate Ethernet network adapter for the private connection. Click Next.
c. You are prompted to provide private network IP configuration parameters. You can select your own IP ranges, but HP recommends that you accept the defaults. HP also recommends that you select NAT and DHCP. If you do not select DHCP, you will not be able to provision your compute nodes.
For more information on configuring and troubleshooting IB network connections, see the documentation at http://www.docs.hp.com/en/highperfcomp.html. Click Next.
f. Configure your firewall settings. Typically, you select the defaults: public is firewalled, private is not. HP recommends that you do not firewall your private network.
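The firewall wizard normally applies these settings for you, but the resulting state can be inspected (or adjusted manually) from a command prompt. The following console sketch is illustrative only; profile-to-network mapping depends on your configuration.

```text
REM Show the current Windows Firewall state for each profile:
netsh advfirewall show allprofiles

REM If you need to match the recommended defaults by hand -- public side
REM firewalled, private side open -- the private profile can be disabled:
netsh advfirewall set privateprofile state off
```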
Click Next.
g. After making all your network selections, review them. After verifying your selections, click Configure. Configuration takes several minutes to complete.
11. In the To-do List, select Provide installation credentials. Enter the credentials for a domain user who has permissions to create new nodes in Active Directory. (Many enterprises use a special account with restricted privileges for this purpose. Use the appropriate account for your domain.) Add the user to the cluster user list.
12. Configure the node-naming scheme. Select Configure the naming of new nodes. This presents a pattern to use for compute node names as they join the cluster.
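The naming pattern in step 12 contains a numeric placeholder that is incremented for each node that joins. As a rough illustration of how such a series expands, the sketch below assumes a %N% placeholder form; the pattern syntax, the example name NODE%100%, and the expand_series helper are assumptions for illustration only, so check the Cluster Manager dialog for the exact syntax it accepts.

```python
import re

def expand_series(pattern, count):
    """Expand a node-naming series such as 'NODE%100%' into the names a
    cluster would assign as nodes join. N in %N% is taken as the starting
    number; its digit count sets the zero-padding width."""
    m = re.fullmatch(r"(.*)%(\d+)%(.*)", pattern)
    if not m:
        raise ValueError("pattern must contain a %number% placeholder")
    prefix, start, suffix = m.group(1), m.group(2), m.group(3)
    width = len(start)  # preserve zero-padding, e.g. %001% -> 001, 002, ...
    return [f"{prefix}{int(start) + i:0{width}d}{suffix}" for i in range(count)]

print(expand_series("NODE%100%", 3))  # ['NODE100', 'NODE101', 'NODE102']
```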
Click Next.
b. Select the first option, Create a new operating system image, and locate setup.exe on the OEM DVD. This image is found on the Windows HPC 2008 Edition OEM disk. Include a name for the image.
Click OK.
c. After the image is added, select the appropriate OS image. If you are using an OEM image with new HP hardware, you do not need to enter a key; the key is automatically encoded into the image. If you are using older hardware, or a non-OEM image, you can enter your OS key.
d. The local administrator password can be automatically generated or manually set. HP recommends automatically generated passwords.
e. If you have your WSUS server set up, or if your nodes have access to the internet, you can select to include Windows updates. You can also add additional hotfixes to the installation if you wish.
f. Review the settings, and create the template. Click Create.
14. After the template is created, all the To-do List items have green checks.
15. To finish the configuration and start provisioning nodes, go to the Configuration tab, and select Images.
16. Select Manage Drivers, and add the multi-function NIC drivers from the ProLiant Support Pack (PSP) for DL3xx, DL5xx and blXXX. For DL1xx, download the appropriate NIC driver for your hardware, and add this driver to the image.

IMPORTANT: The WinPE environment for Windows 2008 does not contain virtual bus drivers. Therefore, the network drivers for your compute nodes must be added to the image, even though they may be a part of the full Windows 2008/2008 HPC Edition operating system.
If the test All Services Running is the only failure, this is a false error: the HPC services did not start before the diagnostic tests started to run. To clear the error, on the Diagnostics tab, select the head node, right-click the error, and select Clear Alert.
Rerun the diagnostic tests if you wish.
18. To start provisioning compute nodes, return to the Node Management tab, and select Add Nodes. Select Deploy compute nodes from bare metal using an operating system image. Click Next.
19. Select the template you want to apply to the compute nodes. You cannot continue until at least one compute node is PXE booted and starts provisioning. Power on at least one of the compute nodes, or all of the compute nodes, and wait for them to display in the Select New Nodes window.
NOTE: Compute nodes should arrive with no data on the hard drive. In this state, the nodes automatically PXE boot when they are powered up. If the hard drive contains a partition or data, the node may attempt to boot from the hard drive. If this occurs, wipe the hard drive clean by deleting all disk partitions, or manually select PXE boot at node startup time.
20. Click Deploy. You are asked whether you want the head node to respond only to PXE requests from existing nodes, or to respond to all PXE requests.
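Deleting all disk partitions, as the note above describes, can be done with the standard diskpart utility. The transcript below is an illustrative sketch: disk 0 is an assumption (confirm with list disk first), and the clean command is destructive, erasing the entire disk.

```text
REM Run from a command prompt on the compute node (for example, from a
REM WinPE boot). WARNING: "clean" erases the selected disk.
diskpart
DISKPART> list disk
DISKPART> select disk 0
DISKPART> clean
DISKPART> exit
```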
21. To monitor the provisioning progress, go to the Node Management console, highlight the node you want to monitor, and select the Provisioning Log tab at the bottom of the screen. The provisioning progress is displayed, along with any errors.
Provisioning might take approximately an hour. It might take longer if a large number of nodes are being provisioned at the same time without multicast enabled. After the nodes are provisioned, you can see all the nodes in the node list. Your cluster is ready for use. For information on troubleshooting, see Chapter 9 (page 63).
5 Post-Installation Tasks

This chapter provides information about various tasks that might be necessary after installation.

5.1 Altering Regional Settings

If the server is not operating under the default regional settings, alter the regional settings. The regional settings control the keyboard language and set the local format for sorting and displaying time, date, numbers, and currency for a specified region.
1. Click Start, and select Control Panel.
5. Click Install to install the currently selected components. The HP Management Agents install using the password you have set.

5.3 Configuring Array Controllers

If an array controller has been purchased with the server, run the online ACU to set up the remaining physical drives for use.

IMPORTANT: Microsoft Internet Explorer is required to run the ACU. Internet Explorer is used to interface with the array controller.
10. Click Next to accept the drive letter assigned by default at the Assign Drive Letter or Path screen. The Format Partition screen appears.
11. To format the drive, select the appropriate file system format (the default selection is NTFS) and the Allocation Unit Size, and then either enter the Volume Label or accept the default label.
12. (Optional) Select Perform a quick format and Enable file and folder compression.
13. If you do not want to format the drive, select Do not format this partition.
14.
6 HP-MPI for Windows

HPCS supports HP-MPI for Windows. You must purchase HP-MPI for Windows software and licenses. Install the software locally on each node in your cluster, or use the Run command feature of the Compute Cluster Administrator to remotely install it. For more information, see the HP-MPI website: http://www.hp.
7 Upgrading

7.1 Upgrading or Modifying Clusters

Factory-integrated clusters are not designed for field upgrades using preinstalled servers (otherwise known as HPCS standalone servers). To find out whether an upgrade is available for your cluster, contact your HP sales and service representative, or authorized HP reseller.

CAUTION: Never add components to an HP Cluster Platform rack, even if available space exists in the rack.
8 Licensing

Operating System Activation: If this product was purchased directly from HP, this product is pre-activated; HP has configured the operating system so customer activation is not required. If this product was purchased from your local authorized reseller, you have 60 days from the installation of the product to complete product activation, either online or by phone directly with Microsoft. Follow the activation instructions upon installation of the operating system.
9 Troubleshooting

This section covers various areas where problems might occur and offers suggestions for troubleshooting and fixing the issues.

9.1 Network Configuration Issues

Cluster configuration problems are often related to improper network configuration. Some areas to check are:
• Active Directory—The head node and each compute node must be members of the same domain before the Compute Cluster Pack is installed.
• Firewalls—The Windows Firewall can occasionally prevent nodes from being accessed.
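Both checks above can be made quickly from a command prompt on the node being diagnosed. The console sketch below is illustrative; the hostname headnode is a placeholder for your actual head node name.

```text
REM Confirm the node is joined to the expected domain:
systeminfo | findstr /B /C:"Domain"

REM Check whether the Windows Firewall might be blocking access:
netsh advfirewall show allprofiles

REM Verify name resolution and reachability of the head node
REM ("headnode" is a placeholder hostname):
ping headnode
```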
• The compute node fails to copy the operating system from the WDS server. When the compute node starts the WinPE environment, a command window displays. You should see it add any drivers (if you injected drivers into the OS), and then start the provisioning. One of the early steps is copying the OS image to the compute node. If this copy fails, an error appears in the command window and processing stops.
— Make sure you added the correct network drivers for your compute node into the image.
10 Technical Support

10.1 Before You Contact HP

Be sure to have the following information available before you call HP:
• Technical support registration number (if applicable)
• Product serial number
• Product model name and number
• Applicable error messages
• Add-on boards or hardware
• Third-party hardware or software
• Operating system type and revision level

NOTE: HP Cluster Platform and HP Cluster Platform Express are not customer-repairable and come with their own technical support procedures.
A Promoting the Head Node to Domain Controller

The HPCS software requires the nodes to be members of a Windows Active Directory (AD) domain. Active Directory requires a domain controller to manage it. If you do not have an AD domain that you can join, you must promote your head node to a domain controller. The basic steps to promote your head node to a domain controller are as follows:

NOTE: The following process is an overview and assumes the user has some knowledge of Windows administration.
1.
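On Windows Server 2008, the dcpromo wizard can also be driven by an answer file, which is convenient when configuring several clusters the same way. The fragment below is an illustrative sketch, not a tested configuration: every value (domain name, NetBIOS name, password) is a placeholder to substitute with your own, and you should verify the supported keys against the Microsoft dcpromo documentation for your server release.

```text
; Illustrative answer file for promoting the head node to the first
; domain controller of a new forest. Invoke with:
;   dcpromo /unattend:C:\dcpromo-answer.txt
[DCINSTALL]
ReplicaOrNewDomain=Domain
NewDomain=Forest
NewDomainDNSName=hpccluster.local
DomainNetBiosName=HPCCLUSTER
InstallDNS=Yes
SafeModeAdminPassword=ChangeMe123!
RebootOnCompletion=Yes
```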
Glossary

A
AMD Opteron      The ProLiant DL series servers used in the CP4000 platform employ the AMD Opteron 32/64-bit CPU.
API              Application program interface.

B
BIOS             Basic Input/Output System.

C
Cluster          A set of servers joined together using various interconnect technologies to form a supercomputer.
Command-Line Interface (CLI)
                 Administrators can use the CLI to automate job, job queue, and node operations. These operations can also be scripted.
I
InfiniBand       An interconnect protocol and connection type supported by interconnect technology from Voltaire that complies with the specification defined by the InfiniBand trade group.
Interconnect Building Block (IBB)
                 A module containing one or more interconnects.

J
Job Scheduler    The Job Scheduler service runs on the head node and is responsible for job submission, queue management, resource allocation, and job execution.