Mac OS X Server Xgrid Administration and High Performance Computing For Version 10.
Apple Inc. © 2007 Apple Inc. All rights reserved. The owner or authorized user of a valid copy of Mac OS X Server software may reproduce this publication for the purpose of learning to use such software. No part of this publication may be reproduced or transmitted for commercial purposes, such as selling copies of this publication or for providing paid-for support services. Every effort has been made to ensure that the information in this manual is accurate. Apple Inc.
Preface About This Guide This guide describes the Xgrid components included in Mac OS X Server and tells you how to configure and use them in computational grids. Xgrid in Mac OS X Server version 10.5 includes a controller for computational grids and an agent that allows the server’s processor to work on jobs submitted to a grid. The agent is also available in computers using Mac OS X v10.3 or v10.4.
Using This Guide
The following list contains suggestions for using this guide:
• Read the guide in its entirety. Subsequent sections might build on information and recommendations discussed in prior sections.
• The instructions in this guide should always be tested in a nonoperational environment before deployment. This nonoperational environment should simulate, as much as possible, the environment where the computer will be deployed.
Advanced Server Administration Guides Getting Started covers basic installation and initial setup methods for a standard, workgroup, or advanced configuration of Leopard Server. An advanced guide, Server Administration, covers advanced planning, installation, setup, and more. A suite of additional guides, listed below, covers advanced planning, setup, and management of individual services. You can get these guides in PDF format from the Mac OS X Server documentation website at www.apple.
This guide ... tells you how to:
Web Technologies Administration: Set up and manage web technologies, including web, blog, webmail, wiki, MySQL, PHP, Ruby on Rails, and WebDAV.
Xgrid Administration and High Performance Computing: Set up and manage computational clusters of Xserve systems and Mac computers.
Mac OS X Server Glossary: Learn about terms used for server and storage products.
Getting Documentation Updates
Periodically, Apple posts revised help pages and new editions of guides. Some revised help pages update the latest editions of the guides.
• To view new onscreen help topics for a server application, make sure your server or administrator computer is connected to the Internet and click “Latest help topics” or “Staying current” in the main help page for the application.
• To download the latest guides in PDF format, go to the Mac OS X Server documentation website: www.apple.
Part I: Xgrid Administration
Use the chapters in this part of the guide to learn about Xgrid service and the applications and tools available for administering Xgrid.
1 Introducing Xgrid Service
Use this chapter to learn about what Xgrid is and how it can help you. You use Xgrid to create grids of multiple computers and distribute complex jobs among them for high-throughput computing. Xgrid, a technology in Mac OS X Server and Mac OS X, simplifies deployment and management of computational grids.
How Xgrid Works Xgrid creates multiple tasks for each job and distributes those tasks among multiple nodes. These nodes can be desktop computers running Mac OS X v10.3 or later, or server computers running Mac OS X Server v10.4 or later. Many desktop computers sit idle during the day, in evenings, and on weekends. The assembly of these systems into a computational grid is known as desktop recovery.
The illustration gives an example of how a grid handles a job:
[Illustration: a Client, a Controller, and distributed agents (Dedicated Server, Dedicated Desktop, Part-Time Desktop)]
1 Client submits job to Controller
2 Controller splits job into tasks, then submits tasks to Agents
3 Agents execute tasks
4 Agents return tasks to Controller
5 Controller collects tasks and returns job results to Client
Xgrid has no limitations on the amount of computational power it can support.
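The five steps of the job flow can be modeled on a single machine. The following is a minimal sketch, not part of Xgrid: the agent names, the round-robin assignment, and the "square a number" task are all assumptions made for illustration, and a real controller decides task placement on its own.

```shell
#!/bin/sh
# Toy model of the job flow described above. It runs entirely on one machine
# and does not contact a real Xgrid controller. Agent names and the
# "square a number" task are hypothetical.

agents="dedicated-server dedicated-desktop part-time-desktop"
agent_count=3

# Step 3: an agent executes one task (here, squaring one input value).
run_task() {
    echo $(( $1 * $1 ))
}

# Steps 2, 4, and 5: the controller splits the job into one task per input,
# assigns tasks to agents in round-robin order, and collects the results.
run_job() {
    task=0
    results=""
    for input in "$@"; do
        field=$(( task % agent_count + 1 ))
        agent=$(echo "$agents" | cut -d' ' -f"$field")
        result=$(run_task "$input")
        # Progress goes to stderr so that stdout carries only the results.
        echo "task $task on $agent: $input -> $result" >&2
        results="$results $result"
        task=$(( task + 1 ))
    done
    echo $results
}

# Step 1: the client submits a job; step 5: it receives the combined results.
job_results=$(run_job 1 2 3 4 5)
echo "job results: $job_results"
```

Because each task depends only on its own input, the tasks could run in any order or on any agent without changing the collected results.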
You don’t need to think in terms of thousands or millions of seldom-used computers to see the significance of a computational grid. For example, computers used by university students or corporate employees often work fewer hours than the hours they sit idle at night or on weekends. These computers could contribute productively to the work of a grid without diminishing their usefulness to the students or employees.
Xgrid enables administrators to easily configure the distributed resource management functionality of the cluster. Each server in the system runs the agent software, and the head node in the cluster runs the controller software. Xgrid distributes tasks across the cluster. In clusters, failure rates are generally very low. Systems are rarely, if ever, offline, and their resources are not shared with general user tasks. Clusters are the most efficient but most expensive model of distributed computing.
Xgrid Components The Xgrid three-tier architecture simplifies the distribution of complicated tasks. Its user clients, grid controllers, and computational agents work together to streamline the process of assembling nodes, submitting jobs, and retrieving results. The illustration below gives an example of the Xgrid components and the process of auto configuration for a grid.
Agent Xgrid agents run the computational tasks of a job. In Mac OS X Server, the agent is turned off by default. When an agent is turned on and becomes active at startup, it registers with a controller. (An agent can be connected to only one controller at a time.) The controller sends instructions and data to the agent as needed for the controller’s jobs. After it receives instructions from the controller, the agent performs its assigned tasks and sends the results back to the controller.
Controller The Xgrid controller manages the communications among the computational resources of a grid. The controller requires Mac OS X Server v10.4 or later. The controller accepts network connections from clients and agents. It receives job submissions from clients, divides the jobs into tasks, dispatches tasks to agents, and returns results to the clients. Although there can be more than one Xgrid controller running on a subnet, there can only be one controller per logical grid.
2 Setting Up and Configuring Xgrid Service
Use this chapter to plan your grid and set up the Xgrid agent and controller. Xgrid simplifies deployment and management of computational grids. Using Server Admin you can configure Xgrid to set up computer groups (grids or clusters) and allow users to easily submit complex computations to these grids (local, remote, or both), as either an ad hoc grid or a centrally managed cluster.
Step 6: Configure Xgrid agent settings (Mac OS X Server)
Configure your server as an Xgrid agent using Server Admin. See “Configuring an Xgrid Agent (Mac OS X Server)” on page 32.
Step 7: Configure Xgrid agent settings (Mac OS X)
Configure computers as Xgrid agents by using Sharing Preferences. See “Configuring an Xgrid Agent (Mac OS X)” on page 33.
Before Setting Up Xgrid Service
Before configuring Xgrid service, you must define the grid environment you’ll create.
 Clients. If your server is the controller for a grid, be sure that Mac OS X and Mac OS X Server clients use the correct authentication method for the controller. A client cannot submit a job to the controller unless the user chooses the correct authentication method and enters their password correctly, or has the correct ticketgranting ticket from Kerberos. For more information, see “Setting Up Grid Authentication” on page 34.
Otherwise, do not use this option. It creates a potential security hole (because anyone can connect or run a job) and should never be used on a system exposed to the Internet, especially when potentially sensitive data is involved. If you choose to use no authentication, agents can join the grid and clients can submit jobs to the grid without authenticating. Hosting the Grid Controller The primary requirement for a controller is that it must be network-accessible to clients and agents.
Configuring Xgrid to Host a Grid Using the Xgrid Service Configuration Assistant Use the Xgrid service configuration assistant to configure the Xgrid agent and controller to run on this server. This also configures a network file system. To set up Xgrid to host a grid using the Xgrid service configuration assistant: 1 Open Server Admin and connect to the server. 2 Click the triangle to the left of the server. The list of services appears. 3 In the expanded Servers list, click Xgrid. 4 Click Overview.
8 Specify the controller you want to bind your agent to. Select “Browse Bonjour-discoverable controllers” to view and select from available controllers. Select “Use controller with hostname” to enter the hostname of a specific controller. 9 Click Continue. 10 Review and confirm your configuration settings, then click Continue. This restarts Xgrid service using your settings. 11 Click Close.
4 Click Settings.
5 Click Controller.
6 Click “Enable controller service.”
7 From the Client Authentication pop-up menu, choose one of the following authentication options for clients and enter the password.
• Password requires that the agent and controller use the same password.
• Kerberos uses SSO authentication for the agent’s administrator.
• None does not require a password for the agent. This option provides no protection from potentially malicious use of your grid.
2 Click the triangle to the left of the server. The list of services appears. 3 In the expanded Servers list, click Xgrid. 4 Click the Start Xgrid button (below the Servers list). Configuring an Xgrid Agent (Mac OS X Server) You use Server Admin to set up your server as an Xgrid agent. In addition, you can associate the agent with a specific controller or permit it to join a grid, specify when the agent accepts tasks, and set a password that the controller must recognize.
10 Click Save.
Important: If you require authentication, the agent and controller must use the same password or must authenticate using Kerberos SSO. For details about authentication options, see “Setting Up Grid Authentication” on page 34.
Configuring an Xgrid Agent (Mac OS X)
You use Sharing preferences to set up client computers as Xgrid agents.
Setting Up Grid Authentication You can configure Xgrid to require authentication of controllers, clients, and agents. For more information, see “Authentication Methods for Xgrid” on page 26. Setting Up Kerberos for Xgrid You use Server Admin to configure Kerberos as the authentication method for your Xgrid. Kerberos authentication uses SSO. To configure Kerberos authentication: 1 Open Server Admin and connect to the server. 2 Click the triangle to the left of the server. The list of services appears.
3 In the expanded Servers list, click Xgrid. 4 Click Settings. 5 Click Agent. 6 Click “Enable agent service.” 7 For the authentication option for the agent, choose Password from the Controller Authentication pop-up menu and enter a password. 8 Click Controller. 9 Click “Enable controller service.” 10 For the authentication option for the client, choose Password from the Client Authentication pop-up menu and enter a password.
To restrict access to all services, select “For all services.” To set access permissions for individual services, select “For selected services below,” then select a service from the Service list. 6 To provide unrestricted access to services, click “Allow all users and groups.” 7 To restrict access to users and groups: a Select “Allow only users and groups below.” b Click the Add (+) button to open the Users and Groups drawer. c Drag users and groups from the Users and Groups drawer to the list.
Managing Xgrid Service This section describes typical day-to-day tasks you might perform after you set up Xgrid service on your server. For information about initial setup, see “Setting Up Xgrid Service” on page 30. You can monitor and manage grids using Xgrid Admin. For more information, see Chapter 3, “Managing a Grid.” Viewing Xgrid Service Status You can use Server Admin to view the status of Xgrid service. To view Xgrid service status: 1 Open Server Admin and connect to the server.
Stopping Xgrid Service You use Server Admin to stop Xgrid service. To stop Xgrid service: 1 Open Server Admin and connect to the server. 2 Click the triangle to the left of the server. The list of services appears. 3 From the expanded Servers list, select Xgrid. 4 Click the Stop Xgrid button (below the Servers list). From the Command Line You can also stop Xgrid service immediately by using the serveradmin command in Terminal.
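As a sketch of the command-line approach, the following wraps the stop command in a guard so that it does nothing on a computer without the serveradmin tool. The function name and the fallback message are assumptions made for this example; the status call afterward is only a convenience to confirm the result.

```shell
#!/bin/sh
# Stop Xgrid service from Terminal. serveradmin ships with Mac OS X Server,
# so the guard below skips the call on machines that do not have it.
stop_xgrid() {
    if command -v serveradmin >/dev/null 2>&1; then
        sudo serveradmin stop xgrid      # stop the service
        sudo serveradmin status xgrid    # confirm the new state
    else
        echo "serveradmin not found; run this on Mac OS X Server"
        return 1
    fi
}

stop_xgrid || true
```

Running serveradmin requires administrator privileges, which is why the sketch uses sudo.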
3 Managing a Grid
Use this chapter to learn how to use the Xgrid Admin application to manage grids, add controllers and agents, and work with jobs. After you set up an Xgrid controller, you can use Xgrid Admin to manage a grid. You can use Xgrid Admin on the server or on a remote computer that is running Mac OS X v10.4 or later. You can manage one or more computational grids with Xgrid Admin. A computational grid is a fixed group of agents with a dedicated queue.
Xgrid Admin provides controls in its graphical interface and menu commands for all of its options. Note: You can also use the Xgrid command-line tool to perform these tasks. For more information about using the command-line tool, see Chapter 4, “Planning and Submitting Xgrid Jobs.” Status Indicators in Xgrid Admin Xgrid Admin provides status indicators, which are small color bubbles indicating the status of controllers, agents, and jobs.
To connect to an Xgrid controller:
1 Open Xgrid Admin and do one of the following:
• From the pop-up menu, choose the controller or enter its name and click Connect.
• In the Controllers and Grids list, select the controller name and click Connect.
2 If necessary, select the correct authentication option, enter a password, and then click OK.
Disconnecting from an Xgrid Controller
You use Xgrid Admin to disconnect from an Xgrid controller in the Controllers and Grids list.
Managing Agents Use Xgrid Admin to view, add, or delete agents. Xgrid Admin also uses status indicators to display the status of agents. Although Server Admin provides a simple interface for enabling Xgrid services on one server or across a rack of Xserve systems, it doesn’t provide a way to configure Xgrid on desktop computers running Mac OS X v10.3 or later. If you are relying on volunteers to provide desktop agents, you can send instructions for enabling Xgrid from the Sharing pane of System Preferences.
NetBoot or Network Install
For large networks, it often makes sense to use a common system image that is mounted or installed by each agent to configure the agents. Although Xgrid isn’t reason enough to use NetBoot, consider whether using Network Install would simplify your general administrator’s tasks. If you use NetBoot with Xgrid, all agents must have unique hostnames and must keep all files intact between reboots. For more information, see System Imaging and Software Update Administration at www.apple.
Deleting an Agent You can delete an agent for an Xgrid controller in Xgrid Admin. To delete an agent: 1 Open Xgrid Admin. 2 In the Controllers and Grids list, select the controller. 3 Click Agents. 4 Click the Delete (–) button below the list of agents. Note: If you delete an agent that you know is on the local subnet and is configured to attach to that controller, wait a few moments and it will reappear in the list. If the agent doesn’t reappear, use the Add (+) button and enter its name to retrieve it.
Repeating or Restarting a Job You can repeat a job or restart a stopped job in Xgrid Admin. To repeat or restart a job: 1 Open Xgrid Admin. 2 In the Controllers and Grids list, select the controller. 3 Click Jobs. 4 Select the job you want to repeat or restart. 5 Click the Start button below the list of jobs. Deleting a Job You can delete a job in Xgrid Admin. To delete a job: 1 Open Xgrid Admin. 2 In the Controllers and Grids list, select the controller. 3 Click Jobs. 4 Select the job you want to delete.
Deleting a Grid
You use Xgrid Admin to remove a grid from an Xgrid controller in the Controllers and Grids list.
To delete a grid:
1 Open Xgrid Admin.
2 In the Controllers and Grids list, select the grid.
3 Click the Action pop-up menu below the Controllers and Grids list and select Remove Grid.
4 Click OK.
Monitoring Grid Activity
You can quickly view the activity of a grid in Xgrid Admin. You can also view agents and job activity using Xgrid Admin.
4 Planning and Submitting Xgrid Jobs
Use this chapter to learn how to use Xgrid command-line tools and the Terminal application to submit jobs to a grid and to get information about jobs. After you configure an Xgrid controller and add agents to a grid, you can use the Terminal application to send a job to the grid.
Structuring Jobs for Xgrid
Carefully planning and structuring a job can result in efficient use of the grid.
About Job Failure Xgrid jobs can rely on message-passing interface (MPI) APIs. For jobs that rely on MPI, if a single task fails, the entire job fails and must be resubmitted. Therefore you should not use MPI-based jobs on grids with high task-failure rates. Jobs that are more parallel in nature are generally unaffected by occasional task failures. Tasks are typically reassigned to other available agents to complete the job. Most jobs fall into this category.
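The difference between the two job styles comes down to whether a failed task can simply be run again. The following sketch models the reassignment of an independent task on one machine. Xgrid performs reassignment itself; the function names, the scratch file, and the deliberately flaky task are hypothetical and exist only to show the behavior.

```shell
#!/bin/sh
# Model of task reassignment for an independent (non-MPI) task.
# Everything here is hypothetical; Xgrid handles reassignment on its own.

attempt_file=$(mktemp /tmp/xgrid-retry.XXXXXX)

# A task that fails on its first attempt and succeeds on every later one.
flaky_task() {
    n=$(cat "$attempt_file")
    n=$(( ${n:-0} + 1 ))
    echo "$n" > "$attempt_file"
    [ "$n" -ge 2 ]
}

# Run a task up to $1 times, as if handing it to another agent after a failure.
run_with_reassignment() {
    max=$1
    shift
    try=1
    while [ "$try" -le "$max" ]; do
        if "$@"; then
            echo "task succeeded on attempt $try"
            return 0
        fi
        try=$(( try + 1 ))
    done
    echo "task failed after $max attempts"
    return 1
}

run_with_reassignment 3 flaky_task
```

An MPI-based job has no equivalent of this loop at the task level: one failed task fails the whole job, which then has to be resubmitted.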
The following command copies the shell script hello.sh to the Xgrid controller and agent systems and runs the script. /bin/echo must be installed on the agent system. The hello.sh script must have its executable bit set before it can execute. Replace hostname and password with the hostname and password of your controller.
xgrid -h hostname -p password -job submit hello.sh
Viewing Job Status
You can monitor jobs in Xgrid Admin (for details, see “Managing Jobs” on page 44) or with the command-line tool.
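To prepare the hello.sh script used in the submission example, you can create it, set its executable bit, and test it locally before sending it to the grid. The following is a sketch under stated assumptions: the script body, the scratch directory name, and the CONTROLLER and PASSWORD variables are placeholders, not values from this guide.

```shell
#!/bin/sh
# Prepare hello.sh for submission. The scratch directory name and the
# CONTROLLER / PASSWORD variables are placeholders, not values from the guide.
workdir=$(mktemp -d /tmp/xgrid-hello.XXXXXX)

# The script calls /bin/echo, which must exist on every agent.
cat > "$workdir/hello.sh" <<'EOF'
#!/bin/sh
/bin/echo "Hello, Xgrid"
EOF

# Set the executable bit, then try the script locally before using the grid.
chmod +x "$workdir/hello.sh"
"$workdir/hello.sh"

# Submit only when the xgrid tool is present and a controller was named.
if command -v xgrid >/dev/null 2>&1 && [ -n "${CONTROLLER:-}" ]; then
    xgrid -h "$CONTROLLER" -p "${PASSWORD:-}" -job submit "$workdir/hello.sh"
else
    echo "skipping submission: xgrid not installed or CONTROLLER not set"
fi
```

Testing the script locally first separates problems in the script itself from problems with the grid connection.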
5 Solving Xgrid Problems
Use this chapter to help solve common problems you might encounter and questions you might have while working with Xgrid service. This section contains answers to common problems and questions.
If Your Agents Can’t Connect to the Xgrid Controller
If an agent is a server, make sure the agent service is enabled and the Xgrid service is started. The Xgrid controller is the only component of Xgrid that has an open port (port 4111) and requires a firewall opening.
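One way to test the firewall opening from an agent computer is to attempt a TCP connection to the controller's port. The following sketch assumes the nc utility is installed; the function name and the default hostname are placeholders chosen for this example.

```shell
#!/bin/sh
# Check whether a controller's port accepts TCP connections from this machine.
# The default hostname is a placeholder; 4111 is the controller port named above.
check_controller_port() {
    host=${1:-controller.example.com}
    port=${2:-4111}
    if ! command -v nc >/dev/null 2>&1; then
        echo "nc not found; cannot test $host:$port" >&2
        return 2
    fi
    # -z: connect without sending data; -w 5: give up after five seconds
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
        echo "$host:$port is reachable"
    else
        echo "$host:$port is not reachable; check the firewall and Xgrid service"
        return 1
    fi
}

# Example: test the local computer. Replace 127.0.0.1 with your controller.
check_controller_port 127.0.0.1 || true
```

A successful connection shows only that the port is open; the agent still needs the correct authentication settings to join the grid.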
To run an Xgrid agent over an SSH tunnel as a particular user:
Using Terminal, enter the following:
$ ssh -R 20000:192.168.1.100:4111 user@192.168.1.102 /usr/libexec/xgrid/GridAgent -ServiceName localhost:20000 -RequireControllerPassword NO -UsesRendezvous NO -OnlyWhenIdle NO -BindToFirstAvailable NO
Here, 20000 is the port to tunnel through the ssh connection, 192.168.1.100:4111 is the address and port number of the controller, user is the name of the user to connect, and 192.168.1.
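If you run the tunnel command for several agents, it can help to keep the variable parts in one place. The following sketch only prints the command so that you can review it before running it. The variable and function names are assumptions made for this example; the addresses and GridAgent options are the ones from the command above.

```shell
#!/bin/sh
# Build the SSH tunnel command from its variable parts and print it for review.
# Variable names are illustrative; the values are the example values above.
TUNNEL_PORT=20000
CONTROLLER=192.168.1.100
CONTROLLER_PORT=4111
AGENT_LOGIN=user@192.168.1.102

build_tunnel_command() {
    echo "ssh -R ${TUNNEL_PORT}:${CONTROLLER}:${CONTROLLER_PORT} ${AGENT_LOGIN}" \
         "/usr/libexec/xgrid/GridAgent -ServiceName localhost:${TUNNEL_PORT}" \
         "-RequireControllerPassword NO -UsesRendezvous NO" \
         "-OnlyWhenIdle NO -BindToFirstAvailable NO"
}

build_tunnel_command
```

The tunnel port appears twice, once in the -R option and once in -ServiceName, so deriving both from a single variable keeps them from drifting apart.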
If You Want to Use Xgrid on Other Platforms Third-party agents are available that run Xgrid jobs on non-Mac platforms. You are responsible for ensuring that your tasks contain and call appropriate platform-specific code. There is no intrinsic support for heterogeneous execution, although there is nothing that relies on Mac-specific technology. The primary technical requirement is a sufficiently functional BEEP protocol stack. Several open source implementations are available, of varying quality.
If You Want to Enable Kerberos/SSO for Xgrid
For Xgrid to use SSO, you need the following:
• The agent must have the host’s user principal in the system keytab.
• The Kerberos database on the KDC must contain the agent’s principal.
• The controller’s realm must be the default realm on the agent computer.
The agent’s principal is created in the KDC and is put in the agent’s keytab if the agent computer is bound to the OD master using authenticated binding with Directory Access.
The controller’s realm must be the default realm on the agent computer, as shown:
$ cat /Library/Preferences/edu.mit.Kerberos
# WARNING This file is automatically created, if you wish to make changes
# delete the next two lines
# autogenerated from : /LDAPv3/xgridtest.apple.com
# generation_id : 1637891359
[libdefaults]
    default_realm = XGRIDTEST.APPLE.COM
[realms]
    XGRIDTEST.APPLE.COM = {
        kdc = xgridtest.apple.com
        admin_server = xgridtest.apple.com
    }
[domain_realm]
    apple.com = XGRIDTEST.APPLE.COM
    .apple.
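Instead of reading the file by eye, you can extract the default realm and compare it with the controller's realm. The following is a minimal sketch: the function name is an assumption, and EXPECTED_REALM is set to the example realm shown in the file listing, so substitute your own realm.

```shell
#!/bin/sh
# Print the default_realm value from a Kerberos configuration file.
default_realm() {
    awk -F= '/^[ \t]*default_realm[ \t]*=/ {
        gsub(/[ \t]/, "", $2)   # strip spaces around the value
        print $2
        exit
    }' "$1"
}

# Compare the agent's default realm with the controller's realm.
# EXPECTED_REALM uses the example realm shown above; substitute your own.
EXPECTED_REALM=XGRIDTEST.APPLE.COM
CONFIG=/Library/Preferences/edu.mit.Kerberos

if [ -r "$CONFIG" ]; then
    actual=$(default_realm "$CONFIG")
    if [ "$actual" = "$EXPECTED_REALM" ]; then
        echo "default realm matches: $actual"
    else
        echo "default realm is '$actual', expected '$EXPECTED_REALM'"
    fi
else
    echo "$CONFIG not found on this computer"
fi
```

Run the check on the agent computer, because the requirement concerns the agent's default realm rather than the controller's.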
Part II: Configuring High Performance Computing
Use the chapters in this part of the guide to learn about high performance computing and the applications and tools available for administering it.
6 Introducing High Performance Computing
Use this chapter to learn about high performance computing (HPC) and how it’s supported by Apple technology. With high performance computing, you can speed the processing of complex computations by using Xserve computers with the Xgrid service.
Understanding HPC
HPC refers to the use of high-end computer systems to solve computationally intensive problems.
Mac OS X Server
Mac OS X Server v10.5 is Apple’s award-winning UNIX server operating system. Mac OS X Server can compile and run UNIX 03-compliant code, and runs 64-bit applications alongside 32-bit applications at native performance. The Mach kernel provides preemptive multitasking for outstanding performance, protected system memory for stability, and modern SMP locking for efficient use of multiprocessor and multicore systems.
Memory Space
The 64-bit architecture provides four billion times the memory space available in a 32-bit architecture, which puts the theoretical address space available to Mac OS X Server applications at 16 exabytes. Xserve G5 systems support 8 GB of memory. Xserve Intel systems support 32 GB of memory.
Libraries
Mac OS X Server provides the following highly optimized libraries for developing HPC applications.
Support of Loosely Coupled Computations
You can use Xserve clusters to perform most types of loosely coupled or embarrassingly parallel computations. Embarrassingly parallel computations consist of somewhat independent computational tasks that can be run in parallel on many different processors to achieve faster results. Here are examples of loosely coupled computations that you can accelerate using the setup described in this guide:
• Image rendering.
7 Reviewing the Cluster Setup Process
Use this chapter to learn about the process of setting up a high performance cluster. You will use multiple server tools to configure services, a cluster controller, compute nodes, and users when setting up a high performance cluster. The following chapters provide a step-by-step process to assemble and configure a computational cluster. The resulting cluster will consist of a controller and a number of compute nodes.
Cluster Setup Overview Here is a summary of what you’ll be doing to set up and test an HPC cluster. Step 1: Before you begin Before setting up your cluster, understand the expectations and requirements that you must fulfill. See Chapter 8, “Identifying Prerequisites and System Requirements.” Step 2: Prepare the cluster for configuration Prepare your cluster nodes for configuration by setting up the hardware and connecting your nodes to a network. See Chapter 9, “Preparing the Cluster for Configuration.
Step 7: Create an Auto Server Setup record for the compute nodes Use Server Assistant to save configuration settings to a file or Open Directory record. This allows cluster nodes to automatically configure themselves when they start up for the first time. See “Creating an Auto Server Setup Record for Compute Nodes” on page 95 and “Verifying LDAP Record Creation” on page 98. Step 8: Set up compute nodes Start compute nodes to begin the Auto Server Setup process.
8 Identifying Prerequisites and System Requirements
Before setting up your cluster, read the prerequisites and requirements in this chapter and familiarize yourself with the setup process. To make sure that your cluster is successfully set up, read this chapter to familiarize yourself with the expectations and requirements you must meet before starting the setup procedure. Then read the last section, which provides an overview of the cluster setup process.
System Requirements Take time to define the requirements needed to make sure the cluster setup is successful. System requirements are categorized as infrastructure, software, and private network requirements. Infrastructure Requirements This section describes the most important hardware infrastructure requirements. Consult with your system administrator about other requirements. For example, you might need one or more uninterruptible power supplies (UPSs) to provide backup power to key cluster components.
To obtain power consumption figures for cluster nodes, see the following articles on the AppleCare Service & Support website:
• Article 86694, “Xserve G5: Power consumption and thermal output (BTU) information,” at www.info.apple.com/kbnum/n86694
• Article 75383, “Xserve: Power Consumption and Thermal Output (BTU) Information,” at www.info.apple.com/kbnum/n75383
• Article 86251, “Xserve (Slot Load): Power Consumption and Thermal Output (BTU) Information,” at www.info.apple.
Consider the thermal output of other devices, such as the management computer, Xserve RAID systems, monitors, and other heat-generating devices used in the same room. As always, consult with your system administrator to determine the necessary level of cooling that your cluster and its associated hardware require for safe and effective operation. Weight Requirements For Xserve and cluster node weight information, see the Apple Xserve website at www.apple.com/xserve.
If you’re housing your cluster in a computer room, make sure you have at least 18 inches of clearance in front and behind your systems. If you’re housing it in an office or other unmanaged space, make sure your cluster has at least 18 inches of clearance on all sides of the rack, as shown in the following illustration:
[Illustration: rack with 18 inches of clearance on each of its four sides]
You should have enough space to open the rack’s door, slide out systems, and perform other routine maintenance tasks.
Software Requirements
You need:
• A site-licensed copy of Mac OS X Server v10.5 or later.
• One or more copies of Apple Remote Desktop v3 or later (recommended).
• The latest version of Server Tools.
Volume-Licensed Serial Number
To run multiple copies of Mac OS X Server, you should obtain a volume-licensed serial number. If you haven’t obtained a volume-licensed serial number, contact your local Apple sales representative.
 Addresses in ranges such as 192.168.x.x, 10.0.x.x, and 172.16.x.x are commonly used for private networks. Because the first two are used more commonly with NAT devices used in the home, and because your users may want to connect to your cluster from behind one of these devices, it is best to choose a range less likely to exist on your user’s networks. This guide uses the range 172.16.1.1 - 172.16.1.254 (subnet mask 255.255.255.0).
9 Preparing the Cluster for Configuration
Use this chapter to mount the systems on the rack, connect the systems to a power source and the private network, and configure the optional management computer. To prepare the cluster nodes for configuration, you mount them in racks and connect them to the power source and private network. You also set up the management computer by installing Apple Remote Desktop and Server Tools.
You can find the serial number of an Xserve computer in four places:
• The unit’s back panel: Serial number label
• The unit’s interior
If you look for the serial number on the unit’s interior, don’t confuse the serial number for the server with the serial number for the optical drive; these are different numbers. The Xserve computer’s serial number is denoted by “Serial#” (not “S/N”) followed by 11 characters.
 UPS connection to wall outlet. Make sure the electrical outlets support the UPS plug shape.  Power cord retainer clips. To prevent power cables from slipping out, use the power cord retainer clips that come with your Xserve systems.  Air flow. Don’t permit a mass of power cables to obstruct air flow. 4 Connect the two Ethernet ports (shown in the illustration below) by connecting port 1 on the cluster controller to the public network and port 2 to the private network.
(Optional) Setting Up the Management Computer
You can use the management computer to remotely set up, configure, and administer your cluster.
To set up the management computer:
1 Connect the management computer to the private network (as shown) using the second-to-last switch port.
[Illustration: optional management computer connected to the private network]
2 Start the management computer.
3 Disable AirPort and any network connection other than the one you’ll be using to connect to your private network.
If you are adopting the IP address range that is used in this guide (172.16.1.1 - 172.16.1.199 for compute nodes, 172.16.1.254 for the controller), you can configure your management computer to use 172.16.1.253.
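Writing the address plan down as a small script can help you keep node numbers and addresses consistent when you later create DNS records. The following sketch uses the address ranges from this guide; the nodeNN.cluster hostnames and the function name are hypothetical and serve only as an illustration.

```shell
#!/bin/sh
# Address plan from this guide: 172.16.1.1 to 172.16.1.199 for compute nodes,
# 172.16.1.253 for the management computer, 172.16.1.254 for the controller.
# The nodeNN.cluster hostnames are hypothetical examples.
NETWORK=172.16.1

# Print the private address for compute node number $1 (1 to 199).
node_address() {
    if [ "$1" -ge 1 ] && [ "$1" -le 199 ]; then
        echo "$NETWORK.$1"
    else
        echo "node number out of range (1-199): $1" >&2
        return 1
    fi
}

echo "controller.cluster  $NETWORK.254"
echo "management computer $NETWORK.253"
for n in 1 2 3; do
    printf 'node%02d.cluster      %s\n' "$n" "$(node_address "$n")"
done
```

Rejecting node numbers above 199 keeps compute nodes from colliding with the addresses reserved for the management computer and the controller.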
10 Setting Up the Cluster Controller
Use this chapter to set up server software on the cluster controller and configure the services running on it. You use Server Assistant, Server Admin, and Apple Remote Desktop (optional) to set up and configure the cluster controller.
Setting Up Server Software on the Cluster Controller
To set up the cluster controller, use Server Assistant (located in /Applications/Server/).
To set up the cluster controller:
1 Start the cluster controller.
5 In the Serial Number screen: a Enter a volume license Mac OS X Server serial number. b Click Continue. 6 In the Registration Information screen, fill out the form or press Command-Q and click Skip. 7 In the Administrator Account screen: a Create the user account you’ll use to administer the cluster controller (for example, Administrator). b Click Continue. 8 In the Network Address screen: a Choose “No, configure network settings manually.” b Click Continue.
e Leave the DNS Servers field blank.
f Leave the Search Domains field blank.
g Click Configure IPv6.
h From the Configure IPv6 pop-up menu, choose Off.
i Click OK, then click Continue.
12 In the Network Names screen:
a Enter the primary DNS name and computer name. The cluster controller has a public and a private DNS name. Use the controller’s private names. For example, use controller.cluster for the primary DNS name and controller for the computer name.
Configuring DNS Service Use Server Admin on the cluster controller to create a local DNS zone and add records to map cluster nodes to their corresponding IP addresses. To configure DNS service: 1 Open Server Admin if it is not already open. 2 If necessary, click the triangle to the left of the controller to view a list of services. 3 Click DNS in the expanded Servers list. 4 Click Settings.
22 Click the Start DNS button (below the Servers list). The DNS service status indicator turns green when the service starts. 23 From the Apple Menu open System Preferences (/Applications/System Preferences). 24 Click Network. 25 Select the Ethernet 1 interface. 26 In the DNS Server field enter the public IP address of the controller (for example, 10.0.2.199). 27 In the Search Domains field enter the private DNS domain (for example, cluster). 28 Click Apply. 29 Quit System Preferences.
If any DNS lookups do not match, repeat the process to create the DNS zone and entry for the controller. Do not continue the cluster setup process until DNS resolves correctly. 6 Quit Terminal. Configuring Open Directory Service The Open Directory service is responsible for authenticating users, publishing server setup configurations, and publishing network share automount records.
Note: You can click Logs and monitor the log file /Library/Logs/slapconfig.log for errors while the Open Directory domain is being created. You can also use the Console (located in /Applications/Utilities/) or Terminal with the following command:
tail -f /Library/Logs/slapconfig.log
In the log, warnings such as the following can be ignored:
WARNING: no policy specified for [...] defaulting to no policy
After the Open Directory domain is created, the Open Directory service starts and the status icon turns green.
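The log check described in the note above can also be scripted. The following is a minimal Python sketch that drops the ignorable "no policy specified" warning and keeps other lines that look like problems. It assumes that problem lines contain the word ERROR or WARNING; the actual slapconfig message formats may differ, and the function name is hypothetical.

```python
# Substring of the warning that the guide says can be ignored.
IGNORABLE = "WARNING: no policy specified for"


def notable_lines(lines):
    """Return log lines that mention a warning or error and are not ignorable."""
    result = []
    for line in lines:
        text = line.strip()
        # Skip blank lines and the known harmless warning.
        if not text or IGNORABLE in text:
            continue
        # Assumption: problems are flagged with the word ERROR or WARNING.
        upper = text.upper()
        if "ERROR" in upper or "WARNING" in upper:
            result.append(text)
    return result
```

To use it, open /Library/Logs/slapconfig.log for reading and pass the file object to notable_lines. Any lines it returns are worth reading before you continue.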
8 Click DNS below the Subnets list. 9 In the DNS Servers field, enter the public address of the cluster controller (for example, 10.0.2.199). 10 In the Default Search Domain field, enter the DNS domain for your private network (for example, cluster). 11 Click Save. 12 Click LDAP. 13 In the Server Name field, enter the fully qualified DNS name of the cluster controller (for example, controller.cluster).
For a subnet mask of 255.255.255.0, use “/24” after the network address (for example, 10.0.2.0/24). 7 Verify that the address range for the list accurately describes the address range used by your public network. 8 Click OK. 9 Click the Add (+) button to add another IP address group. 10 In the “Group name” field, name the group with your private DNS domain name (for example, cluster). 11 In the “Addresses in group” field, change the first entry to match your private IP network in CIDR notation.
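The conversion from a subnet mask to CIDR notation used in these steps can be checked with Python's standard ipaddress module. This is a minimal sketch; the function name to_cidr is an assumption for illustration, and 10.0.2.0 is the guide's public network example.

```python
import ipaddress


def to_cidr(network_address, subnet_mask):
    """Convert a network address and dotted subnet mask to CIDR notation."""
    # strict=True raises ValueError if network_address has host bits set,
    # which catches a host address entered in place of the network address.
    network = ipaddress.ip_network(f"{network_address}/{subnet_mask}", strict=True)
    return network.with_prefixlen


if __name__ == "__main__":
    print(to_cidr("10.0.2.0", "255.255.255.0"))  # prints 10.0.2.0/24
```

The same call works for the private network once you know its network address and mask.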
25 Click the Start Firewall button (below the Servers list). Configuring NAT Settings on the Cluster Controller Network Address Translation (NAT) allows compute nodes to share the controller’s connection to the public network. To configure NAT: 1 In the controller’s list of services, click NAT. 2 Click Settings, then verify that IP Forwarding and Network Address Translation (NAT) is selected.
4 In the Starting IP address field, enter the first private IP address you want to assign to remote VPN clients (for example, 172.16.1.200). 5 In the Ending IP address field, enter the last private IP address you want to assign to remote VPN clients (for example, 172.16.1.229). 6 Click Save. 7 Click the Start VPN button (below the Servers list). Configuring Xgrid Service Using Server Admin on the cluster controller, configure it as an Xgrid controller and then start Xgrid service.
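Before you save the VPN address range from steps 4 and 5, it can help to confirm that it does not overlap the addresses reserved for compute nodes. The following Python sketch does that with the guide's example values (172.16.1.200 through 172.16.1.229 for VPN clients, 172.16.1.1 through 172.16.1.199 for compute nodes). The function names are assumptions for illustration.

```python
import ipaddress


def parse_range(first, last):
    """Return an inclusive (first, last) pair of IPv4Address objects."""
    start, end = ipaddress.ip_address(first), ipaddress.ip_address(last)
    if start > end:
        raise ValueError(f"range start {start} is after range end {end}")
    return start, end


def range_size(addr_range):
    """Return the number of addresses in an inclusive range."""
    start, end = addr_range
    return int(end) - int(start) + 1


def overlaps(range_a, range_b):
    """Return True if two inclusive ranges share at least one address."""
    return range_a[0] <= range_b[1] and range_b[0] <= range_a[1]


# Example values from the guide.
VPN_RANGE = parse_range("172.16.1.200", "172.16.1.229")
COMPUTE_RANGE = parse_range("172.16.1.1", "172.16.1.199")

if __name__ == "__main__":
    print("VPN addresses available:", range_size(VPN_RANGE))
    print("Overlaps compute nodes:", overlaps(VPN_RANGE, COMPUTE_RANGE))
```

With the example values, the VPN range provides 30 addresses and does not overlap the compute node range.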
Preparing the Data Drive as a Mirrored RAID set When preparing your data drive you should protect your data by using a mirrored RAID set, also referred to as RAID 1. You can use the Disk Utility application to create the mirrored RAID set. To create a mirrored RAID set you must have two or more disks. Note: Your network share points should be located on a different drive than your operating system, ideally on a mirrored RAID set.
Creating a Home Directory Automount Share Point Use Server Admin to configure an automount share point on the cluster controller. To create an automount home directory share point: 1 Open Server Admin and select the controller in the Servers list. 2 Click File Sharing, then click Volumes. 3 Select the volume you want to contain the home directory share point (for example, Data). 4 Click Browse. 5 Click New Folder, name the folder “home,” then click Create. 6 Click Save. 7 Select the home folder you created.
Creating User Accounts Use Workgroup Manager to create user accounts. To create user accounts: 1 If you did not restart the cluster controller at the end of the previous section (“Creating a Home Directory Automount Share Point” on page 93), restart it now. 2 Log in using your administrator account. 3 Open Workgroup Manager (located at /Applications/Server/). You can also open Workgroup Manager from the Dock.
11 Setting Up Compute Nodes
Simplify the compute node setup process by creating Auto Server Setup records. An Auto Server Setup record is an XML property list with values that can be used to automatically complete the Server Assistant for newly installed Mac OS X servers. Auto Server Setup records can be accessed using external storage (for example a CD, USB drive, or iPod) or over a network using Open Directory.
5 In the Serial Number screen:
a Enter a site-licensed Mac OS X Server serial number.
Note: If you don’t have a site-licensed number you must manually enter unique serial numbers for each compute node after it has been configured.
b Click Continue.
6 In the Administrator Account screen:
a Create the account you’ll use to administer compute nodes.
b Click Continue.
7 In the Network Interfaces screen:
a Click Add.
b In the Port Name field, enter “Ethernet 1.
11 In the Directory Usage screen: a From the “Set directory usage to” pop-up menu, choose “Connected to a Directory System”. b From the Connect pop-up menu, choose “Open Directory Server.” c In the IP Address or DNS Name field, enter the private DNS name of the cluster controller (for example, controller.cluster). d Click Continue. 12 In the Confirm Settings screen: a Read the configuration summary to confirm that you have made the correct settings. b Click Save As.
Verifying LDAP Record Creation
To verify the creation of the LDAP directory record that compute nodes use to autoconfigure, use the slapcat command on the cluster controller.
To verify the LDAP record creation:
1 Open a Terminal window on the cluster controller and enter the following command:
$ sudo slapcat | grep generic
2 When prompted, enter the administrator password.
This command displays the generic records in the LDAP database on the cluster controller.
If an Auto Server Setup record is available to the compute node through a removable drive or Open Directory record, it will configure itself and reboot. After you verify that the first node has completed this process, start the remaining compute nodes sequentially, allowing time for them to obtain sequential IP addresses from the DHCP server and for autoconfiguration. Do not disconnect or remove disks until you are sure the server has applied the settings.
12 Click Save. 13 Repeat steps 2 through 12 for each compute node. You can also use Apple Remote Desktop to set the names of all cluster nodes at once. For more information, see “Naming Multiple Cluster Nodes” on page 111. 14 Select the node’s Open Directory service. 15 Click Settings, then click General. 16 Verify the role is set to “Connected to a Directory System.” 17 Click Join Kerberos. A Join Kerberos Realm screen appears. Set the realm to your Kerberos realm (for example, CONTROLLER.CLUSTER).
Creating and Verifying a VPN Connection Remote clients can connect to the private network of the cluster securely using SSH and VPN. VPN access allows graphical applications (like the GridMandelbrot sample Xgrid application) to run on remote systems, but use the cluster for computation. VPN access also allows administrative tools, such as Apple Remote Desktop, to manage compute nodes from a remote system. The following instructions are for VPN configuration for Mac OS X v10.5 clients.
5 Click Servers, then click the Add (+) button (below the Servers list). 6 Verify that the new entry in the Type column is listed as “KDC.” 7 Enter the private DNS name for your controller in the Server column (for example, controller.cluster). 8 Click Domain, then click the Add (+) button (below the Domain list). 9 Enter the private DNS zone preceded by a period (for example, .cluster). 10 Click the Add (+) button (below the Domain list). 11 Enter the private DNS zone (for example, cluster). 12 Click OK.
12 Testing Your Cluster
Use this chapter to make sure you’ve successfully configured your cluster before performing HPC. Use Xgrid Admin to verify that you can see the Xgrid agents in your cluster. Then use sample Xgrid tasks to test your cluster.
Checking Your Cluster Using Xgrid Admin
Use Xgrid Admin to verify that Xgrid agents are running on the compute nodes.
8 Verify that you can see a list of all nodes in your cluster. If you don’t see all agents you were expecting, see “If Your Agents Can’t Connect to the Xgrid Controller” on page 51. 9 Monitor the progress of Xgrid jobs as they are being processed by clicking Jobs. 10 Quit Xgrid Admin. Testing Your Xgrid Cluster To test your cluster, use GridSample, a sample Cocoa application that comes with Developer Tools for Mac OS X v10.5, to submit Xgrid tasks to the controller.
13 For argument 2, enter “2007.” Note: Instead of specifying one year, you could specify a range of years, and Xgrid would create a separate set of tasks for each year. 14 Click Submit. The Xgrid controller service running on the cluster controller prepares the tasks and sends them to Xgrid agents running on the cluster nodes. When the job is done, the status of the job changes to Finished in the Xgrid Feeder Sample window. 15 To see the results of each task, click Show Results.
Verifying Your SSH Connection Verify that SSH is running on the controller by using Terminal. To verify your SSH connection: 1 From a remote system, open Terminal (located in /Applications/Utilities/). 2 Open an SSH connection to your controller by logging in with a user account name and password created in Workgroup Manager and by using the public IP address or public DNS name for your controller (for example, ssh tomclark@10.0.2.199).
Appendix A Cluster Setup Checklist
Use the checklist in this appendix to guide you through the cluster setup procedure. Print this checklist and use it to make sure you have performed all setup steps. The steps in this checklist are in order only within each section.
Step | For information about this step, go to
Management Computer Setup (Optional)
[ ] Disable AirPort and other public network connections | “(Optional) Setting Up the Management Computer” on page 78
[ ] Install the latest version of Mac OS X Server tools | “(Optional) Setting Up the Management Computer” on page 78
[ ] Install Apple Remote Desktop | “(Optional) Setting Up the Management Computer” on page 78
Controller Setup
[ ] Connect the controller to the public and private network
[ ] Run Server Assistant and con
Step | For information about this step, go to
[ ] Verify Xgrid configuration | “Verifying Your Xgrid Configuration” on page 105
[ ] Verify your SSH connection | “Verifying Your SSH Connection” on page 106
Appendix B Automating Compute Node Configuration
Use this appendix to learn about alternative ways of completing tasks documented earlier in this guide. For large clusters, some tasks in this guide can be completed quickly and efficiently using Apple Remote Desktop.
Naming Multiple Cluster Nodes
Using the Send UNIX Command in Apple Remote Desktop, you can rename all cluster nodes at once.
8 Close the Send UNIX Command results window. All nodes should now show their hostname in the Remote Desktop list. Joining Multiple Cluster Nodes to the Kerberos Realm To send commands to join the nodes to the Kerberos realm, use Apple Remote Desktop’s Send UNIX Command. To join multiple cluster nodes to the Kerberos realm: 1 Open Apple Remote Desktop. 2 Select the nodes you want to join. 3 From the Manage pop-up menu, choose Send UNIX Command.
5 In the text field, enter the following commands:
serveradmin settings xgrid:XgridKerberosInfo:ReadyForAgentRoleBasedSetup = yes
serveradmin settings xgrid:XgridKerberosInfo:ReadyForControllerRoleBasedSetup = yes
serveradmin settings xgrid:AgentSettings:Enabled = yes
serveradmin settings xgrid:AgentSettings:ControllerPassword = ""
serveradmin settings xgrid:AgentSettings:prefs:ControllerName = "controller"
serveradmin settings xgrid:AgentSettings:prefs:SuspendWhenNotIdle = no
serveradmin settings xgrid:Age
Using SSH Without Passwords Users on your cluster can generate authentication keys in their home folders that enable them to use SSH to connect to other cluster nodes without entering their password again. To use SSH without passwords: 1 Make an SSH connection to the controller. If connecting from a remote system, use the public IP address or DNS name of the controller (for example, ssh mab@10.0.2.199). 2 In your home directory on the controller, enter the following commands in sequence: mkdir .
Glossary
address A number or other identifier that uniquely identifies a computer on a network, a block of data stored on a disk, or a location in a computer’s memory. See also IP address, MAC address. administrator A user with server or directory domain administration privileges. Administrators are always members of the predefined “admin” group. AFP Apple Filing Protocol. A client/server protocol used by Apple file service to share files and network services.
bit rate The speed at which bits are transmitted over a network, usually expressed in bits per second. byte A basic unit of measure for data, equal to eight bits (or binary digits). client A computer (or a user of the computer) that requests data or services from another computer, or server. cluster A collection of computers interconnected in order to improve reliability, availability, and performance. Clustered computers often run special software to coordinate the computers’ activities.
DNS name A unique name of a computer used in the Domain Name System to translate IP addresses and names. Also called a domain name. domain Part of the domain name of a computer on the Internet. It does not include the top-level domain designator (for example, .com, .net, .us, .uk). Domain name “www.example.com” consists of the subdomain or host name “www,” the domain “example,” and the top-level domain “com.” domain name See DNS name. Domain Name System See DNS.
Internet Protocol See IP. IP Internet Protocol. Also known as IPv4. A method used with Transmission Control Protocol (TCP) to send data between computers over a local network or the Internet. IP delivers data packets and TCP keeps track of data packets. IP address A unique numeric address that identifies a computer on the Internet. KB Kilobyte. 1,024 (2^10) bytes. kilobyte See KB. link An active physical connection (electrical or optical) between two nodes on a network.
megabyte See MB. name server A server on a network that keeps a list of names and the IP addresses associated with each name. See also DNS, WINS. Network File System See NFS. network interface Your computer’s hardware connection to a network. This includes (but isn’t limited to) Ethernet connections, AirPort cards, and FireWire connections. network interface card See NIC. NFS Network File System.
RAID Redundant Array of Independent (or Inexpensive) Disks. A grouping of multiple physical hard disks into a disk array, which either provides high-speed access to stored data, mirrors the data so that it can be rebuilt in case of disk failure, or both. The RAID array is presented to the storage system as a single logical storage unit. See also RAID array, RAID level. RAID 1 A RAID scheme that creates a pair of mirrored drives with identical copies of the same data.
Index A B access administrator permissions 36 LDAP 86, 98 managing client 35 accounts 94 ACLs (access control lists) 35 administrator 36, 42 agents adding 43 authentication 26 controllers 23, 30, 32, 91 deleting 44 distributed grids 21 functions of 22 grid workload 19 list of 43 management of 42, 43 mobility of 39 overview 23 requirements 18 setup 32, 33, 42, 43, 112 troubleshooting 51 airflow for hardware 77 Apple Remote Desktop (ARD) agent settings 112 clusters 72 features 42 Apple Workgroup Clu
Xgrid 42, 48, 104 computational grids.
restarting 45 results 49 status checking 49 stopping 44 structuring 47 styles 47 submitting 48 K Kerberos cluster setup 86, 112 joining remote clients 101 verifying remote client access 102 Xgrid administration 26, 27, 34, 54 L LDAP (Lightweight Directory Access Protocol) service 86, 98 libraries, code 61 local grids 21 login, SSH 42 logs 37 loosely coupled computations 62 M Mac OS X agent setup 33, 42 Mac OS X Server agent setupauthentication options 32 high performance computing 59, 60 software require
rendering images 62, 105 requirements cluster 67, 68, 73 hardware 67, 68, 69, 70 software 72 Xgrid administration 18, 24 research-related grid projects 18, 19 S SACLs (service access control lists) 35 scp tool 42 search base, LDAP 86 secure SHell.