Administrator's Guide Release 5.0.
ParaStation5 Administrator's Guide
Release 5.0.5
Copyright © 2002-2010 ParTec Cluster Competence Center GmbH
April 2010
Printed 7 April 2010, 14:11
Reproduction in any manner whatsoever without the written permission of ParTec Cluster Competence Center GmbH is strictly forbidden. All rights reserved. ParTec and ParaStation are registered trademarks of ParTec Cluster Competence Center GmbH.
Table of Contents
1. Introduction
1.1. What is ParaStation
1.2. The history of ParaStation
1.3. About this document
6.2. Problem: node shown as "down"
6.3. Problem: cannot start parallel task
6.4. Problem: bad performance
6.5. Problem: different groups of nodes are seen as up or down
Chapter 1. Introduction
1.1. What is ParaStation
ParaStation is an integrated cluster management and communication solution. It combines unique features found only in ParaStation with common techniques widely used in high performance computing to deliver an integrated, easy-to-use and reliable compute cluster environment. Version 5 of ParaStation supports various communication technologies as interconnect networks.
In the middle of 2004, all rights to ParaStation were transferred from ParTec AG to the ParTec Cluster Competence Center GmbH. This new company takes a much more service-oriented approach to its customers. The main goal is to deliver integrated and complete software stacks for Linux-based compute clusters by selecting state-of-the-art software components and driving software development efforts in areas where real added value can be provided.
Chapter 2. Technical overview
This chapter gives a brief technical overview of ParaStation5 and explains the various software modules it consists of.
2.1. Runtime daemon
In order to enable ParaStation5 on a cluster, the ParaStation daemon psid(8) has to be installed on each cluster node. This daemon process implements various functions:
• Install and configure local communication devices and protocols, e.g.
• p4sock.o: this module implements the kernel-based ParaStation5 communication protocol.
• e1000_glue.o, bcm5700_glue.o: these modules enable even more efficient communication via the network drivers shipped with ParaStation5 (see below).
• p4tcp.o: this module provides a feature called "TCP bypass". Applications using standard TCP communication channels on top of Ethernet are thus able to use the optimized ParaStation5 protocol and achieve improved performance.
Chapter 3. Installation
This chapter describes the installation of ParaStation5. First, the prerequisites for using ParaStation5 are discussed. Next, the directory structure of all installed components is explained. Finally, the installation using RPM packages is described in detail. Of course, the less automated the chosen installation method, the more room there is for customizing the installation process.
Software
ParaStation requires an RPM-based Linux installation, as the ParaStation software is distributed as installable RPM packages. All current distributions from Novell and Red Hat are supported, such as
• SuSE Linux Enterprise Server (SLES) 9 and 10
• SuSE Professional 9.1, 9.2, 9.3 and 10.0, OpenSuSE 10.1, 10.2, 10.3
• Red Hat Enterprise Linux (RHEL) 3, 4 and 5
• Fedora Core, up to version 7
For other distributions and non-RPM based installations, please contact .
man contains the manual pages describing the ParaStation daemons, utilities and configuration files once the documentation package has been installed. The necessary steps are described in Section 3.4, “Installing the documentation”. To enable users to access these pages using the man(1) command, please consult the corresponding documentation.
Please note that the individual version numbers of the distinct packages making up the ParaStation5 system do not necessarily have to match.
Compiling the ParaStation5 packages from source
To build RPM packages suitable for a particular setup, the source code for the ParaStation packages can be downloaded from www.parastation.com/download.
# rpm -Uv psmgmt-5.0.0-0.i586.rpm pscom-5.0.0-0.i586.rpm \
      pscom-modules-5.0.0-0.i586.rpm
This will copy all the necessary files to /opt/parastation and the kernel modules to /lib/modules/kernelversion/kernel/drivers/net/ps4. On a frontend node or file server, the pscom-modules package is only required if this node is to run processes of a parallel task.
# rpm -Uv psdoc-5.0.0-1.noarch.rpm
All the PDF and HTML files will be installed within the directory /opt/parastation/doc; the manual pages will reside in /opt/parastation/man. The intended starting point for browsing the HTML version of the documentation is file:///opt/parastation/doc/html/index.html. The documentation is available as two PDF files: adminguide.pdf for the ParaStation5 Administrator's Guide and userguide.pdf for the ParaStation5 User's Guide.
• testing
These steps will be discussed in Chapter 4, Configuration.
3.7. Uninstalling ParaStation5
After stopping the ParaStation daemons, the corresponding packages can be removed using
# /etc/init.d/parastation stop
# rpm -e psmgmt pscom psdoc psmpi2
on all nodes of the cluster.
12 ParaStation5 Administrator's Guide
Chapter 4. Configuration
After the ParaStation software has been installed successfully, only a few modifications to the configuration file parastation.conf(5) have to be made in order to enable ParaStation on the local cluster.
4.1. Configuration of the ParaStation system
This section describes the basic configuration procedure to enable ParaStation. It covers the configuration of ParaStation5 using TCP/IP (Ethernet) and the optimized ParaStation5 protocol p4sock.
The values that might be assigned to the HWType parameter have to be defined within the parastation.conf configuration file. Have a brief look at the various Hardware sections of this file to find out which hardware types are actually defined. Other possible types are: mvapi, openib, gm, ipath, elan, dapl. No dedicated hardware entry is required to enable shared memory communication within SMP nodes; shared memory support is always enabled by default.
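A quick way to see which hardware types a configuration file defines is to extract the names of its Hardware sections. This is a minimal sketch: it assumes that each section opens with a line of the form "Hardware name {", and the function name list_hw_types is hypothetical.

```shell
# Print the names of all Hardware sections in a parastation.conf file.
# Assumption: each section starts with a line like "Hardware ethernet {".
list_hw_types() {   # list_hw_types [conffile]
    conf=${1:-/etc/parastation.conf}
    # Drop comment lines, then print the word following "Hardware".
    sed -n -e '/^[[:space:]]*#/d' \
        -e 's/^[[:space:]]*[Hh]ardware[[:space:]]\{1,\}\([^[:space:]{]\{1,\}\).*/\1/p' \
        "$conf"
}
```

Running list_hw_types without an argument inspects /etc/parastation.conf.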
transfer application data across Ethernet, these adapted drivers should be used, too. The simplest way to enable them is to rename the original modules and recreate the module dependencies:
# cd /lib/modules/$(uname -r)/kernel/drivers/net
# mv e1000/e1000.o e1000/e1000-orig.o
# mv bcm/bcm5700.o bcm/bcm5700-orig.o
# depmod -a
If your system uses the e1000 driver, a subsequent modinfo command for kernel version 2.
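The renaming step can be wrapped in a small helper that is safe to run more than once, for example when the same script is run on every node. This is only a sketch. The function name disable_orig_module is hypothetical, and the module path and suffix (.o or .ko, depending on the kernel) have to match your system. Run depmod -a afterwards, as described in the steps above.

```shell
# Move an original driver module aside (e1000/e1000.o -> e1000/e1000-orig.o)
# so that the adapted ParaStation driver is loaded instead.
# Running it a second time leaves an already renamed module untouched.
disable_orig_module() {   # disable_orig_module <moduledir> <relative/module.o>
    src="$1/$2"
    dst="${src%.*}-orig.${src##*.}"   # keeps the suffix: .o -> -orig.o, .ko -> -orig.ko
    if [ -e "$dst" ]; then
        echo "already renamed: $dst"
    elif [ -e "$src" ]; then
        mv "$src" "$dst" && echo "renamed: $src -> $dst"
    else
        echo "not found: $src" >&2
        return 1
    fi
}

# Example:
#   d=/lib/modules/$(uname -r)/kernel/drivers/net
#   disable_orig_module "$d" e1000/e1000.o
#   disable_orig_module "$d" bcm/bcm5700.o
#   depmod -a
```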
Alternatively, it is possible to use the single-command form of psiadmin:
# /opt/parastation/bin/psiadmin -s -c "list"
The command should be repeated until all nodes are up. The ParaStation administration tool is described in detail in the corresponding manual page psiadmin(1). If some nodes are still marked as "down", the logfile /var/log/messages on these nodes should be inspected. Entries like “psid: ....” at the end of the file may report problems or errors.
Chapter 5. Insight ParaStation5 This chapter provides more technical details and background information about ParaStation5. 5.1. ParaStation5 pscom communication library The ParaStation communication library libpscom offers secure and reliable end-to-end connectivity. It hides the actual transport and communication characteristics from the application and higher level libraries. The libpscom library supports a wide range of interconnects and protocols for data transfers.
The p4sock.ko module inserts a number of entries into the /proc filesystem. All ParaStation5 entries are located within the subdirectory /proc/sys/ps4. Three different subdirectories, listed below, are available. To read a value, e.g. the number of currently open connections, just type
# cat /proc/sys/ps4/state/connections
To modify a value, e.g. to set a new value for ResendTimeout, type
# echo 10 > /proc/sys/ps4/state/ResendTimeout
5.2.1.
• MaxAcksPending: maximum number of pending ACK messages until an "urgent" ACK message will be sent.
• MaxDevSendQSize: maximum number of entries of the (protocol internal) send queue to the network device.
• MaxMTU: maximum packet size used for network packets. For sending packets, the minimum of MaxMTU and the service-specific MTU will be used.
• MaxRecvQSize: size of the protocol internal receive queue.
• MaxResend: number of retries until a connection is declared as dead.
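The cat and echo commands used for the /proc/sys/ps4 entries can be wrapped in small helpers, including one that dumps a whole subdirectory such as local. In this sketch the function names are hypothetical. The PS4_PROC variable exists only so that the helpers can be tried against a scratch copy of the tree, and it defaults to /proc/sys/ps4. Writing to the real entries requires root privileges.

```shell
# Helpers for the p4sock parameters below /proc/sys/ps4.
PS4_PROC=${PS4_PROC:-/proc/sys/ps4}

ps4_get() {   # ps4_get <subdir> <name>, e.g. ps4_get state connections
    cat "$PS4_PROC/$1/$2"
}

ps4_set() {   # ps4_set <subdir> <name> <value>, e.g. ps4_set state ResendTimeout 10
    printf '%s\n' "$3" > "$PS4_PROC/$1/$2"
}

ps4_list() {  # ps4_list <subdir>: print "name = value" for every entry
    for f in "$PS4_PROC/$1"/*; do
        [ -f "$f" ] && printf '%s = %s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

For example, ps4_list local shows MaxAcksPending, MaxMTU, MaxResend and the other parameters listed above in one go.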
a predefined node list. If not defined, all currently known nodes are taken into account. Also, the variables PSI_NODES_SORT, PSI_LOOP_NODES_FIRST, PSI_EXCLUSIVE and PSI_OVERBOOK are observed. Based on these variables and the list of currently active processes, a sorted list of nodes is constructed, defining the final node list for this new task. Besides these environment variables, node reservations for users and groups are also observed. See psiadmin(1).
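When job scripts set PSI_NODES_SORT themselves, a typo in the strategy name is easy to miss. The sketch below checks a value against the strategies accepted by the nodesSort directive of psiadmin(1): PROC, LOAD_1, LOAD_5, LOAD_15, PROC+LOAD and NONE. It assumes, without confirmation from this guide, that the environment variable takes the same values and that case is ignored, as the mixed use of "none" and "NONE" in this guide suggests. The name valid_nodes_sort is hypothetical.

```shell
# Return success if the argument names a known node sorting strategy.
# Assumption: PSI_NODES_SORT accepts the nodesSort values, case-insensitively.
valid_nodes_sort() {   # valid_nodes_sort <value>
    v=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    case $v in
        PROC|LOAD_1|LOAD_5|LOAD_15|PROC+LOAD|NONE) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (job script):
#   PSI_NODES_SORT=PROC
#   valid_nodes_sort "$PSI_NODES_SORT" || echo "unknown strategy" >&2
#   export PSI_NODES_SORT
```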
In order to run applications linked with one of those MPI libraries, ParaStation5 provides dedicated mpirun commands. The processes of these types of parallel tasks are spawned obeying all restrictions described in Section 5.3, “Controlling process placement”. Of course, the data transfer will be based on the communication channels supported by the particular MPI library. For MPIch using ch_p4 (TCP), ParaStation5 provides an alternative, see Section 5.7, “ParaStation5 TCP bypass”.
PSP_SHM or PSP_SHAREDMEM
Don't use shared memory for communication within the same node.
PSP_P4S or PSP_P4SOCK
Don't use the ParaStation p4sock protocol for communication.
PSP_MVAPI
Don't use Mellanox InfiniBand vapi for communication.
PSP_OPENIB
Don't use OpenIB InfiniBand verbs for communication.
PSP_GM
Don't use GM (Myrinet) for communication.
PSP_DAPL
Don't use DAPL for communication.
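To switch off several transports for a single run, the corresponding variables can be generated. This sketch assumes that a value of 0 disables a communication path, which should be confirmed in ps_environment(5) for your release. The helper psp_disable is hypothetical and only prints export lines, so the result can be inspected before it is evaluated.

```shell
# Print "export PSP_<NAME>=0" for each transport given on the command line.
# Assumption: 0 disables the path (check ps_environment(5)).
psp_disable() {   # psp_disable SHM|P4S|MVAPI|OPENIB|GM|DAPL ...
    for t in "$@"; do
        case $t in
            SHM|P4S|MVAPI|OPENIB|GM|DAPL) printf 'export PSP_%s=0\n' "$t" ;;
            *) echo "unknown transport: $t" >&2; return 1 ;;
        esac
    done
}

# Example: run without shared memory and GM
#   eval "$(psp_disable SHM GM)"
```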
/etc/passwd. Usage of common authentication schemes like NIS is not required, which limits user management to the frontend nodes. Authentication of users is restricted to login or frontend nodes and is outside the scope of ParaStation.
5.10. Homogeneous user ID space
As explained in the previous section, ParaStation uses only user and group IDs for starting up remote processes. Therefore, all processes will have identical user and group IDs on all nodes.
5.14. Integration with AFS
To run parallel tasks spawned by ParaStation on clusters using AFS, ParaStation provides the scripts env2tok and tok2env. On the frontend side, calling
. tok2env
will create an environment variable AFS_TOKEN containing an encoded access token for AFS. This variable must be added to the list of exported variables:
PSI_EXPORTS="AFS_TOKEN,$PSI_EXPORTS"
In addition, the variable
PSI_RARG_PRE_0=/some/path/env2tok
must be set.
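Written as a frontend-side job preamble, the AFS steps become the following sketch. The add_export helper is hypothetical. It prepends AFS_TOKEN to PSI_EXPORTS without leaving a trailing comma when the list was empty, and it does not add the name twice. /some/path stands for the actual location of env2tok, as in the text.

```shell
# Prepend a variable name to a comma-separated PSI_EXPORTS-style list.
add_export() {   # add_export <name> <list>
    name=$1 list=$2
    case ",$list," in
        *",$name,"*) printf '%s\n' "$list" ;;          # already in the list
        ",,")        printf '%s\n' "$name" ;;          # list was empty
        *)           printf '%s,%s\n' "$name" "$list" ;;
    esac
}

# Frontend side, before starting the parallel task:
#   . tok2env                                    # defines AFS_TOKEN
#   PSI_EXPORTS=$(add_export AFS_TOKEN "$PSI_EXPORTS")
#   export PSI_EXPORTS
#   export PSI_RARG_PRE_0=/some/path/env2tok     # path as in the text
```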
If an external queuing system is used, the environment variable PSI_NODES_SORT should be set to "none", so that ParaStation does not sort any predefined node list. ParaStation includes its own queuing facility. For more details, refer to Section 5.4, “Using the ParaStation5 queuing facility” and the ParaStation5 User's Guide.
5.15.1. Integration with PBS PRO
Parallel jobs started by PBS PRO using the ParaStation mpirun command will be recognized automatically.
# UseMCast statement. If multicast is enabled, the ParaStation daemons exchange status information using multicast messages. Thus, a Linux kernel supporting multicast is required on all nodes of the cluster. This is usually no problem, since the standard kernels of all common distributions are compiled with multicast support.
To list, sort and filter all the collected information, the command psaccview is available. See psaccounter(8) and psaccview(8) for details.
5.19. Using ParaStation process pinning
ParaStation is able to pin compute tasks to particular cores. This prevents processes from 'hopping' between different cores or CPUs during runtime under the control of the OS scheduler.
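To check whether pinning took effect for a running process, its CPU affinity can be read from /proc/<pid>/status on Linux. This is a minimal sketch. The name show_affinity is hypothetical, and PROC_ROOT exists only to allow testing against a copy; it defaults to /proc. The command taskset -cp <pid> from util-linux reports the same information.

```shell
# Print the Cpus_allowed_list of a process, e.g. "2-3" for a task pinned
# to cores 2 and 3. Linux-specific: reads /proc/<pid>/status.
show_affinity() {   # show_affinity <pid>
    sed -n 's/^Cpus_allowed_list:[[:space:]]*//p' "${PROC_ROOT:-/proc}/$1/status"
}
```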
and change the default port number 888. Modify the entry port = 888 within the file /etc/xinetd.d/psidstarter to reflect the newly assigned port number. In addition, the ParaStation daemon psid(8) uses the UDP port 886 for RDP connections. To change this port, use the RDPPort directive within parastation.conf. See parastation.conf(5) for details.
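The edit of the xinetd service file can be scripted so that it is applied identically on all nodes. In this sketch set_psidstarter_port is hypothetical. It expects the service file to contain a line of the form "port = <number>" and relies on the -i option of GNU sed. Afterwards, xinetd has to re-read its configuration.

```shell
# Replace the number in the "port = ..." line of the psidstarter service file.
set_psidstarter_port() {   # set_psidstarter_port <port> [file]
    port=$1 file=${2:-/etc/xinetd.d/psidstarter}
    case $port in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    sed -i "s/^\([[:space:]]*port[[:space:]]*=[[:space:]]*\)[0-9]\{1,\}/\1$port/" "$file"
}
```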
Chapter 6. Troubleshooting
This chapter provides some hints on common problems seen while installing or using ParaStation5. Of course, more help will be provided by .
6.1. Problem: psiadmin returns error
When starting up the ParaStation admin command psiadmin, an error is reported:
# psiadmin
PSC: PSC_startDaemon: connect() fails: Connection refused
Reason: the local ParaStation daemon could not be contacted. Verify that the psid(8) daemon is up and running.
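Before investigating further, it is worth checking whether psid is running on the node at all. The usual tool for this is pgrep -x psid. The sketch below does the same by scanning /proc/<pid>/comm, which is available since Linux 2.6.33. PROC_ROOT is there only for testing, and daemon_running is a hypothetical name.

```shell
# Return success if a process with the given command name runs locally.
daemon_running() {   # daemon_running <name>
    for c in "${PROC_ROOT:-/proc}"/[0-9]*/comm; do
        [ "$(cat "$c" 2>/dev/null)" = "$1" ] && return 0
    done
    return 1
}

# Example:
#   daemon_running psid || echo "psid is not running on $(hostname)"
```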
Alternatively, log on to this node and run psiadmin, which also starts up the ParaStation daemon psid. See Section 6.1, “Problem: psiadmin returns error” for more details. Check the logfile /var/log/messages on this node for error messages. Verify that all nodes have an identical configuration (/etc/parastation.conf).
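Comparing /etc/parastation.conf across nodes is easy once copies have been collected on one host, e.g. with scp. This is a minimal sketch. The function same_config is hypothetical and compares cksum values and file sizes rather than the raw bytes. node01 and node02 in the usage comment are placeholders.

```shell
# Succeed if all given files have identical content (CRC and size).
same_config() {   # same_config <file> <file>...
    [ $# -ge 1 ] || return 2
    for f in "$@"; do
        [ -r "$f" ] || { echo "cannot read: $f" >&2; return 2; }
    done
    # cksum prints "<crc> <size> <name>"; keep crc and size only.
    n=$(cksum "$@" | awk '{ print $1, $2 }' | sort -u | wc -l)
    [ "$n" -eq 1 ]
}

# Example:
#   for n in node01 node02; do scp "$n":/etc/parastation.conf "conf.$n"; done
#   same_config conf.* && echo "configurations identical"
```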
This typically happens if the frontend or head node is included as a compute node and also acts as a gateway for the compute nodes. The "external" address of the frontend is not known to the compute nodes. Use the PSP_NETWORK environment variable to redirect all traffic to the cluster-internal network. See ps_environment(5) and Section 5.8, “Controlling ParaStation5 communication paths” for details.
6.7.
Make sure no other process uses this port, or use the RDPPort directive within parastation.conf to redefine this port for all daemons within the cluster. See also parastation.conf(5).
6.10. Problem: processes cannot access files on remote nodes
Problem: processes created by ParaStation on remote nodes are not able to access files if these files grant access only to a supplementary group the current user belongs to.
Reference Pages This appendix lists all reference pages related to ParaStation5 administration tasks. For reference pages describing user related commands and information, refer to the ParaStation5 User's Guide.
parastation.conf
parastation.conf — the ParaStation configuration file
Description
Upon execution, the ParaStation daemon psid(8) reads its configuration information from a configuration file which, by default, is /etc/parastation.conf. Various parameters can be modified persistently within this configuration file. The basic syntax of the configuration file is one parameter per line. For ease of use, some parameters, e.g. Nodes, are implemented in an environment mode.
The following five types of parameters within the Hardware environment receive special handling by the ParaStation daemon psid(8). They define different script files that are called in order to execute various operations on the corresponding communication hardware. All these entries have the form of the parameter's name followed by the corresponding value. The value might be enclosed in single or double quotes in order to allow spaces within it. The values are interpreted as absolute or relative paths.
p4sock Use optimized communication via (Gigabit) Ethernet. The script handling this hardware type ps_p4sock is also located in the config subdirectory. It understands the following two environment variables: PS_TCP If set to an address range, e.g. 192.168.10.0-192.168.10.128, the TCP bypass feature of the p4sock protocol is enabled for the given address range. openib Use the OpenFabrics verbs layer for communication over InfiniBand.
accounter
This is actually a pseudo communication layer. It is only used for configuring nodes running the ParaStation accounting daemon and should be used only in a particular Nodes entry.
NrOfNodes num
Define the number of connected nodes including the frontend node. The nodes will be numbered 0 … num-1. There is no default value for NrOfNodes; NrOfNodes has to be declared within the configuration file in any case. The number of connected nodes has to be declared before any Nodes entry.
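To see which peers a PS_TCP address range of the p4sock hardware type (e.g. 192.168.10.0-192.168.10.128) covers, the addresses can be compared as 32-bit integers. This sketch assumes that the range includes both end points. The helpers ip2int and in_range are hypothetical, and dotted quads are expected without leading zeros, since the shell would read those as octal.

```shell
# Convert a dotted quad to a 32-bit integer.
ip2int() {   # ip2int <a.b.c.d>
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if <ip> lies within <from-to> (both ends inclusive, assumed).
in_range() {   # in_range <ip> <from-to>
    ip=$(ip2int "$1")
    from=$(ip2int "${2%-*}")
    to=$(ip2int "${2#*-}")
    [ "$ip" -ge "$from" ] && [ "$ip" -le "$to" ]
}
```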
Node[s] hostname id [HWType-entry] [starter-entry] [runJobs-entry] [env name value] [env { name value ... }] Node[s] { {hostname id [HWType-entry] [starter-entry] [runJobs-entry] [env name value] [env { name value ... }] }... } Node[s] $GENERATE from-to/step nodestr idstr [HWType-entry] [starter-entry] [runJobs-entry] [env name value] [env { name value ... }] Define one or more nodes to be part of the ParaStation cluster. This is the first example of a parameter that supports the environment mode.
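On larger clusters, typing the Nodes list by hand is tedious. The sketch below prints NrOfNodes followed by a Nodes block in environment mode for hosts named <prefix>00, <prefix>01 and so on, numbered from 0 as described for NrOfNodes. The naming scheme and the function name gen_nodes are assumptions, and optional per-node entries such as HWType are left out. Check the result with test_config before using it.

```shell
# Print NrOfNodes and a Nodes block for <count> hosts <prefix>00, <prefix>01, ...
gen_nodes() {   # gen_nodes <prefix> <count>
    prefix=$1 count=$2
    printf 'NrOfNodes %d\n' "$count"      # has to precede the Nodes entries
    printf 'Nodes {\n'
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '    %s%02d %d\n' "$prefix" "$i" "$i"
        i=$((i + 1))
    done
    printf '}\n'
}

# Example: gen_nodes node 16 > nodes.snippet
```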
SelectTime time Set the timeout of the central select(2) of the ParaStation daemon psid(8) to time seconds. The default value is 2 seconds. This parameter can be set during runtime via the set selecttime directive within the ParaStation administration and management tool psiadmin(1). DeadInterval num The ParaStation daemon psid(8) will declare other daemons as dead after num consecutively missing multicast pings. After declaring a node as dead, all processes residing on this node are also declared dead.
The default port to use is 886. RLimit { Core size | CPUTime time | DataSize size | MemLock size | StackSize size | RSSize size } RLimit { { Core size | CPUTime time | DataSize size | MemLock size | StackSize size | RSSize size }... } Set various resource limits to the psid(8) and thus to all processes started from it. All limits are set using the setrlimit(2) system call. For a detailed description of the different types of limits please refer to the corresponding manual page.
The value part of each line is either a single word or an expression enclosed in single or double quotes. The expression might contain whitespace characters. If the expression is enclosed in single quotes, balanced or unbalanced double quotes may be used within it, and vice versa. This command might be used, for example, to set the PSP_NETWORK environment variable globally without every user having to adjust this parameter in their own environment.
This only comes into play if the user does not define a sorting strategy explicitly via PSI_NODES_SORT. Be aware that using a batch system like PBS or LSF *will* set the strategy explicitly, namely to NONE.
overbook { true | yes | 1 | false | no | 0 }
If the argument is one of yes, true or 1, all nodes may be overbooked by the user using the PSI_OVERBOOK environment variable. If the argument is one of no, false or 0, ParaStation will deny overbooking of the nodes, even if PSI_OVERBOOK is set.
rdpMaxRetrans number
Set the maximum number of retransmissions within the RDP facility. If more than this number of retransmissions would be necessary to deliver a packet to the remote destination, the connection is declared to be down. See also psiadmin(1).
statusBroadcasts number
Set the maximum number of status broadcasts per round. This is used to limit the number of status broadcasts per status iteration.
ACK is sent piggyback within the next regular packet to this node or as soon as a retransmission occurs. If set to 1, each RDP packet received is acknowledged by an explicit ACK.
Errors
No known errors.
psiadmin
psiadmin — the ParaStation administration and management tool
Synopsis
psiadmin [ -denqrsv? ] [ -c command ] [ -f program-file ] [ --usage ]
Description
The psiadmin command provides an administrator interface to the ParaStation system. In interactive mode, the command reads directives from standard input. The syntax of each directive is checked and the appropriate request is sent to the local ParaStation daemon psid(8). To put psiadmin into batch mode, use either the -c or the -f option.
--usage Display a brief usage message. Standard Input The psiadmin command reads standard input for directives until end of file is reached, or the exit or quit directive is read. Standard Output If Standard Output is connected to a terminal, a command prompt will be written to standard output when psiadmin is ready to read a directive. If the -e option is specified, psiadmin will echo the directives read from standard input to standard output.
If nodes is empty, the node range preselected via the range command is used. The default preselected node range contains all nodes of the ParaStation cluster. The from and to parts of each range are node IDs. They might be given in decimal or hexadecimal notation and must be in the range between 0 and NumberOfNodes-1. As an extension, nodes might also be a hostname that can be resolved into a valid ParaStation ID. Using hostnames containing "-" might confuse this algorithm and is therefore not recommended.
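The node range syntax can be mimicked in a script, e.g. to check a list before passing it to psiadmin. This sketch makes three assumptions: hexadecimal IDs carry a 0x prefix, decimal IDs have no leading zeros, and host names are not handled. The names to_num and expand_nodes are hypothetical.

```shell
# Convert one node ID (decimal or 0x-prefixed hexadecimal) to decimal.
to_num() {
    case $1 in
        0[xX]*) case ${1#0[xX]} in ''|*[!0-9a-fA-F]*) return 1 ;; esac ;;
        ''|*[!0-9]*) return 1 ;;
    esac
    echo $(( $1 ))   # a decimal ID with leading zeros would be read as octal
}

# Expand a list like "0-3,8,0x10-0x11" into one ID per line,
# rejecting IDs outside 0 .. nrofnodes-1.
expand_nodes() {   # expand_nodes <list> <nrofnodes>
    list=$1 n=$2
    old_ifs=$IFS; IFS=,
    set -- $list
    IFS=$old_ifs
    for r in "$@"; do
        from=$(to_num "${r%%-*}") && to=$(to_num "${r#*-}") || {
            echo "invalid range: $r" >&2; return 1; }
        if [ "$to" -ge "$n" ] || [ "$from" -gt "$to" ]; then
            echo "out of range: $r" >&2; return 1
        fi
        while [ "$from" -le "$to" ]; do echo "$from"; from=$((from + 1)); done
    done
}
```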
count [hw hw] List the status of the communication system(s) on the selected node(s). Various counters are displayed. If the hw option is given, only the counters concerning the hw hardware type are displayed. The default is to display the counters of all enabled hardware types on this node. down List all nodes which are marked as "DOWN". hardware Show the hardware setup on the selected node(s).
TaskID The ParaStation task ID of the process, both as decimal and hexadecimal number. The task ID of a process is unique within the cluster and is composed of the ParaStation ID of the node the process is running on and the local process ID of the process, i.e. the result of calling getpid(2). ParentTaskID The ParaStation task ID of the parent process. The parent process is the one which has spawned the current process. If the process was not spawned by any other process controlled by ParaStation, i.e.
range {[nodes] | all } Preselect or display the default set of nodes. If nodes or all is given, this directive modifies the default set of nodes that all following directives will act on. nodes is given in the same syntax as within any other directive, i.e. a comma-separated list of node ranges from-to, where a range might be trivial, containing only the from part. In this case, all further directives are called as if the nodes part or all were appended, unless a node set is given explicitly.
master [nodes] Show the current master on the selected node(s). The master node's task is the management and allocation of resources within the cluster. It is elected among the running nodes during runtime. Thus, usually all nodes should give the same answer to this question. In rare cases - usually during startup or immediately after a node failure - the nodes might disagree on the elected master node. This command helps to identify these rare cases.
cpumap [nodes] Show the CPU-slot to core mapping list for the selected nodes. bindmem [nodes] Show the flag indicating whether these nodes use memory binding as NUMA policy. adminuser [nodes] Show users allowed to start admin-tasks, i.e. unaccounted tasks. admingroup [nodes] Show groups allowed to start admin-tasks, i.e. unaccounted tasks. rl_addressspace [nodes] Show RLIMIT_AS on this node. rl_core [nodes] Show RLIMIT_CORE on this node. rl_cpu [nodes] Show RLIMIT_CPU on this node.
rl_sigpending [nodes] Show RLIMIT_SIGPENDING on this node. rl_stack [nodes] Show RLIMIT_STACK on this node. supplementaryGroups [nodes] Show the supplementaryGroups flag. statusBroadcasts [nodes] Show the maximum number of status broadcasts initiated by lost connections to other daemons. rdpTimeout [nodes] Show the configured RDP timeout in ms. deadLimit [nodes] Show the dead-limit of the RDP status module. See also parastation.conf(5). statusTimeout [nodes] Show the timeout of the RDP status module.
hwstart [hw { hw | all } ] [nodes] Start the declared hardware on the selected nodes. A specific hardware type will be started on the selected nodes regardless of whether this hardware is specified for these nodes within the parastation.conf configuration file. On the other hand, if hw all is specified or the hw option is missing altogether, only the hardware types specified within the configuration file are started.
adminuser [ + | - ] { name | any } [nodes] Grant authorization to start admin-tasks, i.e. tasks not blocking a dedicated CPU, to a particular user or to any user. name might be a user name or a numerical UID. If name is preceded by '+' or '-', this user is added to or removed from the list of admin users, respectively. admingroup [ + | - ] { name | any } [nodes] Grant authorization to start admin-tasks, i.e. tasks not blocking a dedicated CPU, to a particular group or to any group.
Pattern    Name          Description
0x0000001  PSC_LOG_PART  Partitioning functions (i.e. PSpart_())
0x0000002  PSC_LOG_TASK  Task structure handling (i.e.
Pattern  Name          Description
0x0001   RDP_LOG_CONN  Uncritical errors on connection loss
0x0002   RDP_LOG_INIT  Info from initialization (IP, FE, NFTS etc.
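The Pattern column lists single-bit values, so several flags are normally enabled together by ORing their values into one mask. This is a minimal sketch: debug_mask is a hypothetical helper that expects numeric arguments. Check the description of the corresponding set directive to confirm that it accepts a mask in hexadecimal notation.

```shell
# OR several debug pattern values together and print the result in hex.
debug_mask() {   # debug_mask <pattern>...
    m=0
    for f in "$@"; do
        m=$(( m | $f ))
    done
    printf '0x%x\n' "$m"
}

# Example: PSC_LOG_PART plus PSC_LOG_TASK
#   debug_mask 0x0000001 0x0000002
```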
nodesSort { PROC | LOAD_1 | LOAD_5 | LOAD_15 | PROC+LOAD | NONE } [nodes] Define the default sorting strategy for nodes when attaching them to a partition.
bindmem [ 0 | 1 ] [nodes] Set the flag indicating whether these nodes use memory binding as NUMA policy. Relevant values are 'false', 'true', 'no', 'yes', 0 or different from 0. cpumap map [nodes] Set the map used to assign CPU-slots to physical cores to map. map is a quoted string containing a space-separated permutation of the numbers 0 to Ncore-1, where Ncore is the number of physical cores available on this node. The number of cores within a distinct node may be determined via 'list hw'.
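Because cpumap must be a permutation, a candidate map can be validated before it is set. This sketch checks that the map contains exactly Ncore distinct non-negative integers that are all below Ncore. The name valid_cpumap is hypothetical, and Ncore has to be taken from 'list hw'.

```shell
# Succeed if <map> is a permutation of 0 .. ncore-1.
valid_cpumap() {   # valid_cpumap "<map>" <ncore>
    map=$1 ncore=$2
    count=0
    for c in $map; do
        case $c in ''|*[!0-9]*) return 1 ;; esac   # digits only
        count=$((count + 1))
    done
    [ "$count" -eq "$ncore" ] || return 1
    # ncore distinct values, each below ncore, form a permutation.
    [ "$(printf '%s\n' $map | sort -un | wc -l)" -eq "$ncore" ] || return 1
    for c in $map; do [ "$c" -lt "$ncore" ] || return 1; done
}
```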
quiet Quiet execution. Only a short message is printed if the test was successful. normal Normal execution with some messages during runtime. This is the default. verbose Very verbose execution with many messages during runtime. Files Upon startup, psiadmin tries to find .psiadminrc in the current directory or in the user's home directory. The first file found is parsed and the directives within it are executed. Afterwards, psiadmin goes into interactive mode unless the -f option is used.
psid psid — the ParaStation daemon. The organizer of the ParaStation software architecture. Synopsis psid [-v?] [-d level] [-f configfile] [-l logfile] [--usage] Description The ParaStation daemon is implemented as a Unix daemon process. It supervises allocated resources, cleans up after application shutdowns, and controls access to common resources. Thus, it takes care of tasks which are usually managed by the operating system. The local daemon is usually started by executing psiadmin(1).
Options -d, --debug=level Activate the debugging mode and set the debugging level to level. If debugging is enabled, i.e. if level is larger than 0 and option -l is set to stdout, no fork(2) is made on startup, which is usually done in order to run psid as a daemon process in the background. The debugging level of the daemon can also be modified during runtime using the set psiddebug command of psiadmin(1).
test_config test_config — verify the ParaStation4 configuration file. Synopsis test_config [-vad? ] [-v ] [-a ] [-d ] [-? ] [-f filename] Description test_config reads and analyses the ParaStation4 configuration file. Any errors or anomalies are reported. By default, the configuration file /etc/parastation.conf will be used. Options -f filename Use configuration file filename. -d num Set debug level to num. -v Output version information and exit. -h -? , --usage Show a help message.
test_nodes test_nodes — test physical connections within a cluster. Synopsis test_nodes [-np num] [-cnt count] [-map] [-type] Description Tests all or some physical (low level) connections within a cluster. Therefore the program is started on num nodes. After all processes came up correctly, each of them starts to send test packets to every other node of the cluster. For this purpose the PSP_IReceive(3) and PSP_ISend(3) calls of the ParaStation PSPort library are used.
test_pse test_pse — test virtual connections within a cluster. Synopsis test_pse [-np num] Description This command spawns num processes within the cluster. It's intended to test the process spawning capabilities of ParaStation. It does not test any communication facilities within ParaStation. Options -np num Spawn num processes.
p4stat p4stat — display information about the p4sock protocol. Synopsis p4stat [ -v ] [ -s ] [ -n ] [ -? ] [ --sock ] [ --net ] [ --version ] [ --help ] [ --usage ] Description Display information for sockets and network connections using the ParaStation4 protocol p4sock. Options -s, --sock Display information about open p4sock sockets. -n, --net Display information of network connections using p4sock. -v, --version Output version information and exit. -?, --help Show a help message.
p4tcp p4tcp — configure the ParaStation4 TCP bypass. Synopsis p4tcp [ -v ] [ -a ] [ -d ] [ -? ] [ from [ to ]] Description p4tcp configures the ParaStation4 TCP bypass. Without an argument, the current configuration is printed. From and to are IP addresses forming an address range for which the bypass feature should be activated. Multiple addresses or address ranges may be configured by using multiple p4tcp commands. To enable the bypass for a pair of processes, the library libp4tcp.
psaccounter psaccounter — Write accounting information from the ParaStation psid to the accounting files. Synopsis psaccounter [ -e | --extend ] [ -d | --debug=pattern ] [ -F | --foreground ] [ -l | --logdir=dir ] [ -f | --logfile=filename ] [ -p | --logpro ] [ -c | --dumpcore ] [ --coredir=dir ] [ -v | --version ] [ -? | --help ] [--usage] Description The command psaccounter collects information about jobs from the ParaStation psid daemon and writes this information to the accounting files.
Calling psaccounter with -p gzip would call the command gzip yyyymmdd and therefore compress the least recently used accounting file. -c, --dumpcore Define that a core file should be written in case of a catastrophe. By default, the core file will be written to /tmp. --coredir=dir Defines where to save core files. -v, --version Output version information and exit. -?, --help Show this help message. --usage Display a brief usage message. Files /var/account/yyyymmdd Accounting files, one per day.
psaccview psaccview — Print ParaStation accounting information.
Grouping jobs -lj, --ljobs Print detailed jobs list. Lists all jobs, one per line. -lu, --ltotuser Print user list. Lists job summary per user, one user per line. -lg, --ltotgroup Print group list. Lists job summary per group, one group per line. -ls, --ltotsum Print total job summary. Lists a summary of all jobs, only one line in total. Defining time periods considered -t, --timespan=period Selects a period of time shown. Valid entries are today, week, month or all.
Upon startup, psaccview tries to find the file .psaccviewrc in the user's home directory. Within this file, variables predefined in the command may be redefined; see the configuration section within the psaccview script. The command expects one file per day, named yyyymmdd, where yyyy represents the year, mm the month and dd the day of the data contained.
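Because each file covers one day and is named yyyymmdd, selecting the files for a period reduces to parsing the name. A minimal sketch with hypothetical helpers; it accepts the optional .gz and .bz2 suffixes listed under Files and takes an explicit date range, since how psaccview maps today, week and month onto dates is not specified here.

```python
import datetime
import re
from typing import Iterable, List, Optional

# yyyymmdd, optionally compressed
_NAME = re.compile(r"^(\d{4})(\d{2})(\d{2})(?:\.gz|\.bz2)?$")


def file_date(name: str) -> Optional[datetime.date]:
    """Date covered by an accounting file, or None for unrelated names."""
    m = _NAME.match(name)
    if not m:
        return None
    try:
        return datetime.date(*map(int, m.groups()))
    except ValueError:  # e.g. 20100231 is not a valid date
        return None


def select(names: Iterable[str], start: datetime.date,
           end: datetime.date) -> List[str]:
    """Accounting files whose date lies in [start, end], in date order."""
    picked = [n for n in names
              if (d := file_date(n)) is not None and start <= d <= end]
    return sorted(picked)
```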
These column names may also be used for sorting lists, where applicable.
Files
/var/account/*, /var/account/*.gz, /var/account/*.bz2
Accounting files, one per day.
$HOME/.psaccviewrc
Initialization file.
See also
psaccounter(8).
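Since the accounting files may be stored plain, gzip- or bzip2-compressed, a reader has to pick the decompressor by suffix. A minimal sketch using the Python standard library; the helper name open_accounting is hypothetical and says nothing about how psaccview does this internally.

```python
import bz2
import gzip
from typing import IO


def open_accounting(path: str) -> IO[str]:
    """Open a daily accounting file as text, whatever its compression."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")
```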
mlisten
mlisten — display multicast pings from the ParaStation daemon psid(8)
Synopsis
mlisten [-dv?] [-m MCAST] [-p PORT] [-n IP] [-# NODES] [--usage]
Description
Display the multicast pings that the ParaStation daemon psid(8) emits continuously. These pings are displayed as spinning bars: each ping received from node N advances the Nth bar by one step. For each node from which no multicast ping has been received yet, a '.' is displayed. By default, ParaStation no longer uses multicast messages.
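The display logic described above can be illustrated with a small model: one counter per node, '.' while a node has never been heard from, otherwise a bar that advances one step per ping. The class PingDisplay and the four bar shapes are assumptions for illustration; mlisten's actual characters and output format may differ.

```python
from typing import List

SPINNER = "|/-\\"  # hypothetical bar shapes, one step per ping


class PingDisplay:
    """Hypothetical model of mlisten's per-node spinning bars."""

    def __init__(self, nodes: int) -> None:
        self.counts: List[int] = [0] * nodes

    def on_ping(self, node: int) -> None:
        """Record one multicast ping received from 'node'."""
        self.counts[node] += 1

    def render(self) -> str:
        """'.' for silent nodes, otherwise the current bar position."""
        return "".join(
            "." if c == 0 else SPINNER[c % len(SPINNER)] for c in self.counts
        )
```

With four nodes, where node 0 is silent, node 1 sent one ping, node 2 two pings and node 3 five pings, render() yields "./-/".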
Appendix A. Quick Installation Guide
This appendix gives a brief overview of how to install ParaStation5 on a cluster. A detailed description can be found in Chapter 3, Installation, and Chapter 4, Configuration.
1. Shutdown
If this is an update of ParaStation, first shut down the ParaStation system. To do this, start psiadmin and issue the shutdown command:
# /opt/parastation/bin/psiadmin
psiadmin> shutdown
This will terminate all currently running tasks controlled by ParaStation, including psiadmin.
Provided the ParaStation daemon is started by xinetd, run the psiadmin(1) command located in /opt/parastation/bin and execute the add command. This will bring up the ParaStation daemon psid(8) on every node.
# /opt/parastation/bin/psiadmin
psiadmin> add
Alternatively, you can start psiadmin(1) with the -s option. To install the ParaStation daemon as a system service started at boot time, use
# chk_config -a /etc/init.d/parastation
This step must be repeated for each node.
7.
Appendix B. ParaStation license
The ParaStation software may be used under the following terms and conditions only.
Software and Know-how License Agreement
Version 1.0
between
ParTec Cluster Competence Center GmbH
place of business: Possartstr. 20, 81679 München
represented by: Bernhard Frohwitter
- in the following referred to as ParTec
and
you
- in the following referred to as "Licensee"
Preamble
ParTec has developed a cluster middleware software, comprising a high-performance communication layer.
Commercial Use means any non-consumer use that is not covered by University Use. Know-how means program documents and information which relates to Software, also in machine readable form, in particular the Base Version Code and the detailed comments on the Base Version Code, provided together with the Base Version Code.
§ 6 Grant-Back 1. Licensee grants ParTec for Modifications being severable improvements a nonexclusive, perpetual, irrevocable, worldwide and royalty-free license, and for Modifications being non-severable improvements an exclusive, perpetual, irrevocable, worldwide and royalty-free license to a.
2. A breach by Licensee of any one of the obligations under sections §4, §5 and §6 will automatically terminate Licensee's rights under this license.
§ 12 Rights after Expiration of the Agreement
1. All rights of Licensee on the use of the Base Version Code end at the expiration or termination of this agreement.
2.
Appendix C. Upgrading ParaStation4 to ParaStation5
This appendix explains how to upgrade an existing ParaStation4 installation to the current ParaStation5 version.
C.1. Building and installing ParaStation5 packages
Just recompile the packages:
# rpmbuild --rebuild psmgmt.5.0.0-0.src.rpm
# rpm -U psmgmt.5.0.0-0.i586.rpm
# rpmbuild --rebuild pscom.5.0.0-0.src.rpm
# rpm -U pscom.5.0.0-0.i586.rpm
# rpm -U pscom-modules.5.0.0-0.i586.rpm
# rpmbuild --rebuild psmpi2.5.0.0-1.src.rpm
# rpm -U psmpi2.5.0.0-1.i586.
Changes to the runtime environment
Use the mpiexec command instead! Executables linked with ParaStation4 can be run using the new mpiexec command. In this case, the option -b or --bnr is required. The environment variable PSP_P4SOCK was renamed to PSP_P4S, but is still recognized; within this version of ParaStation, both names may be used. Likewise, the environment variable PSP_SHAREDMEM was renamed to PSP_SHM, but is also still recognized.
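A tool or wrapper script that has to honour both the new and the old variable names could resolve them as sketched below. The helper lookup is hypothetical; the rename table comes from the text above, and the rule that the new name wins when both are set is an assumption, not documented ParaStation behaviour.

```python
import os
from typing import Mapping, Optional

# New ParaStation5 name -> old ParaStation4 name (from the text above).
RENAMED = {"PSP_P4S": "PSP_P4SOCK", "PSP_SHM": "PSP_SHAREDMEM"}


def lookup(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Value of 'name', falling back to its pre-rename alias.

    Assumption: if both names are set, the new name takes precedence.
    """
    if name in env:
        return env[name]
    old = RENAMED.get(name)
    return env.get(old) if old is not None else None
```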
Glossary
Address Resolution Protocol
A sending host decides, through a protocol's routing mechanism, that it wants to transmit to a target host located somewhere on a connected piece of a physical network. To actually transmit the hardware packet, a hardware address usually must be generated. In the case of Ethernet, this is a 48-bit Ethernet address. The addresses of hosts within a protocol are not always compatible with the corresponding hardware addresses (they may differ in length or value).
to store it to a given address. The rest of the job is done by this controller without producing further load on the CPU. This concept relieves the CPU of work that is not its primary task and thus leaves more computing power for the actual application.
Forwarder
See ParaStation Forwarder.
Logger
See ParaStation Logger.
Master Node
The evaluation of temporary node lists while spawning new tasks is done by only one particular psid(8) within the cluster.
Serial Task
A single process running on one of the compute nodes within the cluster. This process does not communicate with other processes using MPI. ParaStation knows about this process and where it was started from. A serial task may use multiple threads, but all these threads have to share a common address space within a node.