Setting up an HP SIM server 6.0 or greater on a Linux-based Serviceguard Cluster
White Paper

Contents: Overview, HP SIM architecture, Setup process
Overview

HP Systems Insight Manager (HP SIM) is an industry-standard tool for managing all HP systems, both servers and storage. With HP SIM, you can manage ProLiant servers running Windows, Linux, and NetWare; HP Integrity and HP 9000 servers running HP-UX; and HP Integrity servers running Windows and Linux. You can also monitor Alpha servers running Tru64 UNIX and OpenVMS.
hpsmdb-server-8.2.1-1HPSIM — programs needed to create user-defined types and functions
hpsmdb-libs-8.2.1-1HPSIM — essential shared libraries
hpsmdb-8.2.1-1HPSIM — client programs and libraries
hpsim-pgsql-config-C.05.02.00.00-1 — HP Systems Insight Manager Repository Configuration Product

Setting up the Serviceguard cluster
1. Install HP SIM.
a. Install the HP SIM binary on system 1.
b. Install the HP SIM binary on system 2.
c. Create all necessary directories on the shared storage system.
d.
The database cluster will be initialized with locale en_US.UTF-8.
The default database encoding has accordingly been set to UTF8.
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers/max_fsm_pages ... 24MB/153600
creating configuration files ... ok
creating template1 database in /var/opt/hpsmdb/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ...
15. Initialization and Database Population ..OK
Status : Unconfigured
Completed all tasks successfully.

This utility should report that all server components are acceptable and that all tasks completed successfully. If the utility reports an issue, see the HP Systems Insight Manager 5 Installation and Configuration Guide for Linux.
7. Initialize and configure HP SIM by executing the following command:
# /opt/mx/bin/mxinitconfig -a
Checking Requisites (15):
1. Check Kernel Parameters ..OK
2.
9. If the daemons are not running, start them by executing the following command:
# /opt/mx/bin/mxstart
10. After installing the HP SIM package, perform the following steps to create the relevant directories on the shared storage, copy the contents of the original directories, and create symbolic links (/etc/pam.d/mxpamauthrealm, /etc/opt/mx, /opt/mx, /var/opt/mx, /etc/init.d/hpsim, and /var/opt/smdb).
a) Stop the running processes:
# /etc/init.d/hpsim stop
# /etc/init.
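The move-and-link pattern used in this step can be sketched generically. The sketch below uses temporary placeholder paths, not the real HP SIM directories, so it can run anywhere:

```shell
#!/bin/bash
# Sketch of the relocate-and-symlink pattern: move a directory to shared
# storage, then leave a symlink at the original path so the application
# still finds its files. All paths are illustrative placeholders.
work=$(mktemp -d)              # stands in for the local filesystem
shared="$work/hpsimlnx"        # stands in for the shared-storage mount
mkdir -p "$shared/etc" "$work/etc/opt/mx"
echo "config-data" > "$work/etc/opt/mx/settings"

# Move the directory to shared storage ...
mv "$work/etc/opt/mx" "$shared/etc/"
# ... and replace it with a symlink to the shared copy
ln -sf "$shared/etc/mx" "$work/etc/opt/mx"

# The original path still resolves to the same content
relocated=$(cat "$work/etc/opt/mx/settings")
echo "$relocated"
```

On failover, the second node sees the same symlinks pointing into the shared mount, so both nodes resolve identical configuration and data paths.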
e) For each node, move the original HP SIM content to the new destination and create the symbolic links.
f) Execute the following commands on SG1hpsimlnx:
# mv /etc/init.d/hpsim etc/init.d
# ln -sf /hpsimlnx/etc/init.d/hpsim /etc/init.d
# mv /etc/opt/mx etc/opt/
# ln -sf /hpsimlnx/etc/opt/mx /etc/opt
# mv /etc/pam.d/hpsmdb etc/pam.d/
# ln -sf /hpsimlnx/etc/pam.d/hpsmdb /etc/pam.d/
# mv /etc/pam.d/mxpamauthrealm etc/pam.d/
# ln -sf /hpsimlnx/etc/pam.
# ln -sf /hpsimlnx/var/log/hpsmdb /var/log
# rm -rf /var/opt/hpsmdb
# ln -sf /hpsimlnx/var/opt/hpsmdb /var/opt
# rm -rf /var/opt/mx
# ln -sf /hpsimlnx/var/opt/mx /var/opt

Installing the cluster layer Serviceguard 11.18
HP Serviceguard 11.18 is used to provide the HA layer. For the HP SIM installation, run both system installations at the same time.
# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.5.11 SG1hpsimlnx.hp-demo.net SG1hpsimlnx
192.168.5.12 SG2hpsimlnx.hp-demo.
export SGLBIN=/usr/local/cmcluster/bin
export SGLIB=/usr/local/cmcluster/lib
export SGRUN=/usr/local/cmcluster/run
export SGAUTOSTART=/usr/local/cmcluster/conf/cmcluster.rc
export SGROOT=/usr/local/cmcluster
EOF
# chmod 755 /etc/profile.d/sg.sh
# scp -p /etc/profile.d/sg.sh SG2hpsimlnx:/etc/profile.d/sg.sh
# . /etc/profile.d/sg.
# vi /opt/hp/hpsmh/tomcat/bin/setclasspath.sh
else
  if [ ! -r "$JAVA_HOME"/bin/java -o ! -r "$JAVA_HOME"/bin/jdb -o ! -r "$JAVA_HOME"/bin/javac ]; then
    echo "The JAVA_HOME environment variable is not defined correctly"
    echo "This environment variable is needed to run this program"
    ### exit 1
  fi
# rpm -ivh sgmgrpi-B.01.01-1.rhel4.i386.rpm
Preparing...                ########################################### [100%]
   1:sgmgrpi                ########################################### [100%]
[root@sgnode2 ~]# /etc/init.
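The setclasspath.sh test above verifies that several JDK tools under $JAVA_HOME are readable before proceeding. A minimal, self-contained sketch of the same guard, using a temporary directory in place of a real JDK (the helper name `check_tools` is illustrative, not part of HP SIM):

```shell
#!/bin/bash
# Sketch of the multi-file readability guard used by setclasspath.sh.
# The directory stands in for $JAVA_HOME; check_tools is a made-up helper.
check_tools() {
    local home=$1
    if [ ! -r "$home"/bin/java -o ! -r "$home"/bin/jdb -o ! -r "$home"/bin/javac ]; then
        echo "The JAVA_HOME environment variable is not defined correctly"
        return 1
    fi
    echo "JAVA_HOME looks usable"
}

fake_jdk=$(mktemp -d)
mkdir -p "$fake_jdk/bin"
touch "$fake_jdk/bin/java" "$fake_jdk/bin/jdb"   # javac deliberately missing

check_tools "$fake_jdk"          # fails: javac is absent
touch "$fake_jdk/bin/javac"
check_tools "$fake_jdk"          # succeeds now
rm -rf "$fake_jdk"
```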
# vi $SGCONF/cluster.config
#**********************************************************************
#********* HIGH AVAILABILITY CLUSTER CONFIGURATION FILE ***************
#***** For complete details about cluster parameters and how to *******
#***** set them, consult the ServiceGuard manual. *********************
#**********************************************************************
# Enter a name for this cluster. This name will be used to identify the
# cluster when viewing or manipulating it.
NETWORK_POLLING_INTERVAL 2000000

# Package Configuration Parameters.
# Enter the maximum number of packages which will be configured in the cluster.
# You cannot add packages beyond this limit.
# This parameter is required.
MAX_CONFIGURED_PACKAGES 20
EOF
3. Replicate the configuration to node SG2hpsimlnx:
# scp -p $SGCONF/cluster.config $SGCONF/cmclnodelist SG2hpsimlnx:$SGCONF
4. Apply the configuration:
# cmapplyconf -v -C cluster.config
Begin cluster verification...
Checking cluster file: cluster.
Figure 2. Package creation
3. On the Parameters tab, enter a name for the package and use the default parameters.
Figure 3. Package parameters
4. Click the Monitored Resources tab, and then select the subnets.
Figure 4. Monitored resources
5. Edit the control scripts for your package. On the Parameters tab, click Edit Control Script.
Figure 5. Control scripts
6. Verify the parameters marked in bold:
# @(#) A.11.18.
# * Note: This file MUST be edited before it can be used.            *
# *                                                                  *
# * You must have bash version 2 installed for this script to work   *
# * properly. Also required is the arping utility available in the   *
# * iputils package.                                                 *
# *                                                                  *
# ********************************************************************
#
# The environment variables PACKAGE, NODE, SG_PACKAGE, SG_NODE and
# SG_SCRIPT_LOG_FILE are set by Serviceguard at the time the control
# script is executed.
# be defined within a single package control script. Other package
# control scripts may be defined and may define other filesystems.
# There are certain default variable settings when used with GFS. These
# are defined below. These variables will automatically be set to
# the proper values if using GFS.
#
# RAID Configuration file must not be set if the underlying file system
# is Red Hat GFS.
#
#RAIDTAB=""

# MD (RAID) COMMANDS
# Specify the method of activation and deactivation for md.
# Leave the default (RAIDSTART="raidstart", RAIDSTOP="raidstop") if you want
# md to be started and stopped with default methods.
#
RAIDSTART="raidstart -c ${RAIDTAB}"
RAIDSTOP="raidstop -c ${RAIDTAB}"

# VOLUME GROUP ACTIVATION
# Specify the method of activation for volume groups.
#
# NOTE: Mixing of 'gfs' with non-gfs filesystems in the same package
#       control script is not permitted. A single package control
#       script can define either a 'gfs' filesystem or a non-gfs
#       filesystem but not both.
#
# The following section applies if the underlying file system is 'ext2',
# 'ext3' or 'reiserfs'.
#
# The filesystems are defined as entries specifying the logical
# volume, the mount point, the file system type, the mount,
# umount and fsck options.
# GFS6.0 uses pool for logical volume management whereas GFS6.1 uses LVM2.
# Their device name formats differ and an example for each is shown below.
# Please use the appropriate one.
#     Pool : /dev/pool/pool1 (GFS 6.0)  OR
#     LVM2 : /dev/mapper/vgX-lvY (GFS6.1)
#     mount point : /pkg1a
#     filesystem type : gfs
#     mount options : read/write
#
# Then the following would be entered:
# LV[0]=/dev/pool/pool1;  (GFS6.0)  OR
# LV[0]=/dev/mapper/vgX-lvY;  (GFS6.
# for the system resources available on your cluster nodes. Some examples
# of system resources that can affect the optimum number of concurrent
# operations are: number of CPUs, amount of available memory, the kernel
# configuration for nfile and nproc. In some cases, if you set the number
# of concurrent operations too high, the package may not be able to start
# or to halt.
# You could specify IPv4 or IPv6 IP and subnet address pairs.
# Uncomment IP[0]="" and SUBNET[0]="" and fill in the name of your first
# IP and subnet address. You must begin with IP[0] and SUBNET[0] and
# increment the list in sequence.
#
# For example, if this package uses an IP of 192.10.25.12 and a subnet of
# 192.10.25.0 enter:
#     IP[0]=192.10.25.12
#     SUBNET[0]=192.10.25.0
#     (netmask=255.255.255.
# external connections (activate package IP addresses). Therefore, at the time
# the clients connect to the system, the application server is
# ready for service.
#
# If you set the HA_APP_SERVER to "post-IP", the application will be started
# AFTER adding the package IP address(es) to the system. Application servers
# such as Apache Web Server will check the existing IP when the server starts.
# These applications will not be started if the IP has not been added to the
# system.
# You should define all actions you want to happen here, before the service is
# started. You can create as many functions as you need.
#
function customer_defined_run_cmds
{
# ADD customer defined run commands.
: # do nothing instruction, because a function must contain some command.
test_return 51
}

# This function is a placeholder for customer-defined functions.
# You should define all actions you want to happen here, after the service is
# halted.
echo -e "\teither "none" or the supported method of remote data replication."
echo -e "\tSet this variable to the appropriate value and then restart"
echo -e "\tthe package."
let 0
test_return 41
fi
if [[ $DATA_REP != "none" ]]
then
  if [[ -x ${SGSBIN}/DRCheckDiskStatus ]]
  then
    echo "$(date '+%b %e %T') - Node \"$(hostname)\": This package is configured with remote data replication.
# NFS, Apache) or not. If the value of the HA server enable flag (HA_APP_SERVER)
# equals either "pre-IP" or "post-IP" and the Toolkit Interface Script
# (toolkit.sh) exists in the package directory, then the package will be
# configured for use with the HA server, and the interface script will be
# invoked as a sub-script.
#
# This function has one parameter passed to it, which then will be passed to the
# toolkit.
result=$(vgchange --$op $tag $vg 2>&1)
case $result in
  *Volume*group*successfully*changed*)
    echo "$op was successful on vg $vg."
    ;;
  *Volume*group*does*not*support*tags*)
    echo "VG $vg does not support tags."
    ;;
  *)
    echo "vgchange --$op error:"
    echo $result
    ;;
esac
else
  echo "vg_tag: illegal operation: $op."
fi
}

# This function will check to see if a VG is activated on another node.
# It will do this by using LVM2 'tags'.
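The vg_tag function dispatches on the combined stdout/stderr of vgchange using shell glob patterns in a case statement. The same technique, sketched with a stub standing in for vgchange so it runs anywhere:

```shell
#!/bin/bash
# Glob-pattern dispatch on a command's combined output, as vg_tag does
# with vgchange. fake_vgchange is a stub, not the real LVM tool.
fake_vgchange() {
    echo "  Volume group \"$1\" successfully changed"
}

classify() {
    local result
    result=$(fake_vgchange "$1" 2>&1)     # capture stdout and stderr together
    case $result in
        *Volume*group*successfully*changed*)
            echo "success" ;;
        *Volume*group*does*not*support*tags*)
            echo "no-tags" ;;
        *)
            echo "error" ;;
    esac
}

classify vg00
```

Matching on message text rather than exit status lets the script distinguish several failure modes from a tool that reports detail only in its output.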
IFS=$OLDIFS
return 1
fi
# get the hostname of the node
host=$(uname -n)
if (( $? != 0 )) ; then
  printf "activation_check: Error in getting the hostname\n"
  return 1
fi
# check hostid
if [[ "$hostid" != "" ]] ; then
  status=""
  if [[ $host != $hostid ]] ; then
    cl_hostid=${hostid%%.
printf "******************* WARNING ***************************\n\n"
printf "Forcing activation can lead to data corruption if\n"
printf "\"$hostid\" is still running and has \"$vg\"\n"
printf "active. It is imperative to positively determine that\n"
printf "\"$hostid\" is not running prior to performing\n"
printf "this operation.
numvg=$(vgdisplay | grep -w -e ${vgname} | wc -l)
if (( numvg == 0 && ${#MD[*]} > 0 ))
then
  # First let's do a sanity check to see if the vg really has
  # a configuration backup. If not then we report this and
  # exit. It is a prereq that a vgcfgbackup be done after
  # the vg configuration is built and before sg is started.
  # Perform a check to see if it is LVM version 2 or LVM version 1
  # and execute the appropriate commands.
local fs_mount_opt
vol_to_mount=$1
mount_pt=$2
shift 2
fs_mount_opt=$*
echo "WARNING: Running fuser on ${mount_pt} to remove anyone using the busy mount point directly."
UM_COUNT=0
RET=1
# The control script exits, if the mount failed after
# retrying FS_MOUNT_RETRY_COUNT times.
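The retry behavior described here (attempt the mount, retrying up to FS_MOUNT_RETRY_COUNT times before failing the package) can be sketched generically. The stub `flaky_mount` below is invented for illustration; it fails on its first two calls and succeeds on the third:

```shell
#!/bin/bash
# Retry pattern: attempt an operation up to RETRY_COUNT times, as the
# control script does for mounts. flaky_mount is a stub that fails on
# its first two calls and succeeds on the third.
RETRY_COUNT=5
attempts_file=$(mktemp)
echo 0 > "$attempts_file"

flaky_mount() {
    local n
    n=$(cat "$attempts_file")
    n=$((n + 1))
    echo "$n" > "$attempts_file"
    [ "$n" -ge 3 ]            # non-zero status until the third attempt
}

try=0
ret=1
while [ $try -lt $RETRY_COUNT ] && [ $ret -ne 0 ]; do
    try=$((try + 1))
    flaky_mount && ret=0 || ret=1
    [ $ret -ne 0 ] && echo "attempt $try failed, retrying"
done

if [ $ret -eq 0 ]; then
    echo "mounted after $try attempts"
else
    echo "giving up after $RETRY_COUNT attempts"
fi
rm -f "$attempts_file"
```

Bounding the retries matters in a cluster: an unbounded loop would leave the package half-started instead of failing over cleanly to the other node.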
echo ${LV[@]} | tr ' ' '\012' | sed -e 's/^/    /'
# Perform parallel fsck's for better performance.
# Limit the number of concurrent fsck to CONCURRENT_FSCK_OPERATIONS
R=0
while (( R < ${#LV[*]} ))
do
  j=0
  while (( j < CONCURRENT_FSCK_OPERATIONS && R < ${#LV[*]} ))
  do
    (
    case ${FS_TYPE[$R]} in
    ext2|ext3)
      e2fsck ${FS_FSCK_OPT[$R]} -y ${LV[$R]}
      # on linux fsck will return a 1 if filesystem errors
      # were corrected. This means that the filesystem was
      # dirty but is now clean so we should be able to
      # continue.
let 0
test_return 2
fi
(( j = j - 1 ))
done
done
fi
# Check exit value (set if any preceding fsck calls failed)
if (( $exit_value == 1 ))
then
  echo "###### Node \"$(hostname)\": Package start FAILED at $(date) ######"
  exit 1
fi
fi
typeset -i F=0
typeset -i j
typeset -i L=${#LV[*]}
while (( F < L ))
do
  j=0
  while (( j < CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS && F < L ))
  do
    I=${LV[$F]}
    if [[ $(mount | grep -e $I) == "" ]]
    then
      echo "$(date '+%b %e %T') - Node \"$(hostname)\": Mounting $I at ${FS[$F]}"
      # Perfo
(( F = F + 1 ))
(( j = j + 1 ))
done
# wait for background mounts to finish
while (( j > 0 ))
do
  pid=${pids_list[$j-1]}
  wait $pid
  if (( $? != 0 ))
  then
    let 0
    test_return 3
  fi
  (( j = j - 1 ))
done
done
}

# For each {IP address/subnet} pair, add the IP address to the subnet
# using cmmodnet(1m).
# `let 0` is used to set the value of $? to 1. The function test_return
# requires $? to be set to 1 if it has to print the error message.
let 0
test_return 4
fi
}

# For each {service name/service command string} pair, start the
# service command string at the service name using cmrunserv(1m).
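The `let 0` idiom used throughout this script deserves a note: in bash, `let` returns a non-zero exit status when its arithmetic expression evaluates to 0, so `let 0` is a terse way to force $? to 1 before calling test_return. A runnable illustration:

```shell
#!/bin/bash
# `let 0` evaluates the arithmetic expression 0; a zero result gives
# `let` a non-zero exit status, so $? is forced to 1.
let 0
status_after_let=$?
echo "after 'let 0', \$? = $status_after_let"

# The (( ... )) arithmetic form behaves the same way.
(( 0 ))
status_after_paren=$?
echo "after '(( 0 ))', \$? = $status_after_paren"
```

Note that `let` is a bash builtin; the idiom does not work under a plain POSIX sh.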
echo "$(date '+%b %e %T') - Node \"$(hostname)\": Halting service $I"
cmhaltserv $I
test_return 9
done
}

# For each IP address/subnet pair, remove the IP address from the subnet
# using cmmodnet(1m).
# Limit the number of parallel umounts to
# CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS
typeset pids_list
while (( L > 0 ))
do
  j=0
  while (( j < CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS && L > 0 ))
  do
    (( L = L - 1 ))
    K=${FS[$L]}
    I=${LV[$L]}
    mount | grep -e " "$K" " > /dev/null 2>&1
    if (( $? == 0 ))
    then
      echo "$(date '+%b %e %T') - Node \"$(hostname)\": Unmounting filesystem on $K"
      (
      result=$(umount ${FS_UMOUNT_OPT[$L]} $K 2>&1)
      ret=$?
      if (( ret != 0 ))
      then
        case $result in
        *not*mounted*)
          (( ret = 0 ))
          ;;
        *)
          echo "W
# wait for background umount processes to finish
while (( j > 0 ))
do
  pid=${pids_list[$j-1]}
  wait $pid
  if (( $? != 0 ))
  then
    let 0
    test_return 13
  fi
  (( j = j - 1 ))
done
done
}

# This function deactivates volume groups
#
function deactivate_volume_group
{
typeset result
for I in ${VG[@]}
do
  echo "$(date '+%b %e %T') - Node \"$(hostname)\": Deactivating volume group $I"
  (( repeat=${#LV[*]}*2 ))
  while (( repeat > 0 )); do
    result=$(vgchange --test -a n $I 2>&1)
    case $result in
    *Can*t*deactivate*volume*group*w
vg_tag deltag $I $(uname -n)
fi
done
}

# This function deactivates mirror disk
#
function deactivate_md
{
for I in ${MD[@]}
do
  echo "$(date '+%b %e %T') - Node \"$(hostname)\": Deactivating md $I"
  $RAIDSTOP $I
  test_return 26
done
}

# This function will set variables to the required settings if
# GFS is being used.
local to_exit=0
case $1 in
1)
  echo "ERROR: Function activate_volume_group; Failed to activate $I"
  deactivate_volume_group
  deactivate_md
  verify_physical_data_replication stop
  to_exit=1
  ;;
2)
  echo "ERROR: Function check_and_mount; Failed to fsck one of the devices.
  to_exit=1
  ;;
9)
  echo "WARNING: Function halt_services; Failed to halt service $I"
  ;;
12)
  echo "ERROR: Function remove_ip_address; Failed to remove $I"
  exit_value=1
  ;;
13)
  echo "ERROR: Function umount_fs; Failed to unmount $I"
  exit_value=1
  ;;
14)
  echo "ERROR: Function deactivate_volume_group; Failed to deactivate $I"
  exit_value=1
  ;;
17)
  echo "ERROR: Function freeup_busy_mountpoint_and_mount_fs;"
  echo -e "\tFailed to mount $I to ${FS[$F]}"
  umount_fs
  deactivate_volume_group
  deactivate_md
  verify_physical_data_r
  echo "ERROR: Function verify_physical_data_replication"
  to_exit=1
  ;;
50)
  echo "ERROR: Function verify_ha_server; Failed to $action HA servers"
  # hanfs.
echo "###### Node \"$(hostname)\": Package start FAILED at $(date) ######"
exit 1
fi
fi
}
# END OF FUNCTIONS COMMON TO BOTH RUN AND HALT

#-------------------MAINLINE Control Script Code Starts Here---------------#

# FUNCTION STARTUP SECTION.
exit_value=0
typeset CUR_VERSION
# Check that CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS is set to >=1.
if (( CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS < 1 ))
then
  echo \
  "\tWARNING: Invalid CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS value. Defaulting it to 1.
  verify_ha_server $1
fi
add_ip_address
if [[ "$HA_APP_SERVER" = "post-IP" ]]
then
  verify_ha_server $1
fi
customer_defined_run_cmds
start_services
# Check exit value
if (( $exit_value == 1 ))
then
  echo "###### Node \"$(hostname)\": Package start FAILED at $(date) ######"
  exit 1
else
  echo "###### Node \"$(hostname)\": Package start completed at $(date) ######"
  exit 0
fi
elif [[ "$1" = "stop" ]]
then
  echo -e "\n####### Node \"$(hostname)\": Halting package at $(date) #######"
  check_gfs
  halt_services
  customer_de
  echo "###### Node \"$(hostname)\": Package halted with ERROR at $(date) ######"
  exit 1
else
  echo "###### Node \"$(hostname)\": Package halt completed at $(date) ######"
  exit 0
fi
fi
7. After verifying the script, click Save and Distribute.
Figure 6. Edit and save file
After creating the package, the following directory listing displays the files generated by Serviceguard Manager:
# ls -ll
total 188
-rw-r--r-- 1 root root  1109 Jan 30 18:02 hpsim.config
-rwx------ 1 root root 50320 Jan 30 22:45 hpsim.
-rw-r--r--
Where:
o hpsim.config is the configuration file for the package.
o hpsim.sh is the control script.
o hpsim.sh.log is the log file for the control script.
8. Create the init script. The init script starts the HP SIM application, including the database daemon, and is located on the shared drive.
# vi /hpsimlnx/etc/init.
res=0
if [ $ret -eq 0 ]; then
  echo_success
else
  echo_failure
  res=1
fi
echo
/etc/init.d/hpsmdb stop
ret=$?
sleep 2
if [ $ret -eq 0 ]; then
  echo_success
else
  echo_failure
  res=1
fi
echo
rm -f /var/lock/subsys/${NAME}
exit $res
}
restart(){
  stop
  start
}
# See how we were called.
case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  status)
    status sgsim
    ;;
  restart)
    restart
    ;;
  *)
    echo $"Usage: $0 {start|stop|status|restart}"
    exit 1
esac
exit 0
9.
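The case-based dispatch at the end of the init script is the standard SysV pattern. A minimal self-contained version of the same skeleton, with echo stubs in place of the real start/stop logic (the function names mirror the script above but the bodies are invented for illustration):

```shell
#!/bin/bash
# Minimal SysV-style init dispatch, mirroring the hpsim init script's
# layout. start/stop are stubs; a real script launches and halts daemons.
start() { echo "starting service"; }
stop()  { echo "stopping service"; }
restart() { stop; start; }

dispatch() {
    case "$1" in
        start)   start ;;
        stop)    stop ;;
        restart) restart ;;
        *)
            echo "Usage: $0 {start|stop|restart}"
            return 1
            ;;
    esac
}

dispatch start
dispatch restart
```

Returning non-zero for an unknown action lets service managers and operators detect typos instead of silently doing nothing.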
Figure 7. Serviceguard homepage
3. Log in to HP SIM.
Figure 8. HP SIM homepage
4. Modify the SSL certificate to match the cluster name:
a) Select Options > Certificates > Server Certificate.
Figure 9. Server certificate options
b) Click New, and enter the required information as shown in the following figure.
Figure 10. New certificate
f. For the primary node:
# /etc/init.
g. For the second system, when the shared storage is available, do not move the content of the sslshare directory; instead, remove the content and create the symbolic link by running the following commands:
# rm -rf /etc/opt/hp/sslshare
# ln -sf /hpsimlnx/etc/opt/hp/sslshare /etc/opt/hp
c) Click OK to save the certificate.
5. Restart HP SIM.
# /etc/init.d/hpsim restart

Troubleshooting

Processes
Processes such as mxdomainmgr can take time to start (up to 2 minutes on a machine that meets only the minimum platform requirements).
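Because of that startup delay, a status check run immediately after starting HP SIM can be misleading. A generic poll-until-ready loop is a safer check; the sketch below uses a background stub that becomes "ready" after one second, with a flag file standing in for a real readiness probe (such as checking for the mxdomainmgr process):

```shell
#!/bin/bash
# Poll-until-ready pattern for slow-starting daemons like mxdomainmgr.
# A background subshell creates a flag file after 1s, standing in for
# the daemon becoming responsive.
flag=$(mktemp -u)              # path only; the file does not exist yet
( sleep 1; touch "$flag" ) &

ready=0
for _ in $(seq 1 50); do       # up to ~10s here; allow ~120s for mxdomainmgr
    if [ -e "$flag" ]; then
        ready=1
        break
    fi
    sleep 0.2
done

if [ $ready -eq 1 ]; then
    echo "service is ready"
else
    echo "timed out waiting for service"
fi
wait                           # reap the background subshell
rm -f "$flag"
```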
For more information
http://www.hp.com/go/hpsim
http://www.hp.com/go/sglx
http://docs.hp.com/en/B9903-90054/ch01s01.html
http://docs.hp.com/en/B9903-90055/ch01s03.html#bgefghei (Serviceguard Manager Installation)

© Copyright 2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services.