Insight Control for Linux 6.
© Copyright 2008, 2010 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
Part I Introduction
1 Using Insight Control for Linux
This chapter addresses the following topics:
• “Overview” (page 13)
• “Integration with HP Systems Insight Manager” (page 14)
• “Insight Control for Linux extensions to HP SIM” (page 14)
• “Insight Control for Linux toolboxes” (page 17)
• “Insight Control for Linux command environment” (page 18)
• “Internal task queuing and management” (page 18)
• “Synchronized system clocks” (page 18)
• “Insight Control for Linux RAM disk environment” (page 19)
• “Network configur
1.2 Integration with HP Systems Insight Manager
Insight Control for Linux enhances the free system monitoring and system management features of HP Systems Insight Manager (HP SIM). The underlying foundation of Insight Control for Linux is HP SIM, with plug-in tools to perform the software installation, monitoring, and management tasks. Insight Control for Linux enables you to manage and monitor a range of objects including Linux servers, switches, and enclosures.
Table 1-1 Insight Control for Linux extensions to the HP Insight Control user interface (continued)

Menu item: Options→IC-Linux→Define Networks
Description: The Define Networks tool provides an interface through which the administrator can create and edit network definitions that can be used by the Network Configuration Editor tool. The network definitions are used by the OS installation tools to implement booting using the virtual media mechanism.
Documented in: Chapter 6 (page 63)

Menu items: Tools→Server Controls→Power Off Server...
Tools→Server Controls→Power On Server...
Tools→Server Controls→Reboot Server...
Description: Accesses the management processor on the selected target managed system or systems to power on, power off, or reboot the managed system or systems.
Documented in: Chapter 15 (page 149)

Maintenance, troubleshooting and diagnostics

Menu item: Diagnose→Boot to Linux Rescue Mode
Description: Boots a managed system to a Linux-based RAM disk rescue environment.
Documented in: Section 25.2.1 (page 220)

Menu item: Configure→Boot to SmartStart Toolkit
Description: Boots a managed system to the HP SmartStart Toolkit environment for maintenance.
Documented in: Section 25.2.
1.5 Insight Control for Linux command environment
Table 1-2 lists the Insight Control for Linux commands that you can run from the command line on the CMS or on any management hub, with the exception of the pdsh command.
Table 1-2 Insight Control for Linux commands

Command: console
Description: Enables access to the serial consoles of managed systems.
Manpage: console(8)

Command: headnode
Description: Returns the name of the CMS.
Synchronization is required for the Console Maintenance Facility to access a managed system using SSH. Capturing an image from or deploying an image to a server whose system time is incorrect can cause a large number of error messages when the image is deployed. These messages do not affect the deployment, but they can add significant time delays to the deployment. To avoid this and other problems, always synchronize the clock on the CMS and your managed systems.
Virtual media does not use DHCP. The system boots a custom RAM disk that includes the predefined network configuration information (for example, the IP address, netmask, gateway, and so on). Insight Control for Linux provides tools that let you define the network information parameters, edit those network parameters, and initiate bare-metal discovery.
1.11 Managed system names
Insight Control for Linux command line commands recognize managed systems by the following name types.
When you run the Options→IC-Linux→Configure Management Services task, it determines if this file exists:
• If the file does not exist, it creates the file and assigns numbers based on the managed systems and the current numbering scheme. The Central Management Server (CMS) is always node number 1.
• If the file already exists, the configuration task reads the nodenumbers file and assigns the node numbers according to the file contents.
1.11.2 Viewing managed system names
After the Configure Management Services task is run, you can list the managed systems with their associated names; use the shownode info command as described in Section 21.2.2 (page 195).
1.12 Connecting to HP SIM
To log in and connect to HP SIM, follow these steps:
1. Open a browser window.
Table 1-3 Insight Control for Linux files and directories to back up (continued)

File or directory to back up: /etc/hosts
Reason: Static table for host names.

File or directory to back up: /etc/snmp/snmpd.conf
Reason: Configuration file for the Net-SNMP SNMP agent.

You also must back up HP SIM configuration files to restore your configuration. For more information on these HP SIM configuration files, see the following white paper:
Backing up and restoring HP SIM 5.2 or greater data files in an HP-UX and Linux environment
1.
2 Security
2.1 Integrated security features
This section describes features that are integrated into HP SIM and Insight Control for Linux to make them secure. Security features are also discussed in the context of the associated topic throughout this document.
• Browser Connections
HP SIM enforces a secure connection to the web browser.
• pdsh Keys
The pdsh command uses public host keys to authenticate remote hosts and supports public key authentication to authenticate users.
• cmfd Keys
The console command uses SSL keys to connect to the console management facility daemon (cmfd) for console access.
• Secure boot mechanism
Virtual media support is provided as the secure boot mechanism. PXE booting provides no authentication or encryption.
Standard Linux deployment, which uses SSH to push an image to the target systems, is a less scalable but more secure method than large scale deployment. HP recommends the use of a dedicated management LAN for large scale Linux deployments. For more information on scalable deployment, see Section 10.4 (page 115).
• Logging RAM disk connections and operations
With a few minor modifications, you can log who has connected to the RAM disk.
An alternate method is to automate this procedure by using a script to extract the iLO's certificate and add it to the HP SIM trusted certificate list. The following is an example of a script that accepts a series of iLO certificates and adds them to the HP SIM trust store.
#!/bin/sh
#
# Get certificate for each iLO passed in as an argument
# and add it to the HP SIM trust store.
3 Managing licenses
This chapter describes the following topics:
• “Licensing overview” (page 29)
• “Adding the Insight Control for Linux license key to HP SIM” (page 29)
• “Licensing virtual guests” (page 30)
3.1 Licensing overview
The licenses for HP Insight Control power management and HP Insight Control virtual machine management are bundled with the Insight Control for Linux license. The iLO Advanced license remains a separate license.
3.3 Licensing virtual guests When a virtual host (VM host) is licensed for Insight Control for Linux, all guests of that VM host are considered licensed for Insight Control for Linux as well, provided that the virtual guests are properly associated with their virtual host. You can license a virtual machine guest (VM guest) without licensing its host or you can license it in addition to licensing its host, in either case unnecessarily consuming licenses.
4 Understanding tasks and task results
This chapter addresses the following topics:
• “Task results overview” (page 31)
• “Understanding task results” (page 31)
• “Task results page” (page 31)
• “Common task results” (page 33)
• “SIM standard task results format” (page 36)
• “Scalable task results format” (page 40)
4.1 Task results overview
HP SIM and Insight Control for Linux enable you to manage systems by scheduling and running tasks.
Figure 4-1 Task results page
Table 4-1 lists the components of the Task Results page.
Table 4-1 Components of the Task Results page
The Available in field indicates whether a component is available in the SIM standard view, the scalable view, or is common to both views.

Component: Task Instance Results
Description: Provides the status of the running task or the task that is selected in the task list log at the top of the page.
Available in: Common

Component: Use SIM Standard Task
Description: This option is only offered when you run an Insight Control for Linux task.
Available in: Common

Component: Use Scalable Task Results Format radio button
Description: This format is unique to Insight Control for Linux tasks and is only available as an option when you run an Insight Control for Linux task. Selecting this radio button provides an operation oriented format that enables you to view the status of each operation in a task as it completes on each target.
Available in: Common
4.4.1.1 Stopping a task When you select the Stop button in the Task Instance area, the task status is immediately set to Cancelled. The stop process attempts to cancel the task for all targets with non-terminal statuses, regardless of whether or not they have begun running. The stop operation does not affect targets that have already reached a terminal status.
• All task level results
• All parameters displayed in the Parameters pop-up window
• Target level results, including:
  - All information displayed in the target status table
  - All target details, including all information displayed in the operation status table and the log for each operation
TIP: If you select All Systems for the report, the target level results are displayed for all targets, each separated by a line.
4.4.2.
Figure 4-5 View of the operation details log
4.5 SIM standard task results format
This section describes the portions of the Task Results page that are specific to the SIM Standard Task Results Format, which is the default view. Figure 4-6 illustrates the SIM Standard Task Results format. The figure shows the task results for an instance of a Red Hat Kickstart OS installation task running on three target servers.
Figure 4-6 SIM standard task results format
4.5.1 Summary status and target status area
Figure 4-7 illustrates the Summary status: area and target status area, which provide the overall status of a task on each target server.
Figure 4-7 View of the summary status and target status areas
Table 4-2 describes the information displayed in the Summary status: area.
4.
Table 4-2 Description of target status area

Column heading: Target Name
Description: Name of the target managed system on which the task was run.

Column heading: Status
Description: The status of a target is computed from the status of its operations.
Non-terminal target status
Pending: All operations can have the Pending status.
Running: At least one operation has the status Running. A percent complete is also displayed.
4.5.1.2 Log button in the target status area
When you select the Log button, a new window opens that displays the log for all operations for the task, including the following information:
• A summary of the task level information
• The information displayed in the target status table for the selected target
• A block of information for each operation in the task, including the log
The log screen does not auto-refresh.
Table 4-3 Description of target details table

Column heading: Operation Name
Description: The name of the operation

Column heading: Status
Description:
Non-terminal operation status
Pending: If this is the first operation, execution of the task for the target has not started. For any other operation, Pending means that one or more of the preceding operations has a non-terminal status.
Running: This operation is being run. A percent complete is also displayed. Only one operation for a target can be run at a time.
Figure 4-9 Scalable task results format
4.6.1 Operations table
Figure 4-10 illustrates the Operations table, which lists individual operations within a task and provides the status of the entire operation as it starts and completes on each target server. Note that the operation status represents the status of the operation on every target server.
Table 4-4 lists the information displayed in the Operations table.
Table 4-4 Description of the operations table

Column heading: Operation Name
Description: The name of the operation that is run as a component of an Insight Control for Linux task.

Column heading: Status
Description:
Complete: The operation has successfully completed on all target servers.
Pending: The operation has not yet started or is not yet complete on all target servers.
Table 4-5 lists the information displayed in the Operation Target Details table.
Table 4-5 Description of the operation target details table

Column heading: Target Name
Description: The name of the target on which the operation was run or is running.

Column heading: Status
Description:
Complete: The operation has successfully completed on the target servers.
Pending: The operation has not yet started or is not yet complete on the target servers.
Part II Deployment
5 Managing the Insight Control for Linux repository
This chapter provides an overview of the Insight Control for Linux repository and how to perform activities related to it. The following topics are addressed:
• “Introduction to the Insight Control for Linux repository” (page 47)
• “Registering items in the Insight Control for Linux repository” (page 50)
• “Copying software to the Insight Control for Linux repository” (page 56)
• “Editing and deleting registered items” (page 60)
5.
After an OS is registered with the repository, manually copy the vendor-supplied installation media to the appropriate directories in the repository. The media can be a physical CD or DVD, or it can be an .iso image. You must expand the .iso image into flat files. IMPORTANT: Be aware that repository management tasks do not follow typical authorization models. All HP SIM users can select, add, delete, or modify all Insight Control for Linux repository items regardless of their user authorizations. 5.1.
Figure 5-2 Remote repository using the CMS as a gateway
5.1.2 Repository contents
Table 5-1 lists the classes of items that are stored in the repository.
Table 5-1 Repository item types

Name: ISO
Description: ISO image

Name: PSP
Description: An OS-specific bundle of ProLiant optimized drivers, utilities, and management agents.

Name: Supported OS
Description: Vendor-supplied installation files for supported versions of RHEL or SLES.
The items listed in Table 5-2 are preregistered and reside in the repository after you install Insight Control for Linux. The default contents include sample RHEL Kickstart and SLES AutoYaST installation configuration files and an example PSP dependency script. Table 5-2 Default repository contents Item type File name examples Description PSP Dependency Script example_dependency.
5.2.2 Registering operating systems
To make a supported version of RHEL or SLES, a supported virtualization OS, or a variant of a Linux OS available for automated or interactive installations, you register the OS in the repository, copy the vendor-supplied installation files to the repository, and copy the appropriate boot files to the associated boot target directory.
To register an OS in the repository, follow these steps:
1.
Table 5-3 OS registration information (continued)

Registration information: Path via HTTP
Description: Supply a path when the OS is served from a remotely hosted repository. This path is not required if the OS is being served locally. Enter the full web address (using the IP address) to the OS installation media, such as http://192.0.2.1/redhat/some_version/. For SLES repositories, you must verify if the remote installation media use CD1,CD2,… directories.
Supply for supported OS, custom OS, or both: Both
10. Select OK to return to the Manage Repository screen. Two new items appear in the table. One item is of the type Supported OS and the other is of the type Boot image. The Boot image item type is added for you automatically. Its name is the same as the supported OS with the word Boot appended. The option to add a Boot image item type is never available because this item type is always associated with a Supported OS item type, and thus, it is created automatically for you.
Enter a descriptive name but do not use the PSP tar.gzip file name, which can be quite long.
• Provide the version number of the PSP that you copy to the repository. For the supported PSP version or versions, see the HP Insight Control for Linux Support Matrix. The version number must be in the form of N.NN, for example, 8.51.
• Associate the PSP to the operating systems it supports. Use the Ctrl-Left Mouse Button key combination to select all the operating systems that the PSP supports.
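The N.NN version format can be checked before you register a PSP. The following Python sketch is illustrative only and is not part of Insight Control for Linux: the function name is_valid_psp_version is hypothetical, and the sketch assumes that N.NN means one digit, a period, and exactly two digits, as in the 8.51 example.

```python
import re

# Assumption: "N.NN" means one digit, a period, then exactly two digits (for example, 8.51).
_PSP_VERSION = re.compile(r"[0-9]\.[0-9]{2}")


def is_valid_psp_version(version: str) -> bool:
    """Return True if the whole string matches the N.NN form (hypothetical helper)."""
    return _PSP_VERSION.fullmatch(version) is not None
```

For example, is_valid_psp_version("8.51") accepts the value from the text, while a value such as "8.5" is rejected under the stated assumption.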
Do not append .cfg to the file name.
• Description of the file.
• From the drop down list, select the registered operating systems to which the configuration file is applied during an unattended OS installation. Use the Ctrl-Left Mouse Button key combination to select multiple operating systems.
• Optionally, associate the configuration file with a custom OS. It is your responsibility to apply the commands in the installation task to retrieve it.
6. Select Save.
6. Select Save.
7. View the summary information, which includes the directory and path where you upload the script. Unlike other items in the repository, the name of the script file you upload to this path is not important, except that it must end in a .sh extension. If multiple files are named *.sh in this directory, only the first script detected is used.
8. Copy the script to the newly-created directory to make it available for deployment.
9. Select OK to return to the Manage Repository screen.
5.2.
• “Copying or downloading PSPs into the repository” (page 60)
5.3.1 Copying RHEL into the local repository on the CMS
The OS directory and the boot target directory where you copy the installation files were provided to you during the OS repository registration process described in Section 5.2.2 (page 51). You were instructed to record the paths to these directories.
There are three DVDs that comprise SLES Version 11. Only the first DVD must be copied to the repository. DVD2 contains source files; DVD3 contains the documentation.
Each service pack release for SLES Version 10 has already applied all patches to the installation media.
To copy vendor-supplied SLES Version 10 OS installation files into the repository, follow these steps:
1. Count the number of installation media discs (CD or DVD) that were shipped with the SLES Version 10 distribution.
3. Copy the contents of each installation disc into its own directory. For example, copy the contents of the first CD into the CD1 directory, and so on.
4. Copy the kernel and RAM disk boot files to the related boot target directory. The kernel file name is linux and the RAM disk file name is initrd. On the first SLES installation disc, the kernel and RAM disk files are located in an architecture-specific subdirectory named boot/i386/loader or boot/x86_64/loader.
5.3.
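The step that copies the SLES kernel and RAM disk boot files can be sketched in Python as follows. This is a minimal illustration, not an Insight Control for Linux tool: the function name copy_sles_boot_files and its arguments are hypothetical, and the caller supplies the CD1 directory and the boot target directory that were recorded when the OS was registered.

```python
import shutil
from pathlib import Path

# Loader subdirectories named in the text, relative to the first SLES disc.
LOADER_DIRS = {"i386": "boot/i386/loader", "x86_64": "boot/x86_64/loader"}


def copy_sles_boot_files(cd1_dir, boot_target_dir, arch):
    """Copy the kernel (linux) and RAM disk (initrd) into the boot target directory."""
    loader = Path(cd1_dir) / LOADER_DIRS[arch]
    target = Path(boot_target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in ("linux", "initrd"):
        source = loader / name
        if not source.is_file():
            # Fail early if the disc contents were not copied as expected.
            raise FileNotFoundError(f"missing boot file: {source}")
        copied.append(Path(shutil.copy2(source, target / name)))
    return copied
```

The function raises FileNotFoundError rather than skipping a missing file, so an incomplete copy of the first disc is noticed before an installation is attempted.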
5.3.7 Copying or downloading PSPs into the repository
You can either copy a PSP (PSPs are packaged with HP SIM) or download it.
Copying a PSP from the CMS
To copy a PSP:
1. Locate the PSP by OS type and version from the /var/opt/mx/linuxagents directory.
2. Copy the compressed tar file (*tar.gz) to the PSP path on disk directory that was created when you registered the PSP in the repository (for example, /opt/repository/psp/redhatV50).
5.4.2 Deleting registered items from the repository NOTE: Deleting an item from the repository does not delete the corresponding directory in the /opt/repository directory nor does it delete the files that you might have copied to that directory. If you want to delete or move the directory and files, delete or move them manually after you first perform the following procedure. To remove an item from the repository, follow these steps: 1.
6 Configuring network parameters for virtual media
Topics include:
• “Introduction” (page 63)
• “Preparing for virtual media” (page 64)
• “Using the Define Networks tool” (page 67)
• “Using the Network Configuration Editor” (page 70)
• “Next Step” (page 73)
6.1 Introduction
Virtual media is a mechanism available only for systems with an iLO-based management processor. Virtual media allows a system to boot an ISO image over the network; it is the alternate boot mechanism to PXE.
IMPORTANT: Use these tools to define the network configuration parameters before running any other tool that uses virtual media, especially Initiate Bare Metal Discovery.
Usually, network configuration is performed in two stages:
• In the first stage, you define the network configuration parameters and store them under a network name. You can have as many network name definitions as you want.
3. Select either the Discover a group of systems or Discover a single system button. The windows for these two choices differ slightly. The illustration shows the Discover a group of systems choice.
4. Enter a descriptive name in the Name text field. The descriptive name must be either listed in the CMS's hosts file or known to the CMS's name server. Otherwise, enter an IP address.
5. Ensure that the Schedule check box is not checked.
6.2.2 Creating a user account and enabling virtual media on the management processor
You must create a user account on the management processor if one does not already exist. The user name and password must match the management processor user name and password you specified when you installed Insight Control for Linux. The iLO supports multiple user accounts; if your iLO is already configured with other user accounts, you can add another user account.
5. Select Save User Information. NOTE: Do not disconnect your browser from this management processor address. You might need it to license virtual media, which is described in the next section. 6.2.3 Licensing virtual media on the management processor Your iLO Advanced license key activates iLO Advanced features. For the latest instructions, which may supersede those shown below, see the following website: www.hp.
To define the network configuration parameters, start the Define Networks tool by selecting the Options→IC-Linux→Define Networks... menu item. You can also start this tool from the Network Configuration Editor.
Figure 6-1 Define networks tool
The parameters in the Define Networks tool include the following:
• Available Networks
This is a list of the network definitions. When you create a new network definition, its name is displayed in this list after you select Save.
• IP Address Range
If you want to have IP addresses assigned automatically, you can enter a range of IP addresses in this optional field. Specify the range with a hyphen, for example:
192.168.10.5-192.168.10.50
You can enter a comma-separated list of ranges, for example:
192.168.10.5-192.168.10.50,192.168.11.100-192.168.11.199
If you want to assign IP addresses manually, leave this field blank.
• SNMP Server(s)
Optionally enter a list of SNMP servers. These entries are reserved for future use.
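The IP Address Range syntax (hyphenated ranges in a comma-separated list) can be checked with a short Python sketch before you enter a value. The sketch is illustrative only: the function names parse_ip_ranges and count_addresses are hypothetical, the tool's own validation rules are not documented here, and the sketch assumes that both ends of each range are inclusive.

```python
import ipaddress


def parse_ip_ranges(text):
    """Parse 'start-end[,start-end...]' into a list of (start, end) IPv4Address pairs."""
    ranges = []
    for part in text.split(","):
        start_text, sep, end_text = part.strip().partition("-")
        if not sep:
            raise ValueError(f"not a range: {part!r}")
        start = ipaddress.IPv4Address(start_text.strip())
        end = ipaddress.IPv4Address(end_text.strip())
        if start > end:
            raise ValueError(f"range start is after range end: {part!r}")
        ranges.append((start, end))
    return ranges


def count_addresses(ranges):
    """Count the addresses covered, assuming both ends of each range are inclusive."""
    return sum(int(end) - int(start) + 1 for start, end in ranges)
```

Under the inclusive assumption, the two-range example from the text covers 46 + 100 = 146 addresses, which gives a quick way to confirm that a range is large enough for the number of servers to be discovered.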
4. Select the Save button. The network definition is overwritten with the new parameters.
6.3.4 Deleting a network definition
1. Choose the network definition from the Available Networks list.
2. Select the Delete button. Unless this network has been applied to any systems, the network definition is erased and its name is removed from the Available Networks list.
6.
Figure 6-2 Network Configuration Editor page
4. Optionally verify the management processor by moving the mouse pointer over the Management Processor Name field, but do not select it. The management processor's serial number and IP address are displayed to help you identify it.
5. In the Server Host Name field, enter a unique name for the server associated with the management processor.
NOTE: Even though a server might have more than one NIC, you can only specify one name for the server.
6.
7. Select any of the predefined network configurations from the drop-down menu. If a predefined network configuration does not exist, you can create one using the Define Networks button. Selecting a network from this list assigns that network to the NIC represented by the MAC address selected in the Port/MAC Address column.
base name and 1 for the iterator, the first available host name assigned would be comp1, the next would be comp2, and so on. The number of digits that you enter for the value for the iterator determines whether the host names generated have leading zeroes. For example, if you entered comp for the base name and 001 for the iterator, the first available host name would be comp001, the next would be comp002, and so on.
TIP: Ensure that the base name and iterator that you specify respect the names of servers.
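The base name and iterator behavior described above can be illustrated with a short Python sketch. The function name generate_host_names is hypothetical and the sketch does not reproduce the Network Configuration Editor's actual implementation. In particular, the sketch assumes that "first available" means that names already in use are skipped.

```python
from itertools import count


def generate_host_names(base_name, iterator, in_use=()):
    """Yield available host names built from a base name and a starting iterator string."""
    width = len(iterator)  # "001" has three digits, so leading zeroes are kept
    taken = set(in_use)
    for number in count(int(iterator)):
        name = f"{base_name}{number:0{width}d}"
        if name not in taken:  # assumption: names already assigned are skipped
            yield name
```

With comp as the base name, an iterator of 1 yields comp1, comp2, and so on, while an iterator of 001 yields comp001, comp002, and so on, matching the examples in the text.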
7 Discovering systems, switches, and enclosures
This chapter addresses the following tasks, which you must complete in the following order when you are configuring and setting up Insight Control for Linux:
1. “Discovering systems” (page 75)
2. “Assigning Insight Control for Linux licenses to discovered systems” (page 78)
3. “Preparing and discovering switches and enclosures” (page 79)
4. “Changing the boot method” (page 80)
7.
NOTES: • You can update a server's firmware automatically as part of the bare-metal discovery process. For information on enabling this feature, see Section 12.2.3 (page 138). • You can initiate a one-time PXE boot, or set the server to always PXE boot before booting from the local hard disk. Either method is acceptable. • For the servers to PXE boot the Insight Control for Linux RAM disk, you must have configured DHCP as described in the HP Insight Control for Linux Installation Guide.
Figure 7-1 Initiate Bare-Metal Discovery tool
IMPORTANT: You can use the Initiate Bare-Metal Discovery tool for servers that are PXE-booted or that use virtual media. If you are using virtual media, you can only use this tool for bare-metal discovery of an iLO-based management processor. Furthermore, before using this tool, you must first discover the server's iLO and define the network configuration definitions using the Network Configuration Editor.
NOTE: You can also use this tool to initiate a bare-metal discovery of servers that are PXE-booted. Select the PXE radio button in step 4. The target system must be the management processor for the server.
7.1.3 Discovering running systems
NOTES:
• This section applies only to systems with iLO-based management processors. For servers with LO100 management processors, after you discover the server with HP SIM, run the Configure SNMP on DL1xx Servers task described in Section 23.9 (page 211).
NOTES: • When you apply the HP Insight Control for Linux license, the license is locked immediately when it is assigned to the server. Before Version 6.0, the license was assigned, but locked later during an Insight Control for Linux operation (for example, installation or setting up monitoring). • Exercise caution when assigning an Insight Control for Linux license, particularly when assigning licenses to multiple targets. Only servers require this license.
• Replace OA_name with the name of the OA, which you can determine by selecting +All Enclosures in the left pane of the HP Insight Control user interface and finding the OA name.
2. Use the following menu item from the HP Insight Control user interface to discover the enclosures and switches: Options→Discovery...
a. Select New...
b. In the Ping inclusions range text box, enter the IP addresses or host names of the OAs and switches to be discovered, one entry per line.
3. Select Run Now from the Verify Target Systems window. The Configure Boot Method window opens.
4. Select the boot method, either individually or for all the target systems:
• Individually: Select the radio button in the PXE column or Virtual Media column for each system.
• For all target systems: Select the check box in the PXE or Virtual Media column heading to set the boot method for all the systems listed.
5. Select Save.
8 Setting up managed systems
This chapter is an overview of setting up managed systems for Insight Control for Linux monitoring. This chapter addresses the following tasks, which you must complete in this order:
1. “Populating the Insight Control for Linux repository” (page 83)
2. “Setting up management hubs” (page 161)
3. “Linux OS installation” (page 83)
4. “Setting up managed systems for monitoring” (page 83)
8.
8.3.1 Opening network ports on managed systems The network ports listed in Table 8-1 are used for communication between the managed systems and the CMS. These ports must be open to network traffic. If you used Insight Control for Linux to install an OS and you used a configuration derived from a supported template, the firewall is enabled by default and Insight Control for Linux opens the ports listed in Table 8-1 automatically.
To ensure that the CMS can resolve the host name that is appended to all syslog events that originate from managed systems, follow these steps: 1. Determine the managed system's name by running the hostname command on the system: # /bin/hostname If the node does not report a host name, set one or configure DHCP to assign one. DHCP configuration information is located in the HP Insight Control for Linux Installation Guide. 2.
Figure 8-1 Installing providers and agents
3. Select Next>.
4. Review the settings for Configure or Repair Agents, as shown in Figure 8-2. Insight Control for Linux requires you to make settings in the Configure SNMP and Configure secure shell (SSH) access authentication sections of this screen.
Figure 8-2 Settings for configure or repair agents 5. Make the following settings to configure SNMP: • Select Set read community string and enter the value for your network configuration. 8.
NOTE: To discover or identify a server that becomes a managed system, HP SIM requires that an SNMP read community string be set to public in the global credentials for that server. Other read community strings can be set as well, but public must be specified. • • 6. Select Send traps to refer to this instance of HP SIM. You can optionally set Send a sample SNMP trap to this instance of HP SIM, but it is not required.
Step 1: Make the appropriate association on the system BIOS. Depending on how you decide to configure your system, you might not need to do anything. As a general rule, the factory default system BIOS settings are as follows.
title Red Hat Enterprise Linux Server (2.6.18-92.el5xen)
    root (hd0,0)
    kernel /xen.gz-2.6.18-92.el5 com1=115200,8n1     (1)
    module /vmlinuz-2.6.18-92.el5xen \               (2)
    ro root=/dev/VolGroup00/LogVol00 rhgb \
    quiet console=ttyS0                              (3)
    module /initrd-2.6.18-92.el5xen.img
title Red Hat Enterprise Linux Server-base (2.6.18-92.el5)
    root (hd0,0)
    kernel /vmlinuz-2.6.18-92.el5 \
    ro root=/dev/VolGroup00/LogVol00 rhgb quiet
    initrd /initrd-2.6.18-92.el5.img
(1) Add com1=115200,8n1 here.
9 Installing operating systems on managed systems
This chapter addresses the following topics:
• “Linux OS installation overview” (page 91)
• “Using installation configuration files for unattended installations” (page 92)
• “Prerequisites to OS installations on managed systems” (page 97)
• “Installing RHEL on managed systems” (page 99)
• “Installing SLES on managed systems” (page 101)
• “Installing VMware ESX and VMware ESXi operating systems” (page 101)
• “Installing another variant of Linux on managed
Table 9-1 Types of Installation Sessions (continued) Installation Interactive Unattended VMware ESX VMware ESX Interactive VMware ESX (Kickstart) VMware ESXi VMware ESXi Interactive For more information about using Kickstart and AutoYaST files for unattended installations, see Section 9.2 (page 92).
IMPORTANT: • HP provides a default set of basic Kickstart and AutoYaST installation configuration files for each supported OS. HP recommends copying and using the default installation configuration files as templates to create customized installation configuration files that are suitable for your environment. Familiarize yourself with the contents and usage comments in the configuration file templates, and use them to make versions that are appropriate for your own environment.
OS version                                       Directory name in /opt/repository/instconfig
RHEL Version 5 Update 5 (for Virtual Hosts)      rh055-virt-host
RHEL Version 5 Update 5 (for Virtual Guests)     rh055-virt-guest
VMware ESX Version 3.5U5                         esx035
VMware ESX Version 4.1                           esx041
The associated installation configuration files are stored in the OS-specific directory under /opt/repository/instconfig and use the same naming convention. For example, rh055.
Table 9-2 Insight Control for Linux macros for installation configuration files Macro name Description %%agentinstall%% This macro is unique to Insight Control for Linux. During installation, it expands into a shell script that downloads the two PSP components from the CMS and installs only the packages that HP SIM and Insight Control for Linux need to be able to monitor the managed system properly.
9.2.3 Installation configuration files for custom operating systems You can upload installation configuration files for unsupported operating systems into the Insight Control for Linux repository. However, the OS installation process does not have a built-in mechanism for linking the installation configuration files to a given installation.
! ex /etc/inittab <> /etc/securetty 1 The designation Space-TAB means a space character followed immediately by a tab character. Thus, this line can be interpreted as: /^[ \t]*kernel For managed systems that are virtual hosts: ex /boot/grub/menu.lst <
• You have set the user name and password on the management processors. For more information about setting or changing management processor credentials, see Section 23.1 (page 207). For more information on management processor credentials themselves, see “Management Processor Credentials” (page 213).
• You registered the supported OS in the repository and you have copied the vendor-supplied source installation files to the repository path the OS registration process created.
Before you can use Insight Control for Linux to install Linux on these servers, you must: • • • Download the files and copy them to the appropriate directories under /opt/ repositories/boot, overwriting the original initrd supplied with the distribution of the corresponding Linux operating system. Ensure that you have the correct PSP in the repository. For information on the PSP version, see the HP Insight Control for Linux Support Matrix.
NOTES: • During installation, when specifying the HTTP setup, you are prompted for the IP address of the CMS and the path name for the RHEL installation. For example: http://CMS-IP-addr:CMS-port/path-name Where: • CMS-IP-addr is the IP address of the CMS CMS-port is the port number of the repository web server that you specified when you installed Insight Control for Linux. The factory default value is 60000.
9.5 Installing SLES on managed systems
This section describes the two methods for installing SLES to one or more managed systems:
• “Installing SLES using an unattended method” (page 101)
• “Installing SLES interactively” (page 101)
NOTE: When you use Insight Control for Linux installation tools to install SLES on a managed system, Insight Control for Linux automatically edits the /etc/ssh/sshd_config file and turns on password authentication in this file.
IMPORTANT: • Installing a virtualization OS on a server erases data on that system. Before you begin, be sure that you have captured or backed up any data you want to retain. Preserving user data on volumes other than the principal target volume is not guaranteed. Presume that data on primary and secondary volumes is erased. The tasks for installing the virtualization OS are launched from the following HP SIM menu: Deploy→Operating System • The VMware ESX 3.0 and VMware ESX 3.
8. Optionally, you may set the root account password at this step. If you want the target system to use the default root password (root), select the Use Default Root Password option. To set a root password other than the default, select the Specify Root Password option, enter the root password, and verify the entry. HP recommends setting a strong root password on all your servers.
9. Do one of the following to start the installation: • Select Run Now to launch the installation operation immediately.
5. Select the virtualization OS to install and select Next>. Only the virtual machine OS that applies to your installation is available for you to select from the menu. IMPORTANT: The list contains only those virtualization operating systems that are registered in the repository and copied to it. If you select a virtualization OS that was registered, but the installation files were not copied to the repository, a validation error appears. 6. 7.
NOTE: When performing an ESXi installation using virtual media, Insight Control for Linux does not automatically remove the ISO image that it created. This ISO image contains the RAM disk, and removing the ISO image while the RAM disk is loaded causes the installation to fail. If disk space is a concern, HP recommends that you remove the ISO image manually. The ISO image is named using the server's Globally Unique IDentifier (GUID).
3. Create the following scripts, as needed:
Script         Description
auto_config    Required for an unattended installation, this script performs macro substitution so that a working copy of your installation configuration file has the actual values required for your installation.
boot_stanza    This script constructs a boot stanza that specifies your kernel and RAM disk, which enables your boot loader to boot your custom OS.
Custom or Other Interactive Custom or Other (Unattended) 3. Do one of the following to select and verify that the server or servers in the target list are the servers to which you want to install an OS: • Proceed to the next step if the target list is correct. • Select Add Targets... or Remove Target to modify the list, if the list is incorrect. • If no servers are in the list, do the following: a. Select Collection. b. Select All Servers from the drop down menu. c.
If you want the target system to use the default root password (root), select the Use Default Root Password option. To set a root password other than the default, select the Specify Root Password option, choose the password encryption option, enter the root password, and verify the entry. HP recommends setting a strong root password on all your servers.
10. Do one of the following to start the installation: • Select Run Now to launch the OS installation operation immediately.
10 Capturing and deploying Linux images
This chapter addresses the following topics:
• “Overview of capturing and deploying Linux images ” (page 109)
• “Prerequisites to capturing a Linux image” (page 111)
• “Capturing a Linux image from a managed system” (page 114)
• “Preparing for scalable deployment” (page 115)
• “Deploying a captured Linux image to one or more managed system” (page 118)
• “Insight Control for Linux partition wizard overview” (page 120)
10.
NOTE: Capturing or deploying a very large image over a slow network can take a long time, so a timeout of five days applies to these operations; an operation that runs longer is treated as hung. HP recommends that you check your task results to verify the status of any running jobs.
10.1.1 File system types
Table 10-1 lists the supported and unsupported file system types on the source and target managed systems for Linux image capture and deployment tasks.
The script is run in a chroot environment so there is no need to configure paths relative to the Insight Control for Linux environment. For information on how these scripts can be used, see the comments in the example scripts provided with Insight Control for Linux. 10.1.
Table 10-2 Source and target deployment requirements
Item              Requirement
Server type       The hardware models of the source and target managed systems must be the same. For example, if you capture an image from an HP ProLiant BL460 G5 server, you can only deploy that image to another BL460 G5 server.
Memory            Differences in the amount of memory on the source and target managed systems are permitted.
Number of NICs    Differences in the number of NICs on the source and target managed systems are permitted.
• For SLES images, change the hard links to soft links before capturing the image. SLES relies on the use of hard links within its file system, and the tar command that captures the image captures those hard links. If a partitioning scheme is used during deployment that distributes files to multiple file systems (like separate /usr and /var partitions), the tar command does not allow hard links to be established across separate file systems. This generates an error, causing the task to fail.
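Before changing links on a SLES source system, it helps to know which files are hard-linked. The following is a minimal sketch, not part of Insight Control for Linux: the function name list_hard_links is hypothetical, and it relies only on the standard find options -xdev (stay on one file system) and -links (match on link count).

```shell
#!/bin/sh
# list_hard_links DIR
# Prints every regular file under DIR (without crossing into other
# file systems) that has more than one hard link.
list_hard_links() {
    find "$1" -xdev -type f -links +1 -print
}

# Example (run on the source managed system before capture):
#   list_hard_links /usr
```

Each reported path shares its data with at least one other path. The listing only identifies candidates; deciding which name to keep as the real file and which to replace with a soft link is left to the administrator.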
The following example does not include the contents of /scratch in the captured image (because the dump flag is set to 0). During the image deployment operation, the disk is repartitioned and /scratch is an empty file system. /dev/sdc1 /scratch ext3 defaults 0 0 10.3 Capturing a Linux image from a managed system IMPORTANT: Remember that captured images are retrieved through a web server interface that allows anonymous access.
7. Select a Precapture script, a Postcapture Script, or both. A Precapture script is run on the managed system before the image is captured. A Postcapture script is run on the managed system after the image is captured. The default behavior is to not run either type of script. 8. Do one of the following: • Select Run Now to launch the image capture operation immediately. • Select Schedule to schedule the image capture operation to occur in the future. 9.
Figure 10-1 Network groups example The concept behind a scalable deployment is to transfer an OS image tar file from the CMS to the group leader in each network group. After the image tar file is completely transferred, the group leader transfers the image to each of the remaining servers in the network group. The advantage to this concept is that all network traffic is kept local to the switch or enclosure.
The Customize Collections window appears.
2. Select New... in the Customize Collections window. A new section titled New Collection appears at the bottom of the Customize Collections window.
3. Select the Choose members individually radio button.
4. Select All Servers from the Choose from: menu. This action populates the Available Items: list with the available servers.
5. Perform the following steps for each switch you have: a.
f. Select Save As Collection... The Save As Collection portion appears.
g. Enter a name for this network group. The name is used only to associate the managed systems in the network group.
h. Select Existing collection: and choose the Network Groups menu item.
i. Select OK to continue.
j. Generate the netgroups.conf file with the following command: # /opt/hptc/bin/netgroup --ofile /opt/mx/icle/netgroups.conf
k. Examine the netgroups.conf file to verify the collection entry for the group.
1. Select the following menu item from the HP Insight Control user interface: Deploy→Operating System→Deploy Linux Image... 2. Do one of the following to select one or more target managed systems: • If no servers are in the list: a. Select Collection. b. Select All Servers from the drop down menu. c. Select View Contents to display a list of available managed systems in the collection. d. Select one or more managed systems from the list. e. Select Apply . f.
Figure 10-2 Existing disk partition scheme See Section 10.6 (page 120) for a general overview of the Partition Wizard and how to use it to edit disk partitions and volume groups. Select Next> after you have completed customizing the disk partition layout. 9. Optionally select any or all of the following types of scripts (one of each): • Predeployment script • Postdeployment script • Final Deployment script For information on these scripts, see Section 10.1.3 (page 111). 10.
The Partition Wizard user interface provides a representation of the disk partition layout to be applied to the target server before laying down the image. Because the Partition Wizard does not know about the storage media, it works with a generic representation that was created to describe the storage media. The Partition Wizard is designed to work with the ext3, ReiserFS, Swap, and LVM file system types. 10.6.
• If you are capturing and deploying a reiserfs or an ext3 partition type, ensure that the mount points are set, as required. Partition types swap and lvm do not have mount points. The Partition Wizard permits you to proceed without specifying mount points for the reiserfs and ext3 partition types, and it does not detect the missing mount points. This might cause the deployment to fail, and the failure is indicated in the Task Results. • The Partition Wizard does not save entered values for reuse.
The initial Partition Wizard table is divided into two sections: Hard Drives, the top part of the table that shows the physical devices, and Volume Groups, the bottom part of the table that shows logical volumes: • The Hard Drives section represents the physical media on the server. You must have prior knowledge about the hardware in order to add the correct number of disks. You can add a maximum of 16 disks to the Hard Drives section along with a maximum of 16 partitions per disk.
11 Installing and setting up virtual machines This chapter addresses the following tasks, which you must complete in this order: 1. 2. 3. 4. 5. 6.
2. Set the Global Sign-In credentials for the virtual host with the Options→Security→Credentials→Global Credentials... menu item.
3. Install the operating system with virtualized configuration on the physical server of your choice. Chapter 9 (page 91) describes the steps for using Insight Control for Linux to install a Linux operating system.
4. Run Options→Identify Systems... to verify the installation.
The next step is to register the virtual host with HP Insight Control virtual machine management.
11.
For information on configuring the agents, see Section 8.3.4 (page 85).
3. Examine the system page for the virtual host with the Tools→System Information→System Page... task to verify that HP Insight Control virtual machine management is configured correctly. Locate the System Subtype row under Product Description. The description should contain the text Virtual Machine Host.
11.
4. Verify that the VM guest is configured to enter the BIOS on the next boot; if it is not in that state, change it. When in the BIOS, change the boot order to boot from the CD or DVD first. This ensures that the system boots from the CD or DVD before the other choices.
5. Open the console of the VM guest.
6. Boot the VM guest, and proceed through an interactive install.
7. Perform a network installation using an installation configuration file from the Insight Control for Linux repository.
IMPORTANT: The RHEL Kickstart and SLES AutoYaST configuration template files for virtual guests are delivered with a hard-coded root password, which poses a security issue if used without modification. For secure installations, HP recommends that you install the virtual guest operating systems in a manner that keeps the root password secure, such as an interactive installation, or use a Kickstart or AutoYaST file that is properly protected on the local host.
• The default value for the memory setting is adequate. However, if you have sufficient memory, you can increase this value to 1024 to improve the virtual guest machine performance.
• When creating a disk image for the virtual guest machine, the default storage setting for the guest is sufficient unless you know that the applications you want to run need additional storage.
• Select Advanced options to configure the NIC correctly as a bridge. Select br0 or br1, as appropriate.
— — — The default value for the memory setting is adequate. However, if you have sufficient memory, you can increase this value to 1024 to improve the virtual guest machine performance. When creating a disk image for the virtual guest machine, the default storage setting for the guest is sufficient unless you know that the applications you want to run need additional storage. Select Network Adapters to configure the NIC correctly as a bridge. Highlight the bridge setting, then select Edit.
Check that the configured MAC address matches the DNS/DHCP entry. The virt-manager command might display MAC addresses beginning with 00:16:3e, but configure the NIC to start with 52:54:00 instead. You can adjust the MAC address by running these commands on the KVM virtual host: 1. Use the following command to shut down the virtual guest: # virsh shutdown virtual_guest_name Watch the console to determine that the virtual guest is completely shut down. 2.
os_specifier Specifies the operating system to be installed on the virtual guest, for example, RHEL5U5-i386 or SLES11SP1-i386/DVD1. Some releases of SLES may specify CD1 instead of DVD1. For example, http://mercury.example.com:60000/os/RHEL5U5-i386 • For a RHEL Kickstart file, specify a URL that links to its location in the Insight Control for Linux Repository: http://CMS:port/instconfig/os/osver-virt-guest/osver-virt-guest.cfg Where: osver • Specifies the operating system version, for example rh052.
2. Power on the virtual guest from HP SIM, by selecting the virtual guest node then selecting Tools→Virtual Machine→Start Virtual Machine. Wait until the virtual guest boots.
3. Run the ping command from any system to verify the virtual guest is up and running. Specify the well-known IP address for the virtual guest.
4. From HP SIM, perform an OS discovery of the virtual guest: a. Select Options→Discovery, specifying the virtual guest. b. Select New. c.
CAUTION: If you shut down or stop a virtual machine guest, unsaved data is lost.
• Suspending or pausing a virtual machine guest: Tools→Virtual Machine→Suspend Virtual Machine
• Resetting or restarting a virtual machine guest: Tools→Virtual Machine→Restart Virtual Machine
For KVM virtualizations, HP recommends using virt-manager, virsh, and other KVM-specific commands to manage the virtual guests from their KVM hosts, where these tools are typically installed.
12 Using Insight Control for Linux to update HP ProLiant firmware
This chapter addresses the following topics:
• “Overview of updating HP ProLiant firmware” (page 137)
• “Basic firmware update functionality” (page 138)
• “Advanced firmware update functionality” (page 140)
12.1 Overview of updating HP ProLiant firmware
Keeping firmware up to date is a challenging but necessary task. Each ProLiant server usually has several devices that require regular firmware updates, which can create a burden.
12.2 Basic firmware update functionality Basic firmware update functionality is designed to provide an easy to set up and easy to use way of updating firmware for people who simply want to keep their firmware up to date. Just download the latest Smart Update Firmware DVD, install it into the Insight Control for Linux repository, and run the update. 12.2.1 Initial setup Before you can initiate a firmware update on a server, you must download and prepare the firmware files and tools that do the work.
To enable this feature, edit the Insight Control for Linux properties file, /opt/mx/icle/icle.properties, and add the following line:
RUN_HPSUM_ON_BARE_METAL_DISCOVERY=true
It is not necessary to restart SIM for these changes to take effect. In addition, if you want to run the hpsum command with option flags, add an additional line to the file that looks like this:
HPSUM_FLAGS=parameters
Where parameters indicates the hpsum command's option flags you want to use.
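The edit described above can be scripted so that repeated runs do not leave duplicate lines in the file. The following is a minimal sketch, not an HP-supplied tool: the function name set_property is hypothetical, and it assumes that each property occupies its own line in KEY=value form, as in the lines shown above. Back up the properties file before changing it.

```shell
#!/bin/sh
# set_property FILE KEY VALUE
# Replaces an existing "KEY=..." line in FILE, or appends the line if
# KEY is not yet defined. The new definition is written at the end.
set_property() {
    file=$1
    key=$2
    value=$3
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Keep every line except an earlier definition of KEY.
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (run as root on the CMS):
#   set_property /opt/mx/icle/icle.properties RUN_HPSUM_ON_BARE_METAL_DISCOVERY true
```

The same function could set HPSUM_FLAGS; the value to pass depends on the hpsum option flags you want and is not shown here.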
This procedure requires 300 to 400 MB of temporary disk space: 1. Create a temporary directory for the contents of the tar file. # mkdir /tmp/fw-temp 2. Extract the contents of the tar file: # cd /tmp/fw-temp # tar xf /opt/repository/firmware/firmware-files.tar 3. Add or remove the required firmware files. Here are some examples; all these examples are performed from the temporary directory for the firmware, /tmp/fw-temp: # Copy latest firmware from /root cp /root/CP009403.scexe .
a group of servers to have different firmware versions from the rest. Insight Control for Linux has a simple and flexible method for controlling this. 12.3.1 Understanding the firmware configuration file A firmware configuration file controls the advanced firmware update features of Insight Control for Linux. It is a plain text file you create and store in the Insight Control for Linux repository at the following location: /opt/repository/firmware/firmware-config.
Example 1 prod-server-1=production-firmware.tar prod-server-2=production-firmware.tar 172.31.64.100=skip 01:00:ab:67:45:ee=latest-firmware.tar In this example, the two production servers need to be at very specific firmware revisions, so a special firmware tar file was created which only contains firmware that has passed the proper testing. The skip flag is used with the IP address of a very old server running old software, which should never have its firmware updated.
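To check what an entry in such a file would select for a given server, you can query the file by identifier. The following is a minimal sketch for inspecting the file by hand, assuming the identifier=value layout shown in Example 1; the function name firmware_action is hypothetical, and trying the identifiers in the order given is an assumption of this sketch, not a statement about how Insight Control for Linux resolves entries.

```shell
#!/bin/sh
# firmware_action CONFIG ID...
# Prints the value of the first "ID=value" line in CONFIG that matches
# one of the given identifiers (host name, IP address, or MAC address).
# Returns 1 if no identifier has an entry.
firmware_action() {
    config=$1
    shift
    for id in "$@"; do
        value=$(awk -F= -v id="$id" \
            '$1 == id { print substr($0, length($1) + 2); exit }' "$config")
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}

# Example:
#   firmware_action firmware-config-copy 172.31.64.100
```

With the contents of Example 1 saved to a file, querying 172.31.64.100 prints skip, and querying prod-server-1 prints production-firmware.tar.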
13 Installing PSPs on managed systems
This chapter addresses the following topics:
• “Overview of the PSP installation tool” (page 143)
• “Required PSP components” (page 143)
• “Creating a PSP dependency script” (page 144)
• “PSP installation procedure” (page 145)
13.1 Overview of the PSP installation tool
The Insight Control for Linux PSP installation tool enables you to install any or all PSP components on one or more managed systems.
NOTE: The RPMs for these PSPs are OS- and platform-specific and are named as such, for example, the HP ProLiant Channel Interface for Red Hat Enterprise Linux 4 (x86_64). 13.3 Creating a PSP dependency script Some utilities contained in the PSP have RPM dependencies that must be met for them to install correctly. These dependencies are documented in the HP ProLiant Support Pack User Guide. TIP: For instructions on how to obtain the HP ProLiant Support Pack User Guide, see Section 26.3.2 (page 264).
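A dependency script typically starts by finding out which required RPMs are absent. The following is a minimal sketch of that check, not the contents of the example script shipped with the product: the function name missing_rpms and the RPM_QUERY override variable are hypothetical, and the package names you pass must come from the HP ProLiant Support Pack User Guide. It relies only on the standard rpm -q query.

```shell
#!/bin/sh
# missing_rpms PACKAGE...
# Prints each named package that the query command reports as not
# installed. Returns 1 if at least one package is missing.
# RPM_QUERY can name a different query command (used here for testing);
# the default is "rpm -q".
missing_rpms() {
    rc=0
    for pkg in "$@"; do
        if ! ${RPM_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
            rc=1
        fi
    done
    return $rc
}

# Example (package names are placeholders):
#   missing_rpms package-a package-b
```

A dependency script could pass the resulting list to the distribution's package installer; that step is omitted because it depends on the OS and on where the packages are hosted.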
IMPORTANT: If an errata kernel is installed on the managed system, ensure that the PSP package you want to install supports the errata kernel version.
13.4 PSP installation procedure
Begin the installation process after you have done the following:
• Created the PSP dependency script.
• Registered it in the Insight Control for Linux Repository.
• Copied the PSP dependency script to the /opt/repository/pspscript/example_dependency.sh directory.
1.
If any of the selected software components did not install successfully on the target managed system for any reason, including package dependency failures, the final state of the task on that system is Failed. Review the log carefully because it contains important PSP installation results. If any of the PSP components failed to install due to RPM dependency requirements, you must resolve the RPM dependencies and run the Install ProLiant Support Pack (PSP)... tool again.
14 ISO control operations ISO Controls allow you to boot from an ISO image, insert an ISO image, and eject an ISO image on iLO-based managed systems. You can use this functionality to perform an interactive Windows Installation. The ISO image must be registered in the Insight Control for Linux repository before you can perform these operations. For information on registering an ISO image, see “Registering an ISO image” (page 56).
15 Remote server controls The menu items on the Tools→Server Controls menu enable you to remotely manage power control on a physical managed system. IMPORTANT: Be aware that the Insight Control for Linux server controls operate by contacting the management processor of the server directly and executing the requested power function. That means that servers are powered off or cycled abruptly without a graceful shutdown.
16 Using SSH for remote server management
Insight Control for Linux provides several ways for you to access a managed system through SSH. This chapter addresses the following topics:
• “Setting SSH credentials on managed systems” (page 151)
• “Setting SSH credentials for users” (page 151)
• “Running a command on multiple managed systems” (page 152)
• “Using Insight Control for Linux to run commands and scripts through SSH” (page 153)
16.
On the Task Results screen, the Task Instance Results always shows the user who launched the task. This might not be the credentials used for the task execution. Because different target managed systems can have different users specified in the SSH settings, the same task can run on different targets as different users. 16.
16.4 Using Insight Control for Linux to run commands and scripts through SSH
The following menu items enable you to run a script or command through SSH to one or more managed systems:
• Tools→Command Line Tools→Run SSH Command...
• Tools→Command Line Tools→Run Script...
16.4.1 Running an SSH command
The Tools→Command Line Tools→Run SSH Command... menu item runs a command on a target server.
managed systems. This Insight Control for Linux tool captures Stdout and Stderr from the script, captures the return code from the script, and closes the SSH connection. The Run Script... task feeds the command lines in the script to an SSH instance on the target system. The script is a series of command lines to be run on the target system using SSH. The Linux script you run must be located in the Insight Control for Linux repository in the /opt/repository/script directory.
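Because the tool reports Stdout, Stderr, and the return code separately, a script is easiest to interpret when it uses all three deliberately. The following is a minimal sketch of such a script; its contents and the function name check_paths are hypothetical, and the paths checked at the end are placeholders to replace with paths that matter on your managed systems.

```shell
#!/bin/sh
# check_paths PATH...
# Writes "found: PATH" to standard output for each existing path and
# "missing: PATH" to standard error for each absent one.
# Returns 1 if any path is missing, so the task result reflects it.
check_paths() {
    rc=0
    for path in "$@"; do
        if [ -e "$path" ]; then
            echo "found: $path"
        else
            echo "missing: $path" >&2
            rc=1
        fi
    done
    return $rc
}

# Identify the system in the captured output, then run the checks.
echo "node: $(uname -n), kernel: $(uname -r)"
check_paths /etc /tmp
```

The status of the last command becomes the return code of the script, so a missing path should show up as a nonzero return code in the task results, with the details on the Stderr tab.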
Part III Monitoring
17 Managing Insight Control for Linux collections
This chapter addresses the following topics:
• “Introduction to collections” (page 157)
• “Populating a collection” (page 158)
• “Adding servers and switches to an Insight Control for Linux collection” (page 158)
• “Removing a managed system or switch from an Insight Control for Linux collection” (page 160)
17.
Table 17-1 Insight Control for Linux subcollections (continued) Object type Subcollection name Description How populated Enclosures {collection_name}_Enclosures If the hardware configuration contains HP blade servers and enclosures, this collection provides access to the enclosures. Switches Populated manually only. {collection_name}_Switches Insight Control for Linux monitors all switches placed in this subcollection.
1. Use the instructions in Chapter 7 (page 75) and Chapter 8 (page 83) to perform the following tasks to prepare servers: • Discover the server or servers. Make sure you follow the appropriate discovery process because the procedure differs for bare-metal servers and servers that already have a supported Linux OS installed on them. • Deploy a Linux OS to the server if it does not have an OS installed.
NOTE: This step removes non-licensed servers, VMware ESXi virtual hosts, and Microsoft Windows guests from the collection.
17.4 Removing a managed system or switch from an Insight Control for Linux collection
NOTE: For information on removing a management hub, see “Removing a management hub” (page 163).
If you want Insight Control for Linux to stop monitoring a managed system or switch, follow these steps to delete it from the Insight Control for Linux collection: 1.
18 Setting up management hubs 18.1 About management hubs A management hub is an aggregation point for management activities. Insight Control for Linux uses management hubs to distribute the management load across multiple servers. HP recommends creating multiple management hubs if you plan to monitor over 256 managed systems. You have the option of choosing any physical server to act as a management hub; you can elect to use the CMS as a management hub or not.
18.2 Creating a management hub
Use the following procedure to create a management hub; you can perform Steps 2 and 3 in reverse order, but both must be done before Step 4.
1. Determine which server or servers you want to act as a management hub.
2. Install the operating system for that server using the appropriate Kickstart or AutoYaST file; this file has the form *-management-hub.cfg to ensure that the required RPMs are installed.
There are two text fields, Collection name and Choose from, and two lists, Available Items and Selected Members.
e. Select All Servers from the Choose from: menu. This action populates the Available Items: list with the available servers.
f. Select the server from the Available Items: list. You can use Ctrl-Left Mouse for multiple selections.
g. Use the >> button to move the selected servers from the Available Items: list to the Selected Members: list.
h. Select OK.
19 Configuring monitoring services
This chapter describes how to configure Insight Control for Linux monitoring services. In addition to Section 19.1 (page 165), this chapter addresses the following tasks, which you must complete in this order: 1. 2. 3. 4.
Insight Control for Linux monitors only the objects in these collections:
• Either all licensed servers are automatically added to the {collection_name}_Servers subcollection, or only the servers in the {collection_name}_Servers subcollection are monitored, depending on your response to the Auto-populate option.
• The management processors associated with the licensed servers are automatically added to the {collection_name}_Console_Ports subcollection.
TIP: Populate your collections manually before proceeding.
5. Select Run Now. This task can take several minutes to configure services. The Stdout tab shows the scripts that are running, and Done appears when this task is complete.
6. When processing is complete, select the following menu item to review the log files to determine if the operation was successful: Tasks & Logs→View Task Results
7. Select the Stdout and Stderr tabs on the tasks results screen to see more information.
4. Verify that the nrpe daemon is working on all the managed systems with the following command:
# /opt/hptc/nagios/libexec/gather_all_data --verbose
write 4048, 2, 2, eth1 to db => icelx2 (charon.example.com)
write 4048, 2, 2, eth1 to db => icelx4 (pluto.example.com)
write 4048, 2, 2, eth1 to db => icelx1 (poseidon.example.com)
5. Ensure that the vars.ini file is synchronized across all the managed systems:
# /opt/hptc/nagios/libexec/check_nagios_vars --update
Vars OK on nodes icelx[1-2,4]
19.5.
# nrg --mode analyze Nodelist ----------------------neptune Description -------------------------------------------------- If 'data is state', then look at the status of the 'Supermon Metrics Monitor'on either the controlling management_hub or on the management_server node as they collect and report the data for these entries. neptune Data may be stale, look at the status of the Supermon Metrics Monitor as it provides the status for this service.
20 Using graphical tools to monitor managed systems
This chapter addresses the following topics:
• “Insight Control for Linux system monitoring overview” (page 171)
• “Nagios overview” (page 172)
• “Using Nagios” (page 175)
• “Services monitored by Nagios” (page 183)
• “Understanding Nagios alert messages” (page 185)
• “Understanding system event log monitoring” (page 186)
• “Configuring Nagios email alerts” (page 186)
• “Monitoring Metrics in real time” (page 187)
20.
NOTE: Insight Control for Linux does not support monitoring of virtual hosts running VMware ESXi, and does not support servers or virtual guests running Microsoft Windows.
20.1.1 Collecting metrics through a management processor
Insight Control for Linux supports management processors using the iLO or IPMI protocols for gathering sensor and system event log information. To access a system’s management processor, you must configure the management processor credentials in HP SIM.
Nagios, as provided with Insight Control for Linux, is configured with system and network service checks already in place for your system; these network service checks are automatically configured for each managed system. Nagios obtains its sensor and metric data from the Supermon open source monitoring application, which is integrated with Insight Control for Linux. Figure 20-1 illustrates the interaction of these tools.
20.2.2 Launching Nagios To launch Nagios, you must have a valid certificate for the Apache service. To configure an Apache certificate, see Section 19.2 (page 165). Select the following menu item from the HP Insight Control user interface to launch Nagios: Tools→Integrated Consoles→Nagios The Nagios main window shown in Figure 20-2 appears when you launch Nagios. Figure 20-2 Nagios main window From the Nagios main window, you can choose any of the menu options on the left navigation bar.
Hosts
Services
Host Groups
  Summary
  Grid
Service Groups
  Summary
  Grid
Problems
  Services (Unhandled)
  Hosts (Unhandled)
  Network Outages
Reports
  Availability
  Trends
  Alerts
    History
    Summary
    Histogram
  Notifications
  Event Log
HP Graph
System
  Comments
  Downtime
  Process Info
  Performance Info
  Scheduling Queue
  Configuration
NOTE: The term Hosts on the Nagios window refers to any object with an IP address, not just managed systems. Keep this in mind when using the Nagios application.
20.
Figure 20-3 Nagios tactical overview The top of the window provides information about the network. It provides the number of network outages and information on the network health in terms of the Nagios hosts and Nagios services. The next portion of the window contains information about the Nagios hosts. It reports the number of hosts that are down, unreachable, up, and pending. In Figure 20-3, two hosts are down.
The standard Nagios Tactical Overview display uses the color red to highlight ‘Disabled’ services. To illustrate, in Figure 20-3, in the Active Checks column, the message 21 Services Disabled is displayed on a red field. A disabled service is a configuration status, not an error condition. Insight Control for Linux takes advantage of the Nagios passive check feature to optimize and to minimize data collection and reporting across large numbers of managed systems.
Figure 20-5 Nagios service detail view The Status column displays any problems that might be occurring. To display the status of a service, select the link for the service in the Service column to open the Nagios Service Information view shown in Figure 20-6.
Figure 20-6 Nagios service information view 20.3.3 Displaying hosts and services that are experiencing problems The Service Problems view, which is accessed by selecting Problems Services (Unhandled) in the Nagios menu, is useful for configurations with hundreds of systems. It identifies the Nagios hosts that are experiencing problems, and it shows only the corresponding Nagios services with status that is not OK, which enables you to monitor only those Nagios hosts that need attention.
Figure 20-7 Nagios service problems view
Select the link that corresponds to a Nagios host to open the Nagios Host Information view for that Nagios host. You can also use the Nagios report generator, nrg, to obtain an analysis of Nagios services:
# nrg --mode analyze
For more information and examples of its use, see nrg(8).
20.3.
Figure 20-8 HP Graph default overview display
Figure 20-9 HP Graph detail display of managed systems
If you want to display the graphical data for a selected Nagios host (a Nagios host can be a virtual host), select an item in the menu in the upper left-hand side. Figure 20-10 (page 183) shows the graphs for one managed system, osmone. The following menus and menu items control the information you can display for a managed system:
• The Metric menu influences the information shown in the graphs.
Shows how much of the CPU time was spent on system-level tasks.
cpu usage
  Reports how much of the server's CPU set was spent in the user, system, and nice states. This is the default view.
load average
  Reports the 1, 5, and 15 minute load averages.
mem buffers
  Shows how much of the server's memory is allocated to system-wide memory buffers.
mem shared
  Reports the amount of memory shared among applications.
NOTE: The detail graphs for a system show the graphs for a specified metric on all Nagios hosts. The detail graphs for a Nagios host show metrics for that Nagios host.
Figure 20-10 HP Graph host display for one managed system
20.3.5 Gathering and displaying system environment data
Insight Control for Linux provides plug-ins that monitor the environment data on each managed system such as temperature and fan speed, which can be indicators of possible system failure.
Nagios plug-ins are located in the /opt/hptc/nagios/libexec directory on the CMS. Table 20-1 lists each Nagios plug-in service that runs on the CMS. The items in the Service Name column correspond to the Service column of the Nagios Service Detail View and Service Problems View windows, which are shown in Figure 20-4 (page 177) and Figure 20-7 (page 180), respectively.
Table 20-2 Services monitored on managed systems (continued)
Service name Function/Description
Syslog Alerts1 Links to any consolidated log messages that match patterns in the /opt/hptc/nagios/etc/syslogAlertRules file.
System Event Log1 Links to any System Event Log messages that match patterns in the /opt/hptc/nagios/etc/selRules file. The System Event Log is collected through the management processor, either an iLO or an IPMI BMC.
1 Warning
2 Critical
other Unknown
3 The name of the Nagios service description. For more information, see the corresponding /opt/hptc/nagios/etc/templates/*_template.cfg template file.
4 The alert applies to this host name.
5 The IP address of the host.
6 The message text generated from the plug-in. In the following example, indicates that the Nagios monitor running on icelx47 collected this data.
contact_name                    nagios
alias                           Nagios Admin
service_notification_period     24x7
host_notification_period        24x7
service_notification_options    w,u,c,r
host_notification_options       d,u,r
service_notification_commands   notify-by-email,notify-by-epager
host_notification_commands      host-notify-by-email,host-notify-by-epager
email                           nagios@localhost.localdomain
pager                           nagios@localhost.localdomain
}
Changing the values for email and pager to reflect the system name enables Nagios to send notification through the sendmail utility.
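The change described above can be sketched in Python. This is only an illustration that works on the text of a contact definition held in a string; the `define contact{` opening line, the sample text, and the address nagios@cms.example.com are assumptions made for the example, and the function name set_contact_addresses is hypothetical rather than part of Insight Control for Linux or Nagios.

```python
import re

# Hypothetical sample of a Nagios contact definition (shortened for the example).
SAMPLE_CONTACT = """define contact{
    contact_name    nagios
    alias           Nagios Admin
    email           nagios@localhost.localdomain
    pager           nagios@localhost.localdomain
}
"""

# Matches one "email <value>" or "pager <value>" directive per line,
# keeping the original indentation and column spacing.
DIRECTIVE = re.compile(r"^([ \t]*)(email|pager)([ \t]+)\S+[ \t]*$", re.MULTILINE)


def set_contact_addresses(contact_text: str, address: str) -> str:
    """Return contact_text with the email and pager values set to address."""
    return DIRECTIVE.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{m.group(3)}{address}",
        contact_text,
    )


updated = set_contact_addresses(SAMPLE_CONTACT, "nagios@cms.example.com")
print(updated)
```

The sketch leaves every other directive untouched, so it can be tried on a copy of the contact definition before any real configuration file is edited.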
20.8.3 Performance Dashboard requirements
The servers you want to monitor must fulfill the following requirements for using the Performance Dashboard tool; the servers must be:
• Licensed for Insight Control for Linux
• Configured to use Insight Control for Linux monitoring services, as described in Chapter 19 (page 165)
20.8.4 Interpreting the data in a ring plot
Each segment, or slice, of a Performance Dashboard ring plot represents data for one managed server.
Figure 20-12 Monitoring three metrics using Performance Dashboard
20.8.4.1 Ring plot color coding
The colors that the Performance Dashboard ring plot segments use represent the following:
• Light Gray means that a server is actively reporting data.
• Pink represents the actual value of the metric.
• Dark Gray means that a server is not reporting data and might be down. In that case, select the Left Mouse on the server to launch the Nagios application focused on that server to investigate further.
20.8.4.
20.8.6 Using the mouse buttons to manipulate the Performance Dashboard tool Table 20-3 describes how to use the mouse to manipulate the Performance Dashboard tool. When you launch the tool and the default ring plot is displayed, use the following mouse actions to display server information, show data for different metrics, change the metric being reported, and launch Nagios on a specific server.
• Total User Processes
• Total Zombie Processes
• Network Received MB
• Network Received Packets
• Network Received Dropped Packets
• Network Received Errors
• Network Transmitted MB
• Network Transmitted Packets
• Network Transmitted Dropped Packets
• Network Transmitted Errors
• Total Swap
• Swap In Use
• Pages In
• Pages Out
• Pages Swapped In
• Pages Swapped Out
20.8.8 Customizing the Performance Dashboard tool metrics
The /opt/hptc/cmu/etc/sysconfig/ActionAndAlertsFile.
21 Using the command line to view managed system status
Insight Control for Linux provides commands that you can run on the CMS to determine the status of managed systems. This chapter addresses the following topics:
• “Archiving sensor metrics on an individual basis” (page 193)
• “Displaying usage, statistics, and metrics with the shownode command” (page 194)
• “Displaying environmental data” (page 199)
• “Reporting usage information and host and service status” (page 199)
21.
Example 21-2 Expanded sensor metrics
# shownode metrics sensors icelx1
Timestamp |Node_Id |Name |Value |Description
--------------------------------------------------------------------------
date_and_time |icelx1 |Temp 8 Memory |54 |Celsius; ok
date_and_time |icelx1 |Temp 5 CPU |31 |Celsius; ok
date_and_time |icelx1 |Temp 2 CPU |33 |Celsius; ok
date_and_time |icelx1 |Temp 7 CPU 2 |30 |Celsius; ok
date_and_time |icelx1 |Temp 1 System |40 |Celsius; ok
date_and_time |icelx1 |Temp 6 CPU 2 |30 |Celsius; ok
date_an
Admin: device: gateway: hwaddr: iftype: ifusage: interface_number: ipaddr: ipv6addr: mtu: name: netmask: port: switch: install_disk: is_blade: level: location: memory: n_sockets: node_number: power_setting_dts: power_setting_on: region: server_type: services: gather_data: hosts: provider_type: eth2 Admin 192.0.2.3 earth.example.com Unknown (edit /etc/snmp/snmpd.
--------------------------------------------------------------------------------
icelx1 |192.0.2.1 |earth |earth.example.com |192.0.2.7 |ILO2
icelx2 |192.0.2.2 |neptune |neptune.example.com |Unknown |Unknown
icelx3 |192.0.2.3 |saturn |saturn.example.com |192.0.2.8 |ILO2
icelx4 |192.0.2.4 |mercury |mercury.example.com |192.0.2.9 |ILO2
icelx5 |192.0.2.5 |192.0.2.5 |192.0.2.5 |Unknown |Unknown
icelx6 |192.0.2.6 |pluto |pluto.example.com |192.0.2.
NOTES: • Metrics that return data based on time (for example, cpu idle time) might be inaccurate when collected from a virtual guest by Supermon and Nagios. To ensure accuracy in these time-based metrics, use the vCenter application for VMware ESX and VMware ESXi virtual guests and virt-manager for Xen virtual guests. • On ESX 3.5 systems, the output from the shownode metrics command for paging, diskinfo, and so on is displayed as 0 because these metrics are unavailable from the ESX 3.5 host.
date_and_time |icelx1 |0 |1854680 |1 |2308294 |564006498 |223267 |2258 |48882
 | |1 |3645524 |295 |2669669 |530638968 |31391474 |1 |97907
 | |2 |3118507 |60 |2828907 |550132258 |12305645 |21605 |36849
 | |3 |2585423 |1 |2680954 |561896064 |1266708 |13143 |1529
 | |4 |2700043 |215 |2259501 |563186698 |295084 |0 |2273
 | |5 |3517129 |107 |2372128 |561278491 |1274550 |0 |1402
 | |6 |3684936 |300 |2303954 |561833592 |619651 |0 |1368
 | |7 |3705238 |221 |2321097 |562124531 |291321 |0 |1387
The shownode metric
21.3 Displaying environmental data
Depending on the platform, certain tools might enable you to collect information specific to the platform. You can use the /sbin/hplog utility to display the following environment data:
• Thermal sensor data
• Fan data
• Power data
For more information, see hp-health(4) and hplog(8).
21.
# nrg --help
--help
--verbose - Report more details
--log|l - logfile, default $statuslog
--severity - default is all
    c - critical, w - warning, o - ok, u - unknown, p - pending
--hosts - Only list hosts status
--services - Only list service status
--monitors - Only list monitor status
--up - Only up nodes
--down - Only down nodes
--sort t,h,s - Sort by (t)ime, (h)ost, (s)ervice
--sort - Summary mode only (as cwoup as in severity)
--mode - Report mode: (f)ull, (s)ummary, (r)aw, (w)at
22 Connecting to a remote console
This chapter addresses the following topics:
• “Console management facility overview” (page 201)
• “How CMF works” (page 201)
• “Accessing a remote console” (page 201)
• “Serial connections on DL100 series servers” (page 202)
• “Enabling telnet access to iLO management processors” (page 203)
22.1 Console management facility overview
The Console Management Facility (CMF) daemon, cmfd, collects and stores console output for all managed systems.
You can also use the following command, but be aware that it returns internal names:
# shownode roles --role management_hub
2. Log in to the console with the console command. You can specify either the internal name or the host name. This example uses the internal name icelx16 instead of the host name mercury:
$ console icelx16
Locating server for icelx16
Server for icelx16 is mercury.example.
22.5 Enabling telnet access to iLO management processors NOTE: The telnet utility is not available on G7 servers. IMPORTANT: The telnet protocol transmits the user name and password in clear text over the network to the iLO management processor. HP does not recommend using telnet if your environment is untrusted. By default, the cmfd connects to the management processor using the SSH protocol.
Part IV Other topics
23 Miscellaneous topics
This chapter addresses the following topics:
• “Changing management processor credentials” (page 207)
• “Changing the default port for the repository web server” (page 207)
• “Increasing the number of servers that can be discovered concurrently” (page 208)
• “Changing the IP address of the CMS” (page 208)
• “Uninstalling Insight Control for Linux” (page 208)
• “Determining the installed Insight Control for Linux version” (page 209)
• “Event logging overview” (page 209)
• “Chang
23.3 Increasing the number of servers that can be discovered concurrently
When performing a bare-metal discovery on a set of servers, the maximum number of nodes that are discovered concurrently is 16. Perform the following steps to increase that number:
1. Edit the /opt/mx/icle/icle.properties file to add the following line:
DISCOVERY_MAX_AT_ONCE=servers
Where servers is an integer value representing the number of servers discovered concurrently.
2. Restart HP SIM.
23.
4. Change directory to the uninstall directory, and run the uninstall script to remove all Insight Control for Linux RPMs:
# cd /opt/hp/icelx/config/uninstall
# ./uninstall.sh
5. Remove the following Insight Control for Linux monitoring directories. If you have any files in these directories that you want to preserve, make sure you save a copy of the files before you remove them.
# rm -Rf /opt/hptc
# rm -Rf /hptc_cluster
# rm -Rf /var/hptc
# rm -Rf /opt/repository/boot/pxelinux.
23.7.2 The syslog-ng.conf rules file The syslog-ng.conf rules file defines the order of importance by which the log files are arranged. The /opt/hptc/syslog-ng/etc/syslog-ng/syslog-ng.conf file defines a series of rules for the syslogng_forward service on how to handle messages from its clients. The syslog-ng.conf file contains five types of rules: Options Defines generic information such as reconnection timeouts, FIFO size limits, and so on.
• The value of the MAX_CONCUR_CHAINS variable in the /opt/mx/icle/icle.properties file. The default value is 64. The sum of the weight values for all the tasks running concurrently cannot exceed the value of the MAX_CONCUR_CHAINS variable. For example, if you wanted to run several Deploy Linux Image tasks (each of these tasks carries a weight of 6), the sum of the first ten tasks is 60. The eleventh task would bring the sum to 66, which exceeds the value of the MAX_CONCUR_CHAINS variable.
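The weight arithmetic in this example can be checked with a short Python sketch. It only models the rule as stated here (the sum of the weights of concurrently running tasks cannot exceed MAX_CONCUR_CHAINS); the function names are hypothetical and are not part of Insight Control for Linux.

```python
def max_concurrent_tasks(task_weight: int, max_concur_chains: int = 64) -> int:
    """Return how many tasks of the same weight fit under the limit."""
    if task_weight <= 0:
        raise ValueError("task_weight must be a positive integer")
    return max_concur_chains // task_weight


def can_start(running_weights, new_weight: int, max_concur_chains: int = 64) -> bool:
    """Return True if adding new_weight keeps the total within the limit."""
    return sum(running_weights) + new_weight <= max_concur_chains


# Deploy Linux Image tasks carry a weight of 6 in the example above.
print(max_concurrent_tasks(6))   # ten tasks: 10 * 6 = 60
print(can_start([6] * 10, 6))    # an eleventh task would total 66
```

With the default limit of 64, ten tasks of weight 6 fit and the eleventh does not; raising MAX_CONCUR_CHAINS changes the result accordingly.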
host vm001 {
    hardware ethernet 00:16:3E:AB:CD:01;
    fixed-address 192.0.2.150;
    option host-name "vm001";
}
host vm002 {
    hardware ethernet 00:16:3E:AB:CD:02;
    fixed-address 192.0.2.151;
    option host-name "vm002";
}
}
MAC addresses for some Xen virtual machines begin with the octets 00:16:3E; the final three octets are chosen arbitrarily. Likewise, MAC addresses for some KVM virtual machines begin with the octets 52:54:00; the final three octets are also chosen arbitrarily.
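When many virtual machines need entries of this form, the host blocks can be generated rather than typed. The following Python sketch is a minimal illustration under stated assumptions: the vmNNN naming pattern, the AB:CD middle octets, and the starting address 192.0.2.150 simply mirror the example above, and the function name dhcp_host_entries is hypothetical.

```python
def dhcp_host_entries(count: int,
                      mac_prefix: str = "00:16:3E:AB:CD",
                      network: str = "192.0.2",
                      first_ip_octet: int = 150) -> str:
    """Build dhcpd.conf-style host blocks for vm001..vmNNN.

    mac_prefix holds the first five octets; the last octet is the
    virtual machine index in hexadecimal.
    """
    if not 1 <= count <= 255:
        raise ValueError("count must be between 1 and 255")
    if first_ip_octet + count - 1 > 254:
        raise ValueError("address range does not fit in the last octet")

    entries = []
    for index in range(1, count + 1):
        name = f"vm{index:03d}"
        mac = f"{mac_prefix}:{index:02X}"
        address = f"{network}.{first_ip_octet + index - 1}"
        entries.append(
            f"host {name} {{\n"
            f"    hardware ethernet {mac};\n"
            f"    fixed-address {address};\n"
            f'    option host-name "{name}";\n'
            f"}}"
        )
    return "\n".join(entries)


print(dhcp_host_entries(2))
```

For KVM guests, the same sketch could be called with a prefix that starts with 52:54:00; the remaining octets are arbitrary, as noted above.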
24 Advanced topics
Topics include:
• “Management Processor Credentials” (page 213)
• “Deploying WBEM provider components using Configure or Repair Agents task” (page 215)
• “Logging RAM disk connections and operations” (page 216)
24.
5. Select OK.
24.1.2.2 Discovering and setting up servers with virtual media deployment
If your site uses the virtual media deployment features of Insight Control for Linux, perform these additional steps when you discover the management processors:
1. For the initial part of the process, create an account on the management processor being discovered that matches the default Insight Control for Linux MP credentials.
2. Use the HP SIM discovery tool to discover the management processor.
When a new set of credentials is entered with the Configure→Management Processor→Credentials... task, Insight Control for Linux attempts to find a user with the same user name. If one is found, the user password is changed to match the new credential. If no match is found, then the new credentials are placed in slot 15, overwriting any credentials already stored there. For this reason, do not store credentials, other than those for Insight Control for Linux, in slots 15 and 16. 24.
• kernel-source
• sblim-indication_helper
For SLES 10 SP2 and SLES 10 SP3, the openwbem package must not be installed. All Xen virtual hosts must have the HP ProLiant Support Pack (PSP) installed. For information on deploying PSP, including dependent packages, see the Minimum requirements for Linux servers section at http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00472061/c00472061.pdf.
24.
Part V Troubleshooting and support resources
25 Troubleshooting
This chapter addresses the following topics:
• “General troubleshooting topics” (page 219)
• “Alternative booting” (page 220)
• “Apache service does not start” (page 220)
• “Troubleshooting CMF problems” (page 221)
• “Troubleshooting configuration problems” (page 224)
• “Troubleshooting connection problems” (page 228)
• “Troubleshooting DHCP problems” (page 229)
• “Troubleshooting discovery problems” (page 231)
• “Troubleshooting firmware update proble
Problem: Unable to get SSH credentials: SSH credentials for the specified server were not set or are missing
See: Section 25.22 (page 255)
Problem: SSH authentication failed
See: Section 25.22 (page 255)
Problem: Unable to create SSH connection: Connection refused
See: Section 25.22 (page 255)
Problem: Error retrieving BMC for server. Root cause: Could not determine the BMC associated with the server (x.x.x.x) in the database
See: Section 25.19 (page 250)
Problem: Unable to power off server: Error retrieving BMC for server.
Init: Unable to read server certificate from file /var/log/httpd/error_log To create a self-signed certificate, see Section 19.2 (page 165). 25.4 Troubleshooting CMF problems The following table describes possible causes of problems with the Console Management Facility (CMF) and provides actions to correct them. Cause/Symptom Corrective actions The CMF is not running Perform the following actions: • Verify that the cmfd daemon is running on the CMS: # /etc/init.
Cause/Symptom: Debug the cmfd daemon
Corrective actions: List the cmfd daemon's debug mode options with the cmfd -h command, then run the cmfd daemon with the appropriate options. The output is logged in the /opt/hptc/cmf/logs/cmfd.log file. For example, to see the connection attempts and output:
# /etc/init.d/cmfd stop
# /opt/hptc/cmf/sbin/cmfd -d 00000102
Cause/Symptom: Console command cannot connect to console.
Cause/Symptom: CMF encounters a fatal error.
Corrective actions: Examine the /opt/hptc/cmf/logs/cmfd.log file for any errors.
Cause/Symptom: Console output is not being collected for the managed system.
Corrective actions: Perform the appropriate action:
• Ensure that the target system’s VSP (Virtual Serial Port) is properly configured in its management processor.
• Follow these steps:
1. Ensure that the target system’s console is being redirected to ttyS0 (COM1) or ttyS1 (COM2).
25.5 Troubleshooting configuration problems The following table describes possible configuration problems and provides actions to correct them. Cause/Symptom Corrective actions Configure Insight Control for Linux management services fails Perform the appropriate action: • Verify that the task has indeed completed. The Task Results window may report completion although the operation might not yet be complete. Monitor the console to determine the result.
Cause/Symptom Corrective actions If you experience similar issues, follow these troubleshooting recommendations: • Verify that the /etc/hosts file is correct. For example, make sure the real host name is not equated to localhost and make sure there is only one real and valid entry for the host name and IP address. • Verify that the DNS configuration is correct.
Cause/Symptom: Enclosures collection monitor will report a CRITICAL status if the OA credentials have not been configured properly
Corrective actions: Locate the value for the command[encchk_all] command definition in the /opt/hptc/nagios/etc/nrpe_local.cfg file. Run the command associated with the command definition. For example:
# /opt/hptc/supermon/bin/sensors --cp=enclosures --domain icelx[1-5]:enclosures
1206387637 The user could not be authenticated.
Cause/Symptom: Configuring node to boot from network: Error: Unable to get MAC address from device control
During the managed system installation, the managed system's management processor's MAC address was not found. The OS installation tool attempts to set the management processor for sole PXE boot.
Corrective actions: Ensure that you discover the managed server and its management processor. Make sure that they are associated with each other.
25.6 Troubleshooting connection problems
The following table provides actions to correct a possible connection problem.
Cause/Symptom: Cannot connect to network
Corrective actions: Perform the following actions:
• Verify the network connection.
• Examine the firewall.
• Verify that HP SIM is operating properly.
• Verify the following settings in the /opt/hptc/etc/sysconfig/cmsserver.ini file:
  - The value of cmsServer should be the IP address of the CMS.
  - The value of cmsPort should be 50001.
25.7 Troubleshooting DHCP problems
The following table describes possible causes of problems with Dynamic Host Configuration Protocol (DHCP) and provides actions to correct them.
Cause/Symptom: DHCP Process Not Running
The DHCP server process is absent from the process list, verified with the following command:
Corrective actions: Perform the appropriate action:
• Verify that the /etc/dhcpd.conf service configuration file exists and that it is not empty.
• Verify that the /etc/dhcpd.
Cause/Symptom: IP Addresses Are WRONG
The IP addresses assigned to the managed systems do not match the configuration of your DHCP server.
Corrective actions:
• Verify that there is only one DHCP server on the network.
• Check that the CMS and managed servers are networked properly, that is, either using a dedicated management network or obtaining approval from your network administrator to provide DHCP on a network.
• Temporarily disable the DHCP service on the CMS with the following command:
25.8 Troubleshooting discovery problems The following table describes possible causes of problems that may occur during the device discovery process and provides actions to correct them.
Cause/Symptom: The Reset Server operation failed.
Corrective actions: Manually reboot the server.
Cause/Symptom: Previously discovered system does not bare-metal discover.
Corrective actions: Manually delete all the files in the /opt/repository/boot/pxelinux.cfg directory that correspond to MAC addresses from the managed server and restart the bare-metal discovery.
Cause/Symptom Corrective actions MP password length issue can cause bare metal discovery to fail Verify that the default global management processor password has 8 or more characters, which the management processor requires.
25.10 Troubleshooting Insight Control for Linux repository problems The following table describes possible causes of problems with the Insight Control for Linux repository and provides actions to correct them. Cause/Symptom Corrective actions Selected Repository Item Is Missing Choice of actions: • Select a different item from the repository. • Restore the item to the repository: 1. Select Options→IC-Linux→Manage Repository. 2. Add the item. 3.
25.12 Troubleshooting licensing problems The following table describes possible causes of problems that might be encountered with licensing a managed system and provides actions to correct them. Cause/Symptom Corrective actions Target Node Shows “Not Licensed” In Tool Wizard Use the HP SIM Deploy→License Manager tool to assign Insight Control for Linux licenses to the target system.
Cause/Symptom: Configure→IC-Linux→Configure Management Services fails with wget error on managed system so Insight Control for Linux management agents cannot be installed on the managed system.
Corrective actions: Perform the following actions to verify the cause of the problem:
1. Log in to managed system as the root user.
2. Manually run the /opt/hptc/mgmt/bin/install.sh script to view the wget failure.
The following table describes possible causes of problems with the HP Graph tool and provides actions to correct them.
Cause/Symptom: Cannot Launch HP Graph After Upgrade
HP Graph cannot launch on a CMS that was upgraded from an older release of Insight Control for Linux.
Corrective actions: Add a symbolic link of the hpcgraph.conf file to the web server's configuration directory and restart the web server as follows:
For RHEL operating systems:
1.
The following table describes possible causes of problems with the Performance Dashboard tool and provides actions to correct them. Cause/Symptom Corrective Actions Performance Dashboard initiates without data. It displays all dark gray. Perform the appropriate action: • Restart the Performance Dashboard tool. • Determine if Nagios is collecting data. The Performance Dashboard tool uses the same metric gathering infrastructure as Nagios.
• “Running Nagios plug-ins manually” (page 239)
• “Using the Nagios report generator analyze mode” (page 240)
• “Messages reported by Nagios” (page 240)
• “A check_nrpe error occurs during management agents installation” (page 242)
• “Nagios gather_all_data script reports check_nrpe errors” (page 243)
• “Troubleshooting Nagios problems” (page 243)
25.14.1 Determining the status of the Nagios service
Use the following command to determine if Nagios is running properly:
# /etc/init.
25.14.4 Using the Nagios report generator analyze mode The Nagios Report Generator (nrg) command features an analyze mode that can help you determine the cause of a problem. It also offers information on the solution.
Service: Environment Status Information: Node sensor status A warning or critical message indicates that one or more monitored sensors reported that a threshold was exceeded. Correct the condition. Service: Load Average Status Information: Node Load Ave: x/y/z QueLen: n A warning or critical message indicates that load average thresholds for the specific managed system were exceeded. Thresholds can be set on a per-managed system, per-class, or per-system basis in the nagios_vars.ini file.
Reports the number of new records processed in the /hptc_cluster/adm/logs/consolidated.log file. A warning or critical message occurs when there is insufficient time to process a huge volume of messages before the Nagios service_check_timeout period expires. Nagios examines the recent incoming consolidated log messages and issues a warning or critical message if the incoming rate since the last interval exceeds a configured number of records. The default values are 2 for warnings and 20 for critical.
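The threshold behavior described above can be modeled in a few lines of Python. This is only a sketch of the comparison as described in this section, not the plug-in's actual code: the function name classify_log_rate is hypothetical, and treating "exceeds" as a strict greater-than comparison is an assumption.

```python
def classify_log_rate(new_records: int,
                      warn_threshold: int = 2,
                      crit_threshold: int = 20) -> str:
    """Map a count of new log records to a Nagios-style state name.

    Defaults follow the values given above: 2 for warnings, 20 for critical.
    The strict ">" comparison is an assumption made for this sketch.
    """
    if new_records > crit_threshold:
        return "CRITICAL"
    if new_records > warn_threshold:
        return "WARNING"
    return "OK"


for count in (0, 3, 21):
    print(count, classify_log_rate(count))
```

Under these assumptions, a count at or below the warning threshold is reported as OK, which makes it easy to see how raising either threshold would quiet the alert.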
• If the output reports that vars.ini has been resynchronized for a managed system, verify that there is a self-signed certificate for the Apache service and that the service is running. For troubleshooting information on the Apache service, see Section 25.3 (page 220).
25.14.7 Nagios gather_all_data script reports check_nrpe errors
These errors include socket timeouts and refused connections. The nrpe daemon is unable to configure the server because the check_nagios_vars script is unable to write vars.
Cause/Symptom: Nagios services report a non-OK status
Under very rare circumstances, the Nagios cache might become unsynchronized. If this occurs, it is possible that some Nagios services do not operate correctly.
Corrective actions: Remove the nagios_vars.db file:
# rm /opt/hptc/nagios/etc/nagios_vars.db
Cause/Symptom: Nagios services might report warning
Nagios services might report WARNING - No sensor data is available or Data is stale.
Cause/Symptom: Can no longer install older OS after upgrade
Corrective actions: Ensure that older operating systems were added manually to the /opt/mx/icle/SupportMatrix.xml file. For information, see the HP Insight Control for Linux Installation Guide.
Cause/Symptom: Kickstart / AutoYaST install completes but task in SIM UI still shows that it is running
Corrective actions: Ensure that you removed the following two files:
• autoInstallComplete_jsp.class
• autoInstallComplete_jsp.
Cause/Symptom: cciss: fifo full error on ProLiant ML350 G6 console
This error appears on the console of a ProLiant ML350 G6 installed with RHEL 4 Update 7.
Corrective actions: Install the cpq-cciss driver option from the latest supported PSP.
Cause/Symptom: The proper files were not copied from the installation media into the appropriate /opt/repository/custom subdirectories
Corrective actions: Copy the proper OS files into the appropriate /opt/repository subdirectories.
Cause/Symptom: The target server has lost association with its management processor.
Corrective actions: For the corrective action, see Section 25.20 (page 251)
25.15.3 Capturing Linux images
Cause/Symptom: The CMS disk partition with /opt/repository is full.
Cause/Symptom Corrective actions OS Installations fail, cannot connect to managed system See “OS Installations fail, cannot connect to managed system” in Section 25.15.1. Deployments hang Deployment might fail from one multiple partition scheme to another multiple partition scheme Capture the multi-partition image, deploy it to a single partition, capture that image, and then redeploy the image using the number of required partitions.
25.17 Troubleshooting PXE Boot problems
The following table provides the actions to correct a possible PXE boot failure.
Cause/Symptom: Updating NIC might result in temporary loss of PXE boot capability
Corrective actions: To reestablish PXE boot capability:
1. Press Enter to view the Option ROM message.
2. When the ServerEngine information is displayed, enter Ctrl-P to configure it.
3. Select the appropriate port.
4. Change the Boot Support setting to Enable.
5. Select Save.
6.
25.19 Troubleshooting server power control problems
The following table describes possible causes of problems with powering up or down a managed system and provides actions to correct them.
Cause/Symptom: Error retrieving BMC for server. Root cause: Could not determine the BMC associated with the server (x.x.x.x) in the database
Corrective actions: Perform the appropriate action:
• Ensure that SNMP is configured correctly and that HP SIM has access to SNMP on the target system.
Cause/Symptom: Power cycle after HP SIM discovery starts bare metal discovery
A server that is set to PXE boot all the time might not boot properly after it is discovered with the Options→Discovery... tool. Instead of booting, the server undergoes a bare metal discovery process and powers down. This occurs only once per server.
Corrective actions: Power on the server a second time. It then boots normally, and the problem does not recur.
25.20.3 Rebuilding a server-to-management processor association It might be necessary to rebuild an association between a previously discovered managed system and its management processor. There are several ways to do this depending on what the problem is and what state the managed system is in. 25.20.3.1 Repairing the association of a booted managed system running an OS If a managed server is booted and running a supported OS, follow these steps to repair a lost association.
servers, data fields in the system BIOS contain system serial number and asset tag information. These fields are set at the factory, but you can override them. Verify that these fields appear valid and do not contain any special characters. Abnormal data in these fields cause the iLO to generate an error and cause the server-to-iLO association to break. 6. • If the BIOS data is valid and the iLO XML call is still reporting errors, a hardware problem might be the cause.
7. 8. 9. Monitor the HP Insight Control user interface and the server's remote console and verify that the server was successfully rediscovered and associated with its management processor. If the server does not have a valid OS installed on it, power off the server when it has completed its discovery.
25.22 Troubleshooting SSH The following table describes possible causes of problems with the Secure Shell (SSH) credentials and delays and provides actions to correct them. Cause/Symptom Corrective actions SSH credentials missing for a server Set the SSH credentials for the target system using the HP SIM Options→Credentials tool. Specify Global or The SSH credentials for a server, required for running commands remotely, are missing. The cause may be one System credentials as appropriate.
Cause/Symptom Corrective actions SSH delays on SLES managed systems on networks without name resolution The following actions fix this issue: • Configure a DNS resolver on the network in question. You might encounter extended delays when trying to use • On the managed system, modify the /etc/hosts file SSH to access a SLES managed system on a network that to include both the CMS and the managed system.
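The /etc/hosts change mentioned above can be sketched as follows. This is a minimal illustration, not a product tool: the function name add_host_entry, the host names, and the addresses are all hypothetical. The function appends an entry only when the name is not already listed.

```shell
# Hypothetical helper: append "IP<TAB>NAME" to a hosts file unless NAME
# already appears as a host name or alias on a non-comment line.
# Usage: add_host_entry HOSTS_FILE IP NAME
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # awk exits 0 only if NAME matches a whole field after the address.
    if awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$hosts_file" 2>/dev/null; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}

# Example (placeholder values; run as root on the managed system):
#   add_host_entry /etc/hosts 192.0.2.10 cms.example.com
#   add_host_entry /etc/hosts 192.0.2.21 node1.example.com
```

Running the function twice with the same name leaves the file unchanged, so it is safe to repeat on systems that were already configured.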
25.23 Troubleshooting Supermon problems
The following table describes possible causes of problems with Supermon and provides actions to correct them.
Cause/Symptom: Supermon is not running
Corrective actions: Perform the appropriate action:
• Ensure that the supermon service is running on the CMS:
# service supermon status
If not, restart it:
# service supermon restart
• Ensure that the mond daemon is running on all the managed systems:
# pdsh -a -x `headnode` /etc/init.
25.26 Troubleshooting virtual machine installation and setup problems
Cause/Symptom: HP SIM did not properly identify the virtual host
Corrective action: Perform the appropriate action:
• For Xen, use the uname command as follows to verify that the VM host is running a Xen kernel:
# uname -r
2.6.18-92.el5xen
The text string xen should be embedded in the output.
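The Xen kernel check above can also be scripted. The following is a minimal sketch. The function name is_xen_kernel is hypothetical, and the only test it performs is the one described above: whether the string xen appears in the kernel release.

```shell
# Hypothetical helper: succeed when a kernel release string contains "xen".
# Usage: is_xen_kernel "$(uname -r)"
is_xen_kernel() {
    case "$1" in
        *xen*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example, run on the VM host:
#   if is_xen_kernel "$(uname -r)"; then
#       echo "Xen kernel detected"
#   else
#       echo "Not a Xen kernel"
#   fi
```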
Cause/Symptom: VM Guests are not monitored
Corrective action: Check the {collection_name}_Servers subcollection to ensure that the VM guests to be monitored belong to that subcollection.
Cause/Symptom: Problems installing SLES 10 SP2 x86_64 Xen
Corrective action: This might occur on some hardware combinations. See the HP Insight Control virtual machine management documentation for workarounds to this problem.
Cause/Symptom: Disk error or hang prompting for disk information occurs during ESX 3.
25.27 Troubleshooting virtual media problems
Cause/Symptom: Server attempts to PXE boot or boot from local disk instead of booting using virtual media.
Corrective action: Perform the following actions:
• Verify that port 60002 is open on the CMS.
• Run the Insight Control for Linux Configure→IC-Linux→Configure Boot Method task. Be sure to select Virtual Media for the boot method.
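One way to check whether port 60002 accepts connections is sketched below. This is an assumption-laden example rather than a documented procedure: the function name port_open and the host name are hypothetical, and the sketch relies on the bash /dev/tcp redirection and the coreutils timeout command being available on the system where you run it.

```shell
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be
# opened within 3 seconds. Requires bash (for /dev/tcp) and timeout.
# Usage: port_open HOST PORT
port_open() {
    host=$1
    port=$2
    # Inside bash -c, $0 is HOST and $1 is PORT.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example (placeholder host name for the CMS):
#   if port_open cms.example.com 60002; then
#       echo "port 60002 is reachable"
#   else
#       echo "port 60002 is closed or filtered"
#   fi
```

A failure from a remote system but success on the CMS itself would suggest that a firewall between the two is blocking the port.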
26 Support and other resources
26.1 Contacting HP
26.1.1 Information to collect before contacting HP
Have the following information available before you contact HP:
• Software product name
• Hardware product model number
• Operating system type and version
• Applicable error message
• Third-party hardware or software
• Technical support registration number (if applicable)
26.1.
26.1.4 HP authorized resellers For the name of the nearest HP authorized reseller, see the following sources: • In the United States, see the HP U.S. service locator website at: http://www.hp.com/service_locator • In other locations, see the Contact HP worldwide website at: http://welcome.hp.com/country/us/en/wwcontact.html 26.1.5 Documentation feedback HP welcomes your feedback. To make comments and suggestions about product documentation, send a message to: docsfeedback@hp.
— • The chapter titled Customizing Nagios is now an appendix. The following information was removed: — Information on SLES 9, which is not supported in Version 6.2, was deleted. This includes an appendix titled Sample SLES version 9 installation media copy session. 26.3 Related information 26.3.
• HP Systems Insight Manager The Systems Insight Manager information library is available at the Systems Insight Manager product website: http://www.hp.com/go/foundationmgmt/docs • HP Insight Control Documentation for Insight Control, including documentation for Insight Control virtual machine management and Insight Control power management, is available from the HP Insight Control website: http://www.hp.
• http://www.novell.com/linux Home page for Novell, distributors of SUSE Linux Enterprise Server (SLES). • http://www.linux.org/docs/index.html The website for the Linux Documentation Project (LDP) contains guides that describe aspects of working with Linux, from creating your own Linux system from scratch to bash script writing. This site also includes links to Linux HowTo documents, frequently asked questions (FAQs), and manpages. • http://www.linuxheadquarters.
26.3.3 Troubleshooting resources
The HP Insight Control for Linux Installation Guide and HP Insight Control for Linux User Guide each contain a chapter that describes troubleshooting hints and techniques.
26.4 Typographic conventions
Book title The title of a book. On the web, this can be a hyperlink to the book itself.
Command A command name or command phrase, for example ls -a.
Computer output Information displayed by the computer.
A Customizing Nagios The Nagios configuration is designed so that you can customize it as needed. Complete documentation for customizing Nagios is available on the following Nagios website: www.nagios.
# NRPE GROUP
# This determines the effective group that the NRPE daemon should run as.
# You can either supply a group name or a GID.
#
# NOTE: This option is ignored if NRPE is running under either inetd or xinetd
nrpe_group=new_nagios_group
Where new_nagios_group is the group name of the new Nagios user's account. Save the file.
5. Edit the /opt/hptc/nagios/etc/nagios.
# NAGIOS GROUP
# This determines the effective group that Nagios should run as.
# You can either supply a group name or a GID.
nagios_group=new_nagios_group
Save the file.
9. Run the Options→IC-Linux→Configure Management Services task.
NOTE: The Task Results window may report completion although the operation might not yet be complete. Monitor the console to determine the result.
10. If your system has multiple management hubs, log into each management hub server and repeat steps 2 through 8.
11.
To avoid these alerts, use the commands listed in the following table to shut down Nagios before performing any maintenance operations and tasks, and to start or restart Nagios afterward.
Purpose: To shut down Nagios on the CMS immediately before performing maintenance operations and tasks
Command line: # /etc/init.d/nagios stop
Purpose: To start Nagios after a maintenance operation
Command line: # /etc/init.d/nagios start
Purpose: To restart Nagios after changing its configuration
Command line: # /etc/init.d/nagios restart
A.2.
thresholds and generates alerts when a threshold is reached. Depending on your specific site configuration and use, some default thresholds might not be appropriate for your system. The platform-dependent default thresholds serve as a baseline, but they might not be optimal for your site. Determine the threshold values appropriate for your site and customize the Nagios configuration accordingly. The /opt/hptc/nagios/etc/nagios_vars.
Table A-1 Supermon metrics collection intervals (continued)
Metric name    Collection interval
swapinfo       default*
time           default*
switch         default*
cputotal       default*
avenrun        %LOADAVECOLLECTIONPERIOD% **
mdadm          %MDADMCOLLECTIONPERIOD% **
* The default is 5 minutes.
** This value is specified in the /opt/hptc/nagios/etc/nagios_vars.ini file.
A.2.5.1 Global service check timeout limit
The master Nagios configuration file, nagios.cfg, contains global settings that control overall behavior.
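To see which value a global setting such as service_check_timeout currently has, you can read it from the configuration file. The following sketch assumes the usual key=value layout of nagios.cfg. The function name get_cfg_value is hypothetical, and you must supply the path to the nagios.cfg file used on your CMS.

```shell
# Hypothetical helper: print the value assigned to KEY in a
# key=value configuration file such as nagios.cfg.
# Comment lines are skipped; if KEY appears more than once,
# the last assignment is printed.
# Usage: get_cfg_value CFG_FILE KEY
get_cfg_value() {
    cfg_file=$1
    key=$2
    awk -F= -v k="$key" '
        /^[[:space:]]*#/ { next }
        { name = $1; gsub(/[[:space:]]/, "", name) }
        name == k && index($0, "=") { val = substr($0, index($0, "=") + 1) }
        END { if (val != "") print val }
    ' "$cfg_file"
}

# Example (replace the placeholder with the real file location):
#   get_cfg_value /path/to/nagios.cfg service_check_timeout
```

The function prints nothing when the key is absent, which makes it easy to tell an unset option from one set explicitly.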
Retry Check Interval Indicates the amount of time Nagios waits before retrying after a failure.
A.5 Controlling Nagios messages Nan is an open source utility supplement to the Nagios application. Insight Control for Linux incorporated the Nan notification aggregator and delimiter for the Nagios paging system. Nagios can send large numbers of messages, especially when the CMS and managed systems are starting up, shutting down, or experiencing a failure.
Glossary
A
AutoYaST file
A configuration file used to effect an unattended SLES operating system installation.
B
bare-metal
Describes a server that is not booted with a running operating system. This could be a brand new server with no OS installed on it, or it could be a server with an OS that is not booted.
C
central management server
See CMS.
certificate
An electronic document that contains a subject's public key and identifying information about the subject.
hypervisor
Computer software, specific to a hardware platform, that allows you to run multiple operating systems on a single host at the same time.
I
iLO
Integrated Lights Out. A self-contained hardware technology available on various hardware models that enables remote management of any node within a system. Subsequent generations of this technology are iLO 2 and iLO 3. For information on which servers offer iLO management processors, see the HP Insight Control for Linux Support Matrix.
ProLiant Support Pack
See PSP.
PSP
ProLiant Support Pack. HP software components that are bundled together and verified to work with a particular operating system. An HP ProLiant Support Pack contains driver components, agent components, and application and utility components. All these are verified to install together.
PSP dependency script
An optional user-provided script that runs during a PSP deployment to a managed system.
PXE
Preboot Execution Environment.