scsimgr SCSI Management and Diagnostics utility on HP-UX 11i v3
Abstract

This paper presents the scsimgr command, introduced with HP-UX 11i v3 to provide SCSI management and diagnostics. scsimgr significantly enhances the management and troubleshooting capabilities of the mass storage subsystem. This paper provides an overview of its features and general syntax, along with examples showing how to use the command to accomplish specific tasks.

Terms and definitions

Agile Addressing
The ability to address a LUN with the same DSF regardless of the location of the LUN, i.e.
Introduction

This paper presents the scsimgr command, introduced with HP-UX 11i v3 (B.11.31) to provide SCSI management and diagnostics. It is intended for system administrators and other users of the HP-UX 11i v3 (B.11.31) mass storage subsystem. The reader is assumed to have basic knowledge of HP-UX mass storage subsystem configuration and troubleshooting.
Main features and benefits

The main features and benefits of the scsimgr command include:
• Basic management and diagnostic capabilities for all SCSI objects (LUNs, lunpaths, target paths and SCSI HBA controllers), independently of the drivers managing them.
• Plug-ins to handle operations that depend on the device class (block, tape, changer, and so forth) or the SCSI transport type (Fibre Channel (FC), Serial Attached SCSI (SAS), parallel SCSI (pSCSI), and so forth).
General syntax and summary of options

The syntax of the scsimgr command is as follows:

scsimgr [-fpv] command [-d driver] [identifier] [keyword ...] [argument ...]
scsimgr [-h] [-d driver] [command]

The following tables summarize the meaning of the different components.

Option    Description
-v        Verbose or extended output of command options. Note: This option is ignored for command options which do not provide a verbose or extended output.
-p        Parsable or scriptable output.
Command keyword    Description
erase              Erase optical memory device surface
inquiry            Perform the INQUIRY command to retrieve standard inquiry data or vital product data
ddr_add            Add a settable attribute scope
ddr_del            Delete a settable attribute scope
ddr_list           List registered settable attribute scopes
ddr_name           Generate a settable attribute scope covering a SCSI object
activate           Activate lunpaths in standby state or active non-optimized state for devices supporting asymmetric access

identifier: specifies the object
The different components of the settable attribute scope must be specified exactly as the device returns them in its inquiry data. For instance, to specify a scope corresponding to all disk devices from HP bound to the esdisk driver, specify the attribute scope as: “/escsi/esdisk/0x0/HP ”. Note how space characters are added after the vendor identifier so that it matches what is returned in the standard inquiry data.
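The padding rule described above can be sketched in Python. This is a minimal illustration, not part of scsimgr: the helper name build_scope is hypothetical, the scope layout (/escsi/driver/device type/vendor/product) is inferred from the examples in this paper, and the field widths assume the standard INQUIRY layout of 8 bytes for the vendor identification and 16 bytes for the product identification.

```python
# Field widths of the standard INQUIRY data (assumption stated in the text).
VENDOR_ID_WIDTH = 8
PRODUCT_ID_WIDTH = 16


def build_scope(driver, dev_type=None, vendor=None, product=None):
    """Build a settable attribute scope string (hypothetical helper).

    Vendor and product identifiers are right-padded with spaces so that
    they match the fixed-width fields returned in the inquiry data.
    """
    parts = ["", "escsi", driver]
    if dev_type is None:
        return "/".join(parts)
    parts.append(dev_type)
    if vendor is None:
        return "/".join(parts)
    if len(vendor) > VENDOR_ID_WIDTH:
        raise ValueError("vendor identifier longer than %d characters" % VENDOR_ID_WIDTH)
    parts.append(vendor.ljust(VENDOR_ID_WIDTH))
    if product is not None:
        if len(product) > PRODUCT_ID_WIDTH:
            raise ValueError("product identifier longer than %d characters" % PRODUCT_ID_WIDTH)
        parts.append(product.ljust(PRODUCT_ID_WIDTH))
    return "/".join(parts)
```

Because the trailing spaces are significant, a string built this way has to be quoted when it is passed on a shell command line.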
Examples of using scsimgr

The following scsimgr examples show how to use some of the command's important features.

Gathering status information and statistics

The scsimgr command provides an extensive set of status information, statistics and attributes for SCSI subsystems (class drivers, interface drivers) and objects (LUNs, lunpaths, target paths and SCSI HBA controllers). This simplifies monitoring of mass storage activity and helps identify the root cause of problems quickly.
IO transfer timeout in secs = 30
FORMAT command timeout in secs = 86400
START UNIT command timeout in secs = 60
Timeout in secs before starting failing IO = 240
IO infinite retries = true

Status information for a SCSI tape device identified by its device file name
# scsimgr get_info -D /dev/rtape/tape2_BEST
STATUS INFORMATION FOR LUN : /dev/rtape/tape2_BEST
Generic Status Information
SCSI services internal state
Device type
EVPD page 0x83 description code
EVPD page 0x83 description association
EVPD page
EVPD page 0x83 description type = 3
World Wide Identifier (WWID) = 0x20000020371972eb
Outstanding I/Os = 0
Maximum I/O timeout in seconds = 30
Maximum I/O size allowed = 2097152
Maximum number of active I/Os allowed = 16
Current active I/Os = 0
Maximum queue depth = 16
Queue full delay count = 0

Status information for a target path identified by its hardware path
# scsimgr get_info -H 0/4/1/0.0x21000020371972eb
STATUS INFORMATION FOR TARGET PATH : 0/4/1/0.
Notes:
• Not all the statistics actually displayed by scsimgr are shown.
• When the ‘-v’ option is specified, an extended list of statistics is displayed for the object.
Statistics for lunpaths

As with LUNs, lunpath statistics can be divided into three categories: generic statistics, I/O transfer statistics and class driver specific statistics. Class driver specific statistics depend on the class of the LUN and require a plug-in module to be displayed.
Generic Statistics:
CB_SCAN_ALL events received = 647
Target Probe events received
Probe failures due to LUN 0 probe failures
Probe failures due to REPORT LUNS failures
LUN path probe failures
Target path offline events from I/F driver
Target path online events from I/F driver
Port id change events from I/F driver
Target Warm Reset events
Target Cold Reset events
Target Warm reset failures
Target Cold Reset failures
Invalid port id changes
Total I/Os processed
Last time cleared
LUN path : lunpath5
Class = lunpath
Instance = 5
Hardware path = 0/3/1/0.0x21000020371972eb.0x0
SCSI transport protocol = fibre_channel
State = UNOPEN
Last Open or Close state = ACTIVE

LUN path : lunpath12
Class = lunpath
Instance = 12
Hardware path = 0/4/1/0.0x21000020371972eb.
Displaying hardware path, device file, WWID and serial number for all devices, in scriptable output
# scsimgr -p get_attr all_lun -a hw_path -a device_file -a wwid -a serial_number
64000/0xfa00/0x0:/dev/rdisk/disk15:0x0004cffffebbf737:3FD1M2SW
64000/0xfa00/0x1:/dev/rdisk/disk16:0x0004cffffebbe43e:3FD1LZ0S
64000/0xfa00/0x2:/dev/rdisk/disk17::
64000/0xfa00/0x3:/dev/rdisk/disk18:0x20000020371972ee:LS255190000010080N0V
64000/0xfa00/0x4:/dev/rdisk/disk19:0x20000020371972e3:LS271211000010080N1H
64000/0xfa00/0x5:/
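The scriptable output shown above lends itself to post-processing. The following Python sketch splits each line on the colon separator and pairs the fields with the attribute names given on the command line. The function name parse_scriptable is hypothetical, and the sketch assumes that attribute values never contain a colon, which holds for the four attributes in this example but is not guaranteed for others.

```python
def parse_scriptable(output, attr_names):
    """Parse colon-separated 'scsimgr -p get_attr' output (hypothetical helper).

    Returns one dict per non-empty line, keyed by the attribute names in the
    order they were requested with -a. Empty fields become empty strings.
    """
    records = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(":")
        if len(fields) != len(attr_names):
            raise ValueError("unexpected field count in line: %r" % line)
        records.append(dict(zip(attr_names, fields)))
    return records


# Sample lines copied from the output above.
SAMPLE = """\
64000/0xfa00/0x0:/dev/rdisk/disk15:0x0004cffffebbf737:3FD1M2SW
64000/0xfa00/0x2:/dev/rdisk/disk17::
"""

ATTRS = ["hw_path", "device_file", "wwid", "serial_number"]

if __name__ == "__main__":
    for record in parse_scriptable(SAMPLE, ATTRS):
        # Devices that report no WWID are flagged rather than skipped.
        print(record["device_file"], record["wwid"] or "(no WWID reported)")
```

Keeping empty fields as empty strings preserves the distinction between a device that reports no WWID or serial number and a malformed line, which raises an error instead.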
• default value: the value used to initialize the attribute at system boot when there is no saved value, or when the user requests that the attribute be reset. The default value is usually hard-coded by the driver or inherited from the current value at a higher level where the attribute is explicitly set.

Note: For read-only attributes (also called non-settable attributes), only the current value is defined.
The following table describes when a change to an attribute value takes effect for operations, depending on the level at which the attribute is changed and the levels at which the attribute can be set.

Levels where the attribute is settable    Changed at object instance    Changed at global level
Global only                               N/A                           Immediately(1)
Global and object instance                Immediately(1)                Upon re-initialization of the object instance. For example LUN open(2).
# scsimgr get_attr -a leg_mpath_enable
SCSI GLOBAL ATTRIBUTES:
name = leg_mpath_enable
current = true
default = true
saved =

# scsimgr get_attr -D /dev/rdisk/disk100 -a state -a leg_mpath_enable
SCSI ATTRIBUTES FOR LUN : /dev/rdisk/disk100
name = state
current = UNOPEN
default =
saved =
name = leg_mpath_enable
current = true
default = true
saved =

# scsimgr get_attr -D /dev/rdisk/disk101 -a state -a leg_mpath_enable
SCSI ATTRIBUTES FOR LUN : /dev/rdisk/disk101
name = state
current = UNOPEN
default =
saved =
For disk100 and disk101, the current value of leg_mpath_enable is still ‘true’, while the default value is now ‘false’. This conforms to the rule for updating current values of attributes. The current value is updated only when the device is opened. The default value is derived from the current value set at the global level. You can open disk100 by running the following command.
current = true
default = true
saved =

# scsimgr get_attr -D /dev/rdisk/disk100 -a leg_mpath_enable
SCSI ATTRIBUTES FOR LUN : /dev/rdisk/disk100
name = leg_mpath_enable
current = true
default = true
saved =

# scsimgr get_attr -D /dev/rdisk/disk101 -a leg_mpath_enable
SCSI ATTRIBUTES FOR LUN : /dev/rdisk/disk101
name = leg_mpath_enable
current = true
default = true
saved =

You can set leg_mpath_enable to be persistently true for disk100.
SCSI ATTRIBUTES FOR LUN : /dev/rdisk/disk101
name = leg_mpath_enable
current = false
default = false
saved =
name = state
current = ONLINE
default =
saved =

Because the user has explicitly set it to true, the current value of leg_mpath_enable for disk100 is no longer inherited from the global level. It is inherited from the global level for disk101 because it has not been explicitly set for disk101.
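The inheritance behavior walked through in this example can be summarized with a small Python model. It is a deliberately simplified sketch, not a description of the SCSI stack implementation: the class names are hypothetical, only two levels (global and LUN instance) are modeled, and it assumes that a value set at the instance level takes effect immediately while a value inherited from the global level is picked up when the LUN is opened.

```python
class GlobalAttr:
    """Attribute value held at the global level (simplified model)."""

    def __init__(self, value):
        self.current = value


class LunAttr:
    """Per-LUN view of an attribute that is settable at both levels."""

    def __init__(self, global_attr):
        self._global = global_attr
        self._explicit = None               # value set explicitly on this LUN
        self.current = global_attr.current  # value in effect for operations

    @property
    def default(self):
        # The default follows the current value at the global level.
        return self._global.current

    def set(self, value):
        # Assumption: an instance-level change applies immediately.
        self._explicit = value
        self.current = value

    def open(self):
        # Attributes are reloaded on open: an explicit value wins,
        # otherwise the value is inherited from the global level.
        if self._explicit is not None:
            self.current = self._explicit
        else:
            self.current = self.default


if __name__ == "__main__":
    leg_mpath_enable = GlobalAttr(True)
    disk100 = LunAttr(leg_mpath_enable)
    disk101 = LunAttr(leg_mpath_enable)

    leg_mpath_enable.current = False   # change at the global level
    print(disk101.current, disk101.default)

    disk100.set(True)                  # explicit setting on disk100
    disk100.open()
    disk101.open()
    print(disk100.current, disk101.current)
```

The model mirrors the sequence in the text: after the global change, the default seen by each LUN changes at once, the current value of disk101 follows only after the open, and disk100 keeps the value that was set on it explicitly.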
default = 30
saved =

disk15 and disk16 inherit their current and default values from the “/escsi/esdisk” settable attribute scope, which covers all devices bound to the esdisk class driver. You can change the value of esd_secs at the “/escsi/esdisk” level to 60 and check that the new value is applied to disk15 and disk16 without having to open these devices.
These tunables need to be fine-tuned to meet the following requirements for optimal operation:
• All HP disk devices with the product identifier (pid) ST3360LCB require path_fail_secs to be set to at least 160 seconds. In addition, these disk arrays exhibit some degradation in performance if I/O requests are distributed in a round-robin manner. Optimal performance is obtained with the least_cmd_load policy, which distributes I/O to the I/O path with the least load.
# scsimgr save_attr -N "/escsi/esdisk/0x0/HP /ST3360LCB " -a path_fail_secs=160 -a load_bal_policy=least_cmd_load
Value of attribute path_fail_secs saved successfully
Value of attribute load_bal_policy saved successfully

# scsimgr get_attr -N "/escsi/esdisk/0x0/HP /ST3360LCB " -a path_fail_secs -a load_bal_policy
SCSI ATTRIBUTES FOR SETTABLE ATTRIBUTE SCOPE : /escsi/esdisk/0x0/HP /ST39103FC2
name = path_fail_secs
current = 160
default = 120
saved = 160
name = load_bal_policy
current = least_cmd_load
d
The path_fail_secs attribute is explicitly set for disk0. This is reflected in the current value of this attribute, which corresponds to what has been set through scsimgr.
- disk0 does not match either of the two settable attribute scopes added. The load_bal_policy and max_q_depth attributes inherit their default values from the global level.
disk1:
- Default values of the path_fail_secs and load_bal_policy attributes are inherited from the settable attribute scope covering devices with pid: ST3360LCB.
default = 4
saved =
name = load_bal_policy
current = round_robin
default = round_robin
saved =

The scsimgr command output below shows the values of the tunables after the disks are opened or re-opened, in other words after the attributes are reloaded. Note the following:
• The current values of the attributes are now set based on the inheritance rules. Most importantly, these values correspond to the operational requirements outlined earlier.
• disk0:
- The path_fail_secs attribute is explicitly set for disk0.
saved =
name = max_q_depth
current = 8
default = 8
saved =
name = load_bal_policy
current = least_cmd_load
default = least_cmd_load
saved =

# scsimgr get_attr -D /dev/rdisk/disk2 -a max_q_depth -a load_bal_policy
SCSI ATTRIBUTES FOR LUN : /dev/rdisk/disk2
name = path_fail_secs
current = 120
default = 120
saved =
name = max_q_depth
current = 4
default = 4
saved =
name = load_bal_policy
current = round_robin
default = round_robin
saved =

Assigning user-friendly device identifiers and aliases for ease of inve
# scsimgr set_devid -D /dev/rdisk/disk20 "Engineering department disk20"
Do you really want to set device id? (y/n)? y
scsimgr: ERROR: LUN /dev/rdisk/disk20 does not support Device Identifier

# scsimgr get_devid -D /dev/rdisk/disk20
scsimgr: ERROR: LUN /dev/rdisk/disk20 does not support Device Identifier

To assign the alias “Engineering - XPD08934-1” to disk device disk1
# scsimgr save_attr -D /dev/rdisk/disk1 -a alias="Engineering - XPD08934-1"
Value of attribute alias saved successfully

To view the alia
[Figure: Server S with HBA 0/1/1/0 connected to a disk at target id 0, LUN id 0. Initial configuration: WWID 0x0004cffffebbf737. After reconfiguration: WWID 0x0004cffffebbe43e.]

The above diagram shows a disk connected to Server S via parallel SCSI through the HBA with hardware path 0/1/1/0. The system assigns this disk the instance number 15. Its target identifier is 0x0, its LUN identifier is 0x0, and its WWID is 0x0004cffffebbf737. The disk has one lunpath: 0/1/1/0.0x0.
0/1/1/0.0x0.0x0
/dev/disk/disk15 /dev/rdisk/disk15

disk15 WWID:
# scsimgr -p get_attr -D /dev/rdisk/disk15 -a wwid
0x0004cffffebbf737

The system administrator replaces disk15 with a back-up disk containing the same data but with a different WWID (0x0004cffffebbe43e). Because the new disk is accessed through the same HBA and its target identifier and LUN identifier are unchanged, its lunpath hardware path remains 0/1/1/0.0x0.0x0.
Binding of LUN path 0/1/1/0.0x1.0x0 with new LUN validated successfully

The advantage of this method is that it directly specifies the lunpath class and instance provided in the logging message. It is also sufficient when the replaced disk has only one lunpath, or when only a few lunpaths are affected by the SAN reconfiguration.
be put in “Authentication failure”. In this case it is more convenient for the user to validate the change by specifying the target paths. All lunpaths beneath each target path specified will be validated. Since the number of target paths is very limited compared to the number of lunpaths or disks affected, the system administrator will have to run ‘scsimgr replace_wwid’ only a few times. The system administrator can use target information provided in the logging as shown in the example below.
legacy DSF as they would now access a different LUN. For instance, the user may have to reconfigure a volume group to use a different legacy DSF corresponding to another path to the original LUN.

Example of a SAN reconfiguration affecting legacy DSFs

The following is an example of a SAN reconfiguration whose impact on legacy DSFs requires explicit confirmation of the change of the legacy DSF binding.
• The hardware paths of the 2 target ports of DA1 are: 0/4/1/0/4/0.0x50001fe150006e69 and 0/4/1/0/4/0.0x50001fe150006e68. The corresponding legacy target hardware paths start with 0/4/1/0/4/0.1.2 and 0/4/1/0/4/0.1.4.

The following output of the ioscan command shows the target paths of disk array DA1:
# ioscan -kfNCtgtpath
Class I H/W Path Driver S/W State H/W Type Description
======================================================================
tgtpath 5 0/4/1/0/4/0.
The instance numbers of the FCP array interfaces (virtual buses) in the legacy view, corresponding to the two target ports of the disk array DA1, are shown in the output of the ioscan command below. These instance numbers can be correlated with the legacy DSFs.
# ioscan -kfnCext_bus -H 0/4/1/0/4/0
Class I H/W Path Driver S/W State H/W Type
=====================================================================
ext_bus 15 0/4/1/0/4/0.1.2.0.0 fcd_vbus CLAIMED INTERFACE
ext_bus 14 0/4/1/0/4/0.1.2.255.
disk 93 64000/0xfa00/0xd esdisk CLAIMED DEVICE COMPAQ HSV111 (C)COMPAQ
        /dev/disk/disk93 /dev/rdisk/disk93
disk 94 64000/0xfa00/0xe esdisk CLAIMED DEVICE COMPAQ HSV111 (C)COMPAQ
        /dev/disk/disk94 /dev/rdisk/disk94
disk 95 64000/0xfa00/0xf esdisk CLAIMED DEVICE COMPAQ HSV111 (C)COMPAQ
        /dev/disk/disk95 /dev/rdisk/disk95

The following output of the ioscan command shows the new target path (0/4/1/0/4/0.0x50002fe250006e6c) corresponding to target port ‘7’ of DA2.
The SCSI stack detects the change of the disk seen through the legacy lunpaths (legacy DSFs) whose hardware path starts with 0/4/1/0/4/0.1.4. It logs the following message once for each impacted legacy lunpath:

The legacy lun path (b 17 - t 0 - l 1) registration failed because it has been re-mapped from its original LUN (default dev 0x0C000005) to a different LUN (default dev 0xC0000006).
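When many legacy lunpaths are impacted, it can help to derive the legacy DSF name from each logged message. The Python sketch below extracts the bus, target and LUN numbers from the "(b 17 - t 0 - l 1)" portion of the message and formats them as a legacy device file name. The function name is hypothetical, and the mapping of b, t and l onto the c#t#d# legacy naming convention is an assumption that is consistent with the /dev/rdsk/c17t0d1 device file used in this example.

```python
import re

# Matches the "(b <bus> - t <target> - l <lun>)" portion of the log message.
_BTL = re.compile(r"\(b\s+(\d+)\s+-\s+t\s+(\d+)\s+-\s+l\s+(\d+)\)")


def legacy_dsf_from_message(message, raw=True):
    """Return the legacy DSF named by a log message, or None if absent.

    Hypothetical helper: assumes the c<bus>t<target>d<lun> naming of
    legacy device files under /dev/rdsk (raw) or /dev/dsk (block).
    """
    match = _BTL.search(message)
    if match is None:
        return None
    bus, target, lun = (int(group) for group in match.groups())
    directory = "/dev/rdsk" if raw else "/dev/dsk"
    return "%s/c%dt%dd%d" % (directory, bus, target, lun)


if __name__ == "__main__":
    msg = ("The legacy lun path (b 17 - t 0 - l 1) registration failed "
           "because it has been re-mapped from its original LUN")
    print(legacy_dsf_from_message(msg))
```

Returning None for lines without a bus, target and LUN triple lets the helper be applied to a whole log file without first filtering for the relevant messages.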
# dd if=/dev/rdsk/c17t0d1 of=/dev/null count=100
100+0 records in
100+0 records out

Validating the change of LUN binding for a single legacy DSF

You can choose to validate the binding of some legacy DSFs with new LUNs immediately and defer it for others, for example to let applications using those legacy DSFs complete. In this case, run the ‘scsimgr replace_leg_dsf’ command for each legacy DSF for which you want to validate the change in binding.
Legacy lun path (default minor = 0x110100) mapping to LUN (default minor = 0x6) could not be cleared in the context of rmsf on new LUN path (0/4/1/0/4/0.0x50002fe250006e6c.0x4001000000000000). Close the legacy LUN path before running ioscan to re-map the legacy lun path to a new LUN.

After the change is validated, the impacted legacy DSFs are bound to the disks of the disk array DA2 as shown in the following ioscan output.
scsimgr: LUN /dev/rdisk/disk100 disabled successfully

# ioscan -P health /dev/rdisk/disk100
Class I H/W Path health
===============================
disk 100 64000/0xfa00/0x15 disabled

Notes:
• When all lunpaths to a LUN are disabled, applications can no longer open the LUN, and attempts to open the LUN will fail with the following error: ‘no such device or address’.
• The system performs critical resource analysis (CRA) when the user requests to disable a lunpath or a LUN.
• Vendor-specific active-passive devices with a plug-in module provided on HP-UX 11i v3.

Determining whether a device supports explicit asymmetric access

To determine whether a device supports explicit asymmetric access, run ‘scsimgr get_info’ on the device. The following table indicates whether the ‘scsimgr activate’ command applies, based on the values of the fields ‘LUN access type’ and ‘Asymmetric logical unit access supported’.
Standby LUN paths = 2
Failed LUN paths = 0
Maximum I/O size allowed = 2097152
Preferred I/O size = 2097152
Outstanding I/Os = 0
I/O load balance policy = round_robin
Path fail threshold time period = 0
Transient time period = 60
Tracing buffer size = 1024
LUN Path used when policy is path_lockdown = NA
LUN access type = T10 Asymmetric Active-Active
Asymmetric logical unit access supported = Both implicit and explicit
Asymmetric states supported = ao_sup, an_sup
Preferred paths reported by device = Yes
Dr
Preferred paths reported by device = No
Preferred LUN paths = 0

Driver esdisk Status Information :
Capacity in number of blocks = 2097152
Block size in bytes = 512
Number of active IOs = 0
Special properties =
Maximum number of IO retries = 45
IO transfer timeout in secs = 30
FORMAT command timeout in secs = 86400
START UNIT command timeout in secs = 60
Timeout in secs before starting failing IO = 45
IO infinite retries = true

Example of condition where to run ‘scsimgr activate’ command

The following o
Generic Status Information
SCSI services internal state
Open close state
Protocol
EVPD page 0x83 description code
EVPD page 0x83 description association
EVPD page 0x83 description type
World Wide Identifier (WWID)
Total number of Outstanding I/Os
Maximum I/O timeout in seconds
Maximum I/O size allowed
Maximum number of active I/Os allowed
Maximum queue depth
Queue full delay count
Asymmetric state
Device preferred path
Relative target port identifier
Target port group identifier
In this example, lunpath45 goes offline. Run ‘scsimgr activate’ on disk54 to activate lunpath8 and lunpath45.
EVPD page 0x83 description association = 0
EVPD page 0x83 description type = 3
World Wide Identifier (WWID) = 0x600508b300903330ec569c5839ab003c
Total number of Outstanding I/Os = 0
Maximum I/O timeout in seconds = 30
Maximum I/O size allowed = 2097152
Maximum number of active I/Os allowed = 8
Maximum queue depth = 8
Queue full delay count = 0
Asymmetric state = ACTIVE/OPTIMIZED
Device preferred path = No
Relative target port identifier = 2
Target port group identifier = 2

STATUS INFORMATION FOR LUN PATH
• The ‘scsimgr activate’ command does not need to be invoked on every system connected to the storage; invoking it on one host is sufficient. Refer to any device-specific best practices for more details.
• The ‘scsimgr activate’ command can be used to restore the preferred paths to the devices. Refer to any device-specific best practices for more information.
• The ‘scsimgr activate’ command may fail if the SCSI stack is performing internal configurations on the disk device.