HP StorageWorks Enterprise Virtual Array online firmware upgrade best practices Part number: 5697–6388 Second edition: January 2007
Legal and notice information © Copyright 2006-2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Contents
About this guide
  Intended audience
  Related documentation
  Document conventions and symbols
  HP technical support
  Documentation feedback
  Baseline test
  Light load testing
  Heavy load testing
Test results
  Light load testing
  Heavy load testing
Conclusion
Performance metrics
  Exchange 2003
  Microsoft SQL Server 2005
  Oracle 10g
  Application Servers
  Storage
Figures
1 Determining the maximum I/O throughput (MB/s) load for EVA4000/6000/8000
2 Determining the maximum I/Os per second (IOPS) load for EVA4000/6000/8000
3 Determining the maximum I/O throughput (MB/s) load for EVA3000/5000
4 Determining the maximum I/Os per second (IOPS) load for EVA3000/5000
5 Sample host port statistics
6 Sample virtual disk statistics
Tables
1 Document conventions
2 Suspending replication during an online upgrade
3 IBM AIX timeout settings
4 Application/Database layout
About this guide
This guide includes the following information:
• Understanding how the EVA online firmware upgrade process works.
• Assessing your SAN environment to determine if it can support an online upgrade.
• Determining how long the storage system will be unavailable during the online upgrade process. This time is required to perform the controller resynchronization.
• Scheduling the best time to perform the upgrade.
Document conventions and symbols
Table 1 Document conventions
Convention                               Element
Blue text: Table 1                       Cross-reference links and e-mail addresses
Blue, underlined text: http://www.hp.
HP technical support
Telephone numbers for worldwide technical support are listed on the HP support web site: http://www.hp.com/support/.
Collect the following information before calling:
• Technical support registration number (if applicable)
• Product serial numbers
• Product model names and numbers
• Error messages
• Operating system type and revision level
• Detailed questions
For continuous quality improvement, calls may be recorded or monitored.
Documentation feedback
HP welcomes your feedback.
1 Understanding the online firmware upgrade process
During the life of an EVA storage system, it will be necessary to periodically upgrade the system firmware. Upgrading to the most current firmware release ensures that the storage system benefits from ongoing design improvements and enhancements. Performing the firmware upgrade online minimizes the impact on the hosts and applications accessing the storage system. It is not necessary to halt I/O or suspend operation of the hosts.
Online upgrades in HP Continuous Access EVA environments
The following recommendations apply when performing an online upgrade in HP Continuous Access EVA environments:
• The HP Command View EVA management server is often connected on the fabric between the source and destination storage systems in such a way that one storage system is considered to be local to the management server. That is, management commands do not pass across the intersite link (ISL).
2 Evaluating the SAN for an online upgrade
Before performing an online upgrade, the SAN environment must be evaluated to determine system timeouts, identify I/O patterns, and select the best time to perform the upgrade. This will help ensure the success of the online upgrade.
Managing host I/O timeouts
The default values for host operating parameters such as LUN timeout and queue depth are typically set to values that ensure proper operation with the storage system.
Default timeout values
• Sdisk timeout: 30 seconds
• (LVM) lvol timeout: 0 seconds
Checking timeout values
• Device within an LVM volume group: pvdisplay /dev/dsk/cxtxdx
• lvol physical volume with an LVM group: lvdisplay /dev/vgxx/lvolx
Changing timeout values
pvchange -t (seconds) /dev/dsk/cxtxdx
Linux
The following configuration recommendations apply to both Red Hat and SUSE.
Solaris
Solaris supports online controller upgrades with the following driver timeouts.
• Sun drivers (qlc or emlxs): 60 seconds
• QLogic (qla2300): 60 seconds
• Emulex (lpfc): 60 seconds
Checking or changing timeouts
For Sun drivers, add the following lines to the /etc/system file:
set sd:sd_io_time = 60
set ssd:ssd_io_time = 60
For QLogic, edit the /kernel/drv/qla2300.conf file and change the hbax-link-down-timeout value to 60 as follows:
hba0-link-down-timeout=60;
For Emulex, edit the /kernel/drv/lpfc.
IBM AIX
Checking or changing timeouts
AIX requires the disk settings shown in Table 3 for the native multipath drives.
Table 3 IBM AIX timeout settings
Setting         Value        Description
PR_key_value    NA           Sets the key value for persistent reservations. Persistent reservations are not supported.
Algorithm       fail_over    Sets the load balancing algorithm to fail_over. All I/O uses a single path. The remaining paths are in standby mode. The value round_robin is not supported.
Assessing the impact of I/O load on the upgrade
Because the online firmware upgrade is performed while host I/Os are being serviced, the I/O load can impact the upgrade process. In general, performing the upgrade during a period of low I/O activity will help ensure the success of the upgrade. There are three primary ways in which I/O load impacts the online upgrade process.
• First is the ability of the storage system to perform the upgrade in parallel with host I/O.
Figure 1 Determining the maximum I/O throughput (MB/s) load for EVA4000/6000/8000
Figure 2 Determining the maximum I/Os per second (IOPS) load for EVA4000/6000/8000
Figure 3 Determining the maximum I/O throughput (MB/s) load for EVA3000/5000
Figure 4 Determining the maximum I/Os per second (IOPS) load for EVA3000/5000
Analyzing storage system utilization using HP Command View EVAPerf
The HP Command View EVAPerf tool can be used to gather and analyze statistics on storage system utilization. This section provides recommendations on using HP Command View EVAPerf to gather statistics to help identify periods of low storage system activity.
• -sz array limits data collection to the specified array(s). You must enter at least one array and can use either the storage system WWN or friendly name.
• -fo filename directs output to a specified filename. Include the path information as necessary.
NOTE: The HP Command View EVAPerf as command provides an alternative to the hps command for gathering and displaying IOPS and I/O throughput.
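As a rough illustration of how the -sz and -fo options combine with the hps command, the following Python sketch assembles a command line. The executable name evaperf, the helper name build_hps_command, the sample array name, and the output file name are assumptions made for illustration, as is the choice to pass multiple arrays as separate arguments. Only the hps command and the -sz and -fo options come from this guide.

```python
import shlex


def build_hps_command(arrays, output_file, executable="evaperf"):
    """Assemble an hps command line that uses the -sz and -fo options.

    `executable` is an assumed program name; adjust it to match the
    installed HP Command View EVAPerf command.
    """
    if not arrays:
        # -sz requires at least one array (WWN or friendly name).
        raise ValueError("at least one array must be specified for -sz")
    # Assumption: multiple arrays are passed as separate arguments after -sz.
    return [executable, "hps", "-sz", *arrays, "-fo", output_file]


# Hypothetical array name and output file, for illustration only.
cmd = build_hps_command(["EVA_Site1"], "hps_stats.txt")
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps array names that contain spaces intact if the list is later passed to a process launcher.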
Storage system utilization data analysis
When evaluating the performance of the EVA, it is important to distinguish between the two fundamental types of I/O patterns: small block reads/writes and large block reads/writes. If the workload is predominantly small block I/Os, it is important to examine I/Os per second (IOPS). If the workload is predominantly large block I/Os, it is important to evaluate the throughput in megabytes per second (MB/s).
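One way to decide which metric applies is to derive the average transfer size from the two statistics themselves. The following Python sketch does this under stated assumptions: the function names are hypothetical, and the 64 KB cutoff between small and large block I/O is an illustrative value, not one taken from this guide.

```python
def average_io_size_kb(throughput_mb_s, iops):
    """Average transfer size in KB implied by throughput and request rate."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return throughput_mb_s * 1024.0 / iops


def metric_to_watch(throughput_mb_s, iops, large_block_kb=64.0):
    """Return 'MB/s' for large block workloads and 'IOPS' for small block ones.

    large_block_kb is a hypothetical cutoff chosen for illustration.
    """
    size_kb = average_io_size_kb(throughput_mb_s, iops)
    return "MB/s" if size_kb >= large_block_kb else "IOPS"


# Sample values are invented for illustration.
print(metric_to_watch(throughput_mb_s=40.0, iops=5000))   # about 8 KB per I/O
print(metric_to_watch(throughput_mb_s=200.0, iops=800))   # 256 KB per I/O
```

The cutoff should be adjusted to match the block sizes actually seen in the environment being assessed.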
Large block transfers
Unlike small block transfers, when the workload on the storage system is dominated by large block I/Os, the workload capacity of the storage system is best monitored in terms of throughput, measured in megabytes per second (MB/s). Large block sequential workloads are typical of backup and restore operations.
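Once IOPS or MB/s samples have been collected at a fixed interval, a period of low activity can be located with a simple sliding window. The following Python sketch is a minimal example of that idea. The function name, the sample values, and the window length are assumptions for illustration, and the sketch does not parse HP Command View EVAPerf output.

```python
def quietest_window(samples, window):
    """Return (start_index, mean_load) of the lowest-load run of `window` samples.

    samples: load values (IOPS or MB/s) taken at a fixed interval.
    """
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    current = sum(samples[:window])
    best_sum, best_start = current, 0
    # Slide the window one sample at a time, updating the running sum.
    for i in range(window, len(samples)):
        current += samples[i] - samples[i - window]
        if current < best_sum:
            best_sum, best_start = current, i - window + 1
    return best_start, best_sum / window


# Hypothetical hourly IOPS averages, for illustration only.
hourly_iops = [4200, 3900, 1500, 600, 450, 500, 2800, 5100]
start, mean = quietest_window(hourly_iops, window=3)
print(start, round(mean, 1))
```

The same function works for throughput samples. Running it over several days of data helps confirm that the quiet period recurs at a predictable time.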
3 Customer testimonials about online upgrades
A number of EVA customers have successfully used the online firmware upgrade feature. The following are a few of these successes.
Customer 1: A large telecommunications company
Customer 1 is a very large EVA customer with hundreds of storage systems. They use Windows, HP-UX, and Solaris hosts. Applications include Microsoft Exchange, Microsoft SQL Server, Oracle, and a number of custom applications.
Customer 3: A large computer chip manufacturer
Customer 3 is another long-time EVA customer, with EVA storage systems in use at a large number of facilities. Their EVAs host data for a number of systems, including Oracle 10g. Their experience with online upgrades is very different from the others described here. The division that was recently visited had been using online upgrades as their preferred method of controller upgrades for a number of releases.
A Effect of online firmware upgrade on application resiliency
This appendix describes the impact of an online upgrade on various applications.
Testing configuration
The testing consisted of three phases: phase one was used as a baseline, phase two included light application load testing, and phase three included heavy application load testing. The first phase of testing validated the code upgrade procedure without any application load running.
Table 4 Application/Database layout
Microsoft Exchange Server 2003 SP2
  Number of users (light)                   750
  Number of users (heavy)                   2000
  Load Profile                              MMB3
  Number of Storage Groups                  2
  Number of Databases per Storage Group     2
  Mailbox Size                              100MB
  Load Simulation Tool                      LoadSim 2003
Microsoft SQL Server 2005
  Transaction Type                          TPC-C OLTP
  Read/Write Distribution                   70/30
  Database Size                             120GB
  Client Connections (light)                50
  Client Connections (heavy)                200
  Load Simulation Tool                      Benchmark Factory
Oracle 10g
  Trans
Light load testing
Multiple online code loads were performed under a light load. The objective was to observe application resiliency to the firmware upgrade. The expected outcome was that the firmware upgrade should complete without the applications failing or having excessive latency. The test steps included the following:
1. Start load simulators on all three applications.
2. Start capturing metrics with Windows Performance Monitor.
3. Allow I/O to run for at least one hour.
4. Perform firmware upgrade.
5.
Test results
Light load testing
Light load testing revealed that all three applications reported no errors of any kind during the firmware upgrade, although they displayed moderate to significant latency for its duration. The firmware upgrade lasted approximately six minutes, during which all three applications continued to respond and did not report any errors.
Microsoft SQL 2005 performed well during the firmware event. See Figure 11. While the number of transactions per second did fall during the event, it returned to its normal level as soon as the firmware upgrade completed. The write-to-disk latency on the LUN holding the SQL 2005 database showed an increase similar to the latency on Exchange, but it also recovered quickly to a steady state after the upgrade event.
Oracle 10g performed well during the firmware upgrade event. See Figure 12. Although the number of transactions per second dropped to a fraction of the steady state during the event, the application did not return any errors and ramped back up to its pre-upgrade level.
Figure 13 shows the CPU utilization of the two EVA 4000 controllers during the firmware event. The CPU utilization prior to the firmware upgrade is very low (this configuration did not include any HP Business Copy EVA or HP Continuous Access EVA activity). Once the firmware upgrade is started, the CPU utilization spikes to near 100%. Again, once the firmware upgrade is complete, the CPU levels return to normal.
Figure 14 shows the number of I/Os per second on the EVA 4000, achieving a 60-40 split between reads and writes across a representative host port. During the firmware upgrade, I/Os dropped off significantly, but returned to their prior levels immediately after the upgrade event completed.
Heavy load testing
Heavy load testing produced very similar results to the light load testing. The latencies and delays during the firmware upgrade event were significantly higher, however. Again, the applications reported no errors of any kind during the firmware upgrade. During the heavy load phase, two LoadSim client machines were necessary to drive the number of users. The latency score for both servers reached a maximum of approximately 5000 ms during the firmware upgrade. See Figure 15.
Microsoft SQL Server 2005 performed well under the heavy load and during the firmware upgrade. In Figure 16 and Figure 17, the transactions per second are compared against the average disk latency per read and write, respectively. While the disk latencies spiked during the firmware upgrade, the number of transactions per second did not drop to zero. While this does represent a significant performance impact, the application reported no errors of any kind during the firmware upgrade.
Figure 17 Heavy load on Microsoft SQL Server 2005, sheet 2
The performance impact of the firmware upgrade on Oracle 10g was very similar to that on Microsoft SQL 2005. Figure 18 shows the transactions per second dipping sharply during the firmware upgrade event, but then returning to a steady level once the upgrade is complete. While a reduction of transactions per second to nearly zero is not an ideal situation, the continuous running of the application server is of far more value.
The performance of the EVA 4000 is displayed in the following figures, showing the behavior of the array during the upgrade event. First, the CPU utilization spikes for both controllers during the upgrade. See Figure 19. The relatively quick return to normal operating levels means that the applications running on the connected servers are able to weather the event without failure.
Figure 20 shows I/Os per second for both reads and writes. During normal operation, there is roughly a 60-40 split between reads and writes on the port in question. During the firmware upgrade, the number of I/Os per second drops to nearly zero as the EVA 4000 handles the firmware upgrade. The I/O rate does, however, return to a steady state once the upgrade is completed.
Figure 21 Heavy load on EVA, total I/Os per second
Figure 22 shows the total kilobytes per second being handled by the array. During the firmware upgrade, the amount of data being handled by the array drops to a mere fraction of what it is capable of during the normal course of operation. When the firmware upgrade is complete, however, the level of data being processed returns to normal. Note that the upgrade event lasts approximately 6 minutes.
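The length of an upgrade event such as this one can be estimated from sampled throughput by finding the longest run of samples that falls well below the normal level. The following Python sketch is a minimal illustration under stated assumptions. The function name, the 50% threshold, the one-minute sampling interval, and the sample values are all hypothetical and are not measurements from this testing.

```python
def degraded_interval(samples, interval_s, baseline, fraction=0.5):
    """Return (start_s, duration_s) of the longest run below fraction * baseline.

    samples: throughput or IOPS values taken every interval_s seconds.
    Returns None if no sample falls below the threshold.
    """
    threshold = baseline * fraction
    best = None        # (start_index, length) of the longest run so far
    run_start = None
    # The trailing infinity is a sentinel that closes a run ending at the last sample.
    for i, value in enumerate(list(samples) + [float("inf")]):
        if value < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            length = i - run_start
            if best is None or length > best[1]:
                best = (run_start, length)
            run_start = None
    if best is None:
        return None
    return best[0] * interval_s, best[1] * interval_s


# Invented KB/s samples at one-minute intervals, for illustration only.
kb_per_s = [9000, 9100, 8800, 700, 400, 300, 350, 500, 650, 8900, 9050]
print(degraded_interval(kb_per_s, interval_s=60, baseline=9000))
```

With these invented samples, the detected run covers six one-minute samples, which is the same order of duration as the upgrade event described above.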
Performance metrics
Exchange 2003
• Loadsim 95th Percentile Latency (total) Score
• Read and Write latency (log and database)
Microsoft SQL Server 2005
• Transactions per second
• Seconds per write (log and database)
Oracle 10g
• Transactions per second
• Seconds per write (log and database)
Application Servers
No performance metrics measured.