Power Systems Operating Systems


IBM Power Systems

Performance Capabilities Reference

IBM i operating system Version 6.1

January/April/October 2008

This document is intended for use by qualified performance related programmers or analysts from IBM, IBM Business Partners and IBM customers using the IBM Power Systems platform running IBM i operating system. Information in this document may be readily shared with IBM i customers to understand the performance and tuning factors in IBM i operating system 6.1 and earlier where applicable. For the latest updates and for the latest on IBM i performance information, please refer to the Performance Management Website:

http://www.ibm.com/systems/i/advantages/perfmgmt/index.html

Requests for use of performance information by the technical trade press or consultants should be directed to Systems Performance Department V3T, IBM Rochester Lab, in Rochester, MN. 55901 USA.

IBM i 6.1 Performance Capabilities Reference - January/April/October 2008

Summary of content (368 pages)

PAGE 2
Note! Before using this information, be sure to read the general information under “Special Notices.” Twenty Fifth Edition (January/April/October 2008) SC41-0607-13 This edition applies to IBM i operating system V6.1 running on IBM Power Systems. You can request a copy of this document by download from IBM i Center via the System i Internet site at: http://www.ibm.com/systems/i/
PAGE 3
Table of Contents
Special Notices
Purpose of this Document
Chapter 1. Introduction
Chapter 2.
PAGE 4
Chapter 5. Communications Performance (page 63)
5.2 Communication Performance Test Environment (page 65)
5.5 TCP/IP Secure Performance (page 68)
5.6 Performance Observations and Tips
PAGE 5
10.2 DB2 for i5/OS access with ODBC
References for ODBC
Chapter 11. Domino on i
11.1 Domino Workload Descriptions
PAGE 6
14.1.3.1 571B RAID5 vs RAID6 - 10 15K 35GB DASD
14.1.3.2 571B IOP vs IOPLESS - 10 15K 35GB DASD
14.1.4 571B, 5709, 573D, 5703, 2780 IOA Comparison Chart
14.1.
PAGE 7
15.3 Workloads
15.4 Comparing Performance Data
15.5 Lower Performing Backup Devices
15.6 Medium & High Performing Backup Devices
PAGE 8
17.2.4 Virtual Ethernet Connections:
17.2.5 IXS/IXA IOP Resource:
17.3 System i memory rules of thumb for IXS/IXA and iSCSI attached servers.
17.3.1 IXS and IXA attached servers:
17.3.
PAGE 9
21.1 Switchable IASP’s
21.2 Geographic Mirroring
Chapter 22. IBM Systems Workload Estimator
22.1 Overview
PAGE 10
Special Notices DISCLAIMER NOTICE Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. This information is presented along with general recommendations to assist the reader to have a better understanding of IBM(*) products.
PAGE 11
The following terms, which may or may not be denoted by an asterisk (*) in this publication, are trademarks of the IBM Corporation.
PAGE 12
Purpose of this Document The intent of this document is to help provide guidance in terms of IBM i operating system performance, capacity planning information, and tips to obtain optimal performance on IBM i operating system. This document is typically updated with each new release or more often if needed. This October 2008 edition of the IBM i V6.1 Performance Capabilities Reference Guide is an update to the April 2008 edition to reflect new product functions announced on October 7, 2008.
PAGE 13
Chapter 1. Introduction IBM System i and IBM System p platforms unified the value of their servers into a single, powerful lineup of servers based on industry leading POWER6 processor technology with support for IBM i operating system (formerly known as i5/OS), IBM AIX and Linux for Power. Following along with this exciting unification are a number of naming changes to the formerly named i5/OS, now officially called IBM i operating system.
PAGE 14
versions. The primary public performance information web site is found at: http://www.ibm.com/systems/i/advantages/perfmgmt/index.html
PAGE 15
Chapter 2. iSeries and AS/400 RISC Server Model Performance Behavior 2.1 Overview iSeries and AS/400 servers are intended for use primarily in client/server or other non-interactive work environments such as batch, business intelligence, network computing etc. 5250-based interactive work can be run on these servers, but with limitations. With iSeries and AS/400 servers, interactive capacity can be increased with the purchase of additional interactive features.
PAGE 16
interactive utilization - an average for the interval. Since average utilization does not indicate potential problems associated with peak activity, a second metric (SCIFTE) reports the amount of interactive utilization that occurred above threshold. Also, interactive feature utilization was reported when printing a System Report generated from Collection Services data. In addition, Management Central now monitors interactive CPU relative to the system/partition capacity.
PAGE 17
2.1.4 V5R2 and V5R1 There were several new iSeries 8xx and 270 server model additions in V5R1 and the i890 in V5R2. However, with the exception of the DSD models, the underlying server behavior did not change from V4R5. All 27x and 8xx models, including the new i890, utilize the same server behavior algorithm that was announced with the first 8xx models supported by V4R5. For more details on these new models, please refer to Appendix C, “CPW, CIW and MCU Values for iSeries”.
PAGE 18
• The new server algorithm only applies to the new hardware available in V4R5 (2xx, 8xx and SBx models). The behavior of all other hardware, such as the 7xx models is unchanged (see section 2.2.3 Existing Model section for 7xx algorithm).
2.2.2 Choosing Between Similarly Rated Systems
Sometimes it is necessary to choose between two systems that have similar CPW values but different processor megahertz (MHz) values or L2 cache sizes.
PAGE 19
grows at a rate which can eventually eliminate server/batch capacity and limit additional interactive growth. It is best for interactive workloads to execute below (less than) the knee of the curve. (However, for those models having the knee at 1/3 of the total interactive capacity, satisfactory performance can be achieved.) The following graph illustrates these points. Model 7xx and 9/98 Model 170 CPU CPU Distribution vs.
PAGE 20
2.3 Server Model Differences Server models were designed for a client/server workload and to accommodate an interactive workload. When the interactive workload exceeds an interactive CPW threshold (the “knee of the curve”) the client/server processing performance of the system becomes increasingly impacted at an accelerating rate beyond the knee as interactive workload continues to build.
PAGE 21
[Figure 2.2. Custom Server Model behavior. Chart title: Custom Server Model CPU Distribution vs. Interactive Utilization. Vertical axis: Available CPU (0 to 100). Horizontal axis: Fraction of Interactive CPW (0, 6/7, Full), with the knee marked. Areas shown: available for client/server, CFINT, interactive. Applies to: AS/400e Custom Servers, AS/400e Mixed Mode Servers.]
Server Model CPU Distribution vs.
PAGE 22
2.4 Performance Highlights of Model 7xx Servers 7xx models were designed to accommodate a mixture of traditional “green screen” applications and more intensive “server” environments. Interactive features may be upgraded if additional interactive capacity is required. This is similar to disk, memory, or other features.
PAGE 23
2.5 Performance Highlights of Model 170 Servers iSeries Dedicated Server for Domino models will be generally available on September 24, 1999. Please refer to Section 2.13, iSeries Dedicated Server for Domino Performance Behavior, for additional information. Model 170 servers (features 2289, 2290, 2291, 2292, 2385, 2386 and 2388) are significantly more powerful than the previous Model 170s announced in Feb. '98. They have a faster processor (262MHz vs. 125MHz) and more main memory (up to 3.5GB vs. 1.0GB).
PAGE 24
The next chart shows the performance capacity of the current and previous Model 170 servers.
[Figure 2.5. Previous vs. Current Server 170 Performance. Chart title: Previous vs. Current AS/400e server 170 Performance. Bar chart of Processor and Interactive CPW values for previous features 2159, 2160, 2164, 2176 and 2183 and current features 2289, 2290, 2291, 2292, 2385, 2386 and 2388. * Previous: unconstrained V4R2 rates.]
2.
PAGE 25
and higher than normal CFINT values. The goal is to avoid exceeding the threshold (knee of the curve) value of interactive capacity. 2.8 Interactive Utilization When the interactive CPW utilization is beyond the knee of the curve, the following formulas can be used to determine the effective interactive utilization or the available/remaining client/server CPW. These equations apply to all server models.
PAGE 26
Now if the interactive CPU is held to less than 4% CPU (the knee), then the CPU available for the System, Batch, and Client/Server work is 100% - the Interactive CPU used. If the interactive CPU is allowed to grow above the knee, say for example 9% (or 41 CPW), then the CPU percent remaining for the Batch and System is calculated using the formulas above: X = (9 - 4) / (11 - 4) = .71 (percent into the overhead area) EIU = 4 + (.
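The first step of the worked example above can be checked with a few lines of Python. This is only a sketch: the function name `fraction_into_overhead` and its parameter names are illustrative assumptions, and the 11% maximum interactive CPU is simply read off the example's own arithmetic. The EIU formula is cut off in this excerpt, so it is not reproduced here.

```python
def fraction_into_overhead(interactive_cpu, knee_cpu, max_interactive_cpu):
    """Return how far interactive CPU has moved from the knee toward the maximum.

    All three arguments are CPU percentages. The result is 0.0 at or below
    the knee and is capped at 1.0 at or above the maximum.
    """
    if interactive_cpu <= knee_cpu:
        return 0.0
    fraction = (interactive_cpu - knee_cpu) / (max_interactive_cpu - knee_cpu)
    return min(fraction, 1.0)


# Values from the example in the text: 9% interactive CPU, knee at 4%, maximum at 11%.
x = fraction_into_overhead(9, 4, 11)
print(round(x, 2))  # (9 - 4) / (11 - 4), which the text rounds to .71
```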
PAGE 27
If customers modify an IBM-supplied class description, they are responsible for ensuring the priority value is 35 or less after each new release or cumulative PTF package has been installed. One way to do this is to include the Change Class (CHGCLS) command in the system Start Up program. NOTE: Several IBM-supplied class descriptions already have RUNPTY values of 35 or less. In these cases no user action is required. One example of this is class description QPWFSERVER with RUNPTY(20).
PAGE 28
Server Dynamic Tuning Recommendations On the new systems and mixed mode servers, have the QDYNPTYSCD and QDYNPTYADJ system values set on. This preserves non-interactive capacities and the interactive response times will be dynamic beyond the knee regardless of the setting. Also set non-interactive class run priorities to less than 35. On earlier servers and 2/98 model 170 systems use your interactive requirements to determine the settings.
PAGE 29
2.10 Managing Interactive Capacity Interactive/Server characteristics in the real world. Graphs and formulas listed thus far work perfectly, provided the workload on the system is highly regular and steady in nature. Of course, very few systems have workloads like that. The more typical case is a dynamic combination of transaction types, user activity, and batch activity.
PAGE 30
There are other means for determining interactive utilization. The easiest of these is the performance monitoring function of Management Central, which became available with V4R3.
PAGE 31
2. A similar effect can be found with index builds. If parallelism is enabled, index creation (CRTLF, Create Index, Open a file with MAINT(*REBUILD), or running a query that requires an index to be built) will be sent to service jobs that operate in non-interactive mode, but charge their work back to the job that requested the service. Again, the work does not count as “interactive”, but the performance data will show the resource consumption as if it were.
PAGE 32
2.11 Migration from Traditional Models This section describes a suggested methodology to determine which server model is appropriate to contain the interactive workload of a traditional model when a migration of a workload is occurring. It is assumed that the server model will have both interactive and client/server workloads. To get the same performance and response time, from a CPU perspective, the interactive CPU utilization of the current traditional model must be known.
PAGE 33
***********************************************************************************
Component Report
Component Interval Activity
Data collected 190396 at 1030
Member . . . : Q960791030    Model/Serial . : 310-2043/10-0751D    Main St...
Library. . : PFR             System name. . : TEST01               Version/Re..

ITV End   Tns/hr   Rsp/Tns   CPU % Total   CPU % Inter   CPU % Batch
10:36     6,164    0.8       85.2          32.2
10:41     7,404    0.9       91.3          45.2
10:46     5,466    0.7       97.6          38.8
10:51     5,622    1.2       97.9          35.6
10:56     4,527    0.8       97.9          16.
  :
11:51
11:56
PAGE 34
one third of the total possible interactive workload, for non-custom models. The equation shown in this section will migrate a traditional system to a server system and keep the interactive workload at or below the knee of the curve, that is, using less than two thirds of the total possible interactive workload. In some environments these equations will be too conservative. A value of 1.2, rather than 1.5 would be less conservative.
PAGE 35
2.13 iSeries for Domino and Dedicated Server for Domino Performance Behavior In preparation for future Domino releases which will provide support for DB2 files, the previous processing limitations associated with DSD models have been removed in i5/OS V5R3. In addition, a PTF is available for V5R2 which also removes the processing limitations for DSD models and allows full use of DB2. Please refer to PTF MF32968 and its prerequisite requirements.
PAGE 36
Domino-Complementary Processing Prior to V5R1, processing that did not spend the majority of its time in Domino code was considered non-Domino processing and was limited to approximately 10-15% of the system capacity. With V5R1, many applications that would previously have been treated as non-Domino may now be considered as Domino-complementary when they are used in conjunction with Domino.
PAGE 37
Similar to previous DSD performance behavior for interactive processing, the Interactive CPW rating of 0 allows for system administrative functions to be performed by a single interactive user. In practice, a single interactive user will be able to perform necessary administrative functions without constraint. If multiple interactive users are simultaneously active on the DSD, the Interactive CPW capacity will likely be exceeded and the response times of those users may significantly lengthen.
PAGE 38
processing present in the Linux logical partition, and all resources allocated to the Linux logical partition can essentially be used as though it were complementary processing. It is not necessary to proportionally increase the amount of Domino processing in the OS/400 logical partition to account for the fact that Domino processing is not present in the Linux logical partition.
PAGE 39
Chapter 3. Batch Performance
In a commercial environment, batch workloads tend to be I/O intensive rather than CPU intensive. The factors that affect batch throughput for a given batch application include the following:
• Memory (Pool size)
• CPU (processor speed)
• DASD (number and type)
• System tuning parameters
Batch Workload Description
The Batch Commercial Mix is a synthetic batch workload designed to represent multiple types of batch processing often associated with commercial data processing.
PAGE 40
3.3 Tuning Parameters for Batch
There are several system parameters that affect batch performance. The magnitude of the effect for each of them depends on the specific application and overall system characteristics. Some general information is provided here.
• Expert Cache
Expert Cache did not have a significant effect on the Commercial Mix batch workload. Expert Cache does not start to provide improvement unless the following are true for a given workload.
PAGE 41
improve performance by eliminating disk I/O operations.
• If communications lines are involved in the batch application, try to limit the number of communications I/Os by doing fewer (and perhaps larger) application sends and receives. Consider blocking data in the application. Try to place the application on the same system as the frequently accessed data.
PAGE 42
Chapter 4. DB2 for i5/OS Performance This chapter provides a summary of the new performance features of DB2 for i5/OS on V6R1, V5R4 and V5R3, along with V5R2 highlights. Summaries of selected key topics on the performance of DB2 for i5/OS are provided. General information and some recommendations for improving performance are included along with links to the latest information on these topics. Also included is a section of performance references for DB2 for i5/OS. 4.
PAGE 43
• DB2 Multisystem tables
New functions available in V6R1 whose use may affect SQL performance are derived key indexes, decimal floating point data type support, and the select from insert statement. A derived key index can have an expression in place of a column name that can use built-in functions, user defined functions, or some other valid expression. Additionally, you can use the SQL CREATE INDEX statement to create a sparse index using a WHERE condition.
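The two index ideas described above can be tried out with a short sketch. It deliberately uses SQLite through Python's standard `sqlite3` module as a stand-in, because it is easy to run anywhere; SQLite calls these features "indexes on expressions" and "partial indexes", and the exact DB2 for i5/OS syntax and behavior may differ. The table `orders`, its columns, and the index names are all hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders (customer, status) VALUES (?, ?)",
    [("Acme", "OPEN"), ("acme", "CLOSED"), ("Zenith", "OPEN")],
)

# Derived key idea: the index key is an expression rather than a plain column.
conn.execute("CREATE INDEX orders_upper_customer ON orders (UPPER(customer))")

# Sparse index idea: only rows satisfying the WHERE condition are indexed.
conn.execute("CREATE INDEX orders_open ON orders (customer) WHERE status = 'OPEN'")

# List the indexes that now exist on the table (column 1 of index_list is the name).
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list('orders')"))

# A query whose predicate uses the same expression as the derived key.
matches = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE UPPER(customer) = 'ACME'"
).fetchone()[0]

print(index_names, matches)
```

Whether an optimizer actually chooses either index for a given query depends on the database engine; the sketch only shows how the two kinds of index are declared.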
PAGE 44
the statement is complete. The implementation to invoke the locking causes a physical DASD write to the journal for each record, which causes journal waits. Journal caching on allows the journal writes to accumulate in memory and have one DASD write per multiple journal entries, greatly reducing the journal wait time. So select from insert statements with FINAL TABLE run much faster with journal caching on. Figure 4.
PAGE 45
Table Expressions (RCTE) which allow for more elegant and better performing implementations of recursive processing. In addition, enhancements have been made in i5/OS V5R4 to the support for materialized query tables (MQTs) and partitioned table processing, which were both new in i5/OS V5R3. i5/OS V5R4 SQE Query Coverage The query dispatcher controls whether an SQL query will be routed to SQE or to CQE.
PAGE 46
Enhancements to extend the use of materialized query tables (MQTs) were added in i5/OS V5R4. Functions newly supported in MQT queries by the MQT matching algorithm are unions and partitioned tables, along with limited support for scalar subselects, UDFs and user defined table functions, RCTE, and some scalar functions. Also new to i5/OS V5R4, the MQT matching algorithm now tries to match constants in the MQT with parameter markers or host variable values in the query.
PAGE 47
SQL queries which continue to be routed to CQE in i5/OS V5R3 have the following attributes:
• Sensitive cursor
• Like/Substring predicates
• LOB columns
• References to DDS logical files
• NLSS/CCSID translation between columns
• DB2 Multisystem
• ALWCPYDTA(*NO)
• Tables with select/omit logicals over them
i5/OS V5R3 SQE Performance Enhancements
Many enhancements were made in i5/OS V5R3 to enable faster query runtime and use less system resource.
PAGE 48
Partitioned Table Support Table partitioning is a new feature introduced in i5/OS V5R3. The design is localized on an individual table basis rather than an entire library. The user specifies one or more fields which collectively act as a partitioning key. Next the records in the table are distributed into multiple disjoint sets based on the partitioning scheme used: either a system-supplied hashing function or a set of value ranges (such as dates by month or year) supplied by the user.
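The two partitioning schemes described above can be illustrated with a small sketch. It is not the DB2 implementation: the system-supplied hashing function is not described here, so `zlib.crc32` is used purely as a stand-in, and the function names, sample records, and range boundaries are hypothetical.

```python
import zlib
from datetime import date


def hash_partition(key, partition_count):
    """Pick a partition number from a stable hash of the partitioning key."""
    return zlib.crc32(str(key).encode("utf-8")) % partition_count


def range_partition(value, upper_bounds):
    """Return the index of the first range whose exclusive upper bound exceeds value."""
    for index, bound in enumerate(upper_bounds):
        if value < bound:
            return index
    raise ValueError(f"{value!r} is outside every defined range")


# Hypothetical records: (order id, order date).
records = [
    (1001, date(2008, 1, 15)),
    (1002, date(2008, 4, 2)),
    (1003, date(2008, 10, 7)),
]

# User-supplied value ranges: before April, before July, before the next year.
bounds = [date(2008, 4, 1), date(2008, 7, 1), date(2009, 1, 1)]

# Each record lands in exactly one partition, so the sets are disjoint.
by_range = {}
for order_id, order_date in records:
    by_range.setdefault(range_partition(order_date, bounds), []).append(order_id)

print(by_range)
print(hash_partition(1001, 4))
```

Range partitioning keeps related values (for example, one month of dates) together, while hashing spreads keys across partitions without regard to their order.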
PAGE 49
• Statistical Strategies
• SMP Considerations
• Administration Examples (Adding a Partition, Dropping a Partition, etc.)
Materialized Query Table Support
The initial release of i5/OS V5R3 includes the Materialized Query Table (MQT) (also referred to as automatic summary tables or materialized views) support in UDB DB2 for i5/OS as essentially a technology preview. Pre-April 2005 i5/OS V5R3 provides the capability of creating materialized query tables, but no optimizer awareness of these MQTs.
PAGE 50
more information may be used in the query plan costing phase than was available to the optimizer previously. The optimizer may now use newly implemented database statistics to make more accurate decisions when choosing the query access plan. Also, the enhanced optimizer may more often select plans using hash tables and sorted partial result lists to hold partial query results during query processing, rather than selecting access plans which build temporary indexes.
PAGE 51
should be made to determine if the needed statistics are available. Also in environments where long running queries are run only one time, it may be beneficial to ensure that statistics are available prior to running the queries. Some properties of database column statistics are as follows:
• Column statistics occupy little storage, on average 8-12k per column.
• Column Statistics are gathered through one full scan of the database file for any given number of columns in the database file.
PAGE 52
SQE for V5R2 Summary Enhancements to DB2 for i5/OS, called SQE, were made in V5R2. The SQE enhancements are object oriented implementations of the SQE optimizer, the SQE query engine and the SQE database statistics. In V5R2 a subset of the read-only SQL queries will be optimized and run with the SQE enhancements. The effect of SQE on performance will vary by workload and configuration. For the most recent information on SQE please see the SQE web page on the DB2 for i5/OS web site located at www.iseries.
PAGE 53
4.6 DB2 Symmetric Multiprocessing feature Introduction The DB2 SMP feature provides application transparent support for parallel query operations on a single tightly-coupled multiprocessor System i (shared memory and disk). In addition, the symmetric multiprocessing (SMP) feature provides additional query optimization algorithms for retrieving data.
PAGE 54
limit the amount of data it brings into and keeps in memory to a job’s share of memory. The amount of memory available to each job is inversely proportional to the number of active jobs in a memory pool. The memory-sharing algorithms discussed above provide balanced performance for all the jobs running in a memory pool. Running short transactional queries in the same memory pool as long running, data intensive queries is acceptable.
PAGE 55
• Allows customers to replace current programming methods of capturing and transmitting journal entries between systems with more efficient system programming methods. This can result in lower CPU consumption and increased throughput on the source system.
• Can significantly reduce the amount of time and effort required by customers to reconcile their source and target databases after a system failure.
PAGE 56
There are 3 sets of tasks which do the SMAPP work. These tasks work in the background at low priority to minimize the impact of SMAPP on system performance. The tasks are as follows:
• JO_EVALUATE-TASK - Evaluates indexes, estimates rebuild time for an index, and may start or stop implicit journaling of an index.
• JO-TUNING-TASK - Periodically wakes up to consider where the user recovery threshold is set and manages which indexes should be implicitly journaled.
PAGE 57
multiple nodes in the cluster, access to the database files is seamless and transparent to the applications and users that reference the database. To the users, the partitioned files still behave as though they were local to their system. The most important aspect of obtaining optimal performance with DB2 Multisystem is to plan ahead for what data should be partitioned and how it should be partitioned.
PAGE 58
4.10 Referential Integrity In a database user environment, there are frequent cases where the data in one file is dependent upon the data in another file. Without support from the database management system, each application program that updates, deletes or adds new records to the files must contain code that enforces the data dependency rules between the files.
PAGE 59
The following are performance tips to consider when using triggers support:
• Triggers are activated by an external call. The user needs to weigh the benefit of the trigger against the cost of the external call.
• If a trigger is going to be used, leave as much validation to the trigger program as possible.
• Avoid opening files in a trigger program under commitment control if the trigger program does not cause changes to commitable resources.
PAGE 60
To create the variable length field just described, use the following DB2 statement:
   CREATE TABLE library/table-name (field VARCHAR(50) ALLOCATE(20) NOT NULL)
In this particular example the field was created with the NOT NULL option. The other two options are NULL and NOT NULL WITH DEFAULT. Refer to the NULLS section in the SQL Reference to determine which NULLS option would be best for your use.
PAGE 61
01 DESCR.
   49 DESCR-LEN PIC S9(4) COMP-4.
   49 DESCRIPTION PIC X(40).
EXEC SQL FETCH C1 INTO DESCR END-EXEC.
For more detail about the vary-length character string, refer to the SQL Programmer's Guide. The above point is also true when using a high-level language to insert values into a variable length field. The variable that contains the value to be inserted must be declared as variable or varying.
PAGE 62
In contrast, when reuse is active, the database support will process the added record more like an update operation than an add operation. The database support will maintain a bit map to keep track of deleted records and to provide fast access to them. Before a record can be added, the database support must use the bit-map to find the next available deleted record space, read the page containing the deleted record entry into storage, and seize the deleted record to allow replacement with the added record.
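The bookkeeping described above can be pictured with a toy model. This is a sketch under stated assumptions, not the database's actual data structure: the class `ReusableFile`, its methods, and the use of a single Python integer as the bit map are all hypothetical, and the page read and seize steps are not modeled.

```python
class ReusableFile:
    """Toy model of a file that reuses deleted record slots via a bit map."""

    def __init__(self):
        self.records = []        # slot number -> record (None when deleted)
        self.deleted_bitmap = 0  # bit i set means slot i holds a deleted record

    def add(self, record):
        """Add a record, reusing the lowest-numbered deleted slot if one exists."""
        if self.deleted_bitmap:
            # Isolate the lowest set bit and convert it to a slot number.
            lowest = self.deleted_bitmap & -self.deleted_bitmap
            slot = lowest.bit_length() - 1
            self.deleted_bitmap &= ~lowest  # slot is no longer marked deleted
            self.records[slot] = record     # behaves like an update in place
            return slot
        # No deleted space available: append at the end of the file.
        self.records.append(record)
        return len(self.records) - 1

    def delete(self, slot):
        """Mark a slot as deleted so a later add can reuse it."""
        self.records[slot] = None
        self.deleted_bitmap |= 1 << slot


f = ReusableFile()
slots = [f.add("a"), f.add("b"), f.add("c")]
f.delete(1)
reused = f.add("d")    # lands in the slot freed by the delete
appended = f.add("e")  # no deleted slots remain, so this one is appended
print(slots, reused, appended)
```

The extra lookup in `add` is the reason an add with reuse active looks more like an update than a simple append.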
PAGE 63
2. The System i information center section on DB2 for i5/OS under Database and file systems has information on all aspects of DB2 for i5/OS including the section Monitor and Tune database under Administrative topics. This can be found at url: http://www.ibm.com/eserver/iseries/infocenter
3. Information on creating efficient running queries and query performance monitoring and tuning is found in the DB2 for i5/OS Database Performance and Query Optimization manual.
PAGE 64
Chapter 5. Communications Performance There are many factors that affect System i performance in a communications environment. This chapter discusses some of the common factors and offers guidance on how to help achieve the best possible performance. Much of the information in this chapter was obtained as a result of analysis experience within the Rochester development laboratory.
PAGE 65
• IBM’s Host Ethernet Adapter (HEA) integrated 2-Port 10/100/1000 Base-TX PCI-E IOA supports checksum offloading, 9000-byte jumbo frames (1 Gigabit only) and LSO - Large Send Offload (IPv4 only). These adapters do not require an IOP to be installed in conjunction with the IOA.
PAGE 66
181A1 IBM 2-Port 10/100/1000 Base-TX PCI-e7 10 / 100 / 1000 Yes Yes Yes Yes 181B2 IBM 2-Port Gigabit Base-SX PCI-e 10000 Yes Yes Yes Yes 181C1 IBM 4-Port 10/100/1000 Base-TX PCI-e7 10 / 100 / 1000 Yes Yes Yes Yes Yes Yes Yes Yes 18191 IBM 4-Port 10/100/1000 Base-TX PCI-e7,9 10 / 100 / 1000 n/a5 Yes N/A Yes No N/A Virtual Ethernet4 n/a5 Yes N/A Yes Yes N/A Blade8 Notes: 1. Unshielded Twisted Pair (UTP) card; uses copper wire cabling 2. Uses fiber optics 3.
PAGE 67
To demonstrate communications performance in various ways, several workload scenarios are analyzed. Each of these scenarios may be executed with regular nonsecure sockets or with secure SSL using the GSK API: 1. Request/Response (RR): The client and server send a specified amount of data back and forth over a connection that remains active. 2.
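The Request/Response (RR) scenario described above can be sketched in a few lines. This is an illustration only, not the test harness used for the measurements: it uses `socket.socketpair()` to get a connected local pair instead of a real network link, runs both ends in one thread, and the helper names and the 128-byte default payload are assumptions chosen for the example.

```python
import socket


def recv_exact(sock, size):
    """Read exactly `size` bytes from the socket or raise if the peer closes."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def run_request_response(rounds, payload_size=128):
    """Send a request and a response back and forth over one connection.

    Returns the number of completed round trips. The connection stays
    open for the whole run, as in the RR scenario.
    """
    client, server = socket.socketpair()
    completed = 0
    try:
        request = b"q" * payload_size
        response = b"r" * payload_size
        for _ in range(rounds):
            client.sendall(request)
            if recv_exact(server, payload_size) != request:
                raise ValueError("server received a corrupted request")
            server.sendall(response)
            if recv_exact(client, payload_size) != response:
                raise ValueError("client received a corrupted response")
            completed += 1
    finally:
        client.close()
        server.close()
    return completed


print(run_request_response(5))
```

A secure variant would wrap the connection in SSL before the exchange; that step is left out here.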
PAGE 68
Virtual Ethernet FTP: Performance in MB per second
Sessions      1 Disk Unit ASP on 2757 IOA    15 Disk Units ASP on 2757 IOA
1 Session     10.8                           42.0
2 Sessions    10.5                           70.0
3 Sessions    10.4                           75.0
5.4 TCP/IP non-secure performance
In table 5.4 you will find the payload information for the different Ethernet types. The most important factor with streaming is to determine how much data can be transferred. The results are listed in bits and bytes per second.
PAGE 69
RR & ACRR Performance (Transactions per second per server CPU)
Transaction Type                                  Threads    1 Gigabit    Virtual
Request/Response (RR) 128 Bytes                   1          991.32       873.62
                                                  26         1330.45      912.34
Asym. Connect/Request/Response (ACRR) 8K Bytes    1          261.51       218.82
                                                  26         279.64       221.21
Notes:
• Capacity metrics are provided for nonsecure transactions
• The table data reflects System i as a server (not a client)
• The data reflects Sockets and TCP/IP
• This is only a rough indicator for capacity planning.
PAGE 70
Table 5.6 SSL Performance (transactions per second per server CPU)
Transaction Type                                  Nonsecure TCP/IP    RC4 / MD5    RC4 / SHA-1    AES128 / SHA-1
Request/Response (RR) 128 Byte                    1167                565.4        530.0          479.6
Asym. Connect/Request/Response (ACRR) 8K Bytes    249.7               53.4         48.0           31.3
Large Transfer                                    478.4               55.7         53.3           36.
PAGE 71
Table 5.7 SSL Relative Performance (scaled to Nonsecure baseline)
Transaction Type                                  Nonsecure TCP/IP    RC4 / MD5    RC4 / SHA-1    AES128 / SHA-1
Request/Response (RR) 128 Byte                    1.0 x               2.1          2.2            2.4
Asym. Connect/Request/Response (ACRR) 8K Bytes    1.0 y               4.7          5.2            8.0
Large Transfer                                    1.0 z               8.6          9.0            13.
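The relative figures in Table 5.7 appear to be the nonsecure transaction rate divided by the rate for each cipher suite in Table 5.6. The short sketch below checks that reading against the rows that are complete in this excerpt. The dictionary layout and the name `relative_cost` are illustrative assumptions, and the truncated Large Transfer row is left out.

```python
# Transactions per second per server CPU, copied from Table 5.6.
tps = {
    "Request/Response (RR) 128 Byte": {
        "Nonsecure TCP/IP": 1167,
        "RC4 / MD5": 565.4,
        "RC4 / SHA-1": 530.0,
        "AES128 / SHA-1": 479.6,
    },
    "Asym. Connect/Request/Response (ACRR) 8K Bytes": {
        "Nonsecure TCP/IP": 249.7,
        "RC4 / MD5": 53.4,
        "RC4 / SHA-1": 48.0,
        "AES128 / SHA-1": 31.3,
    },
}


def relative_cost(row, baseline="Nonsecure TCP/IP"):
    """Scale each rate in a row to the baseline: baseline rate / rate, to 1 decimal."""
    base = row[baseline]
    return {name: round(base / rate, 1) for name, rate in row.items()}


relative = {workload: relative_cost(row) for workload, row in tps.items()}
for workload, row in relative.items():
    print(workload, row)
```

Read this way, a value of 2.1 means one server CPU completes roughly 2.1 times fewer transactions per second with that cipher suite than with nonsecure TCP/IP.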
PAGE 72
VPN Relative Performance (scaled to Nonsecure baseline)
Transaction Type                                  Nonsecure TCP/IP    AH with MD5    ESP with RC4 / MD5    ESP with AES128 / SHA-1    ESP with TDES / SHA-1
Request/Response (RR) 128 Byte                    1.0 x               2.7            3.6                   3.8                        7.9
Asym. Connect/Request/Response (ACRR) 8K Bytes    1.0 y               5.0            6.6                   7.6                        27.5
Large Transfer                                    1.0 z               10.9           15.4                  18.7                       88.
PAGE 73
• For additional information regarding your Host Ethernet Adapter please see your specification manual and the Performance Management page for future white papers regarding iSeries and HEA.
• 1 Gigabit Jumbo frame Ethernet enables 12% greater throughput compared to normal frame 1 Gigabit Ethernet. This may vary significantly based on your system, network and workload attributes.
PAGE 74
only a few seconds may perform best. Setting this value too low may result in extra error handling impacting system capacity.
• No single station can, or is expected to, use the full bandwidth of the LAN media. It offers up to the media's rated speed of aggregate capacity for the attached stations to share. The disk access time is usually the limiting resource.
PAGE 75
there is network congestion or overruns to certain target system adapters, then increasing the value from the default=*NONE to 2 or something larger may improve performance.
• MAXLENRU for APPC on the mode description (MODD): If a value of *CALC is selected for the maximum SNA request/response unit (RU), the system will select an efficient size that is compatible with the frame size (on the LIND) that you choose. The newer LAN IOPs support IOP assist.
PAGE 76
• FTS is a less efficient way to transfer data. However, it offers built-in data compression for line speeds less than a given threshold. In some configurations, it will compress data when using LAN; this significantly slows down LAN transfers.

5.8 HPR and Enterprise extender considerations
Enterprise Extender is a protocol that allows the transmission of APPC data over an IP-only infrastructure. In System i, support for Enterprise Extender is added in 5.4.
PAGE 77
5.9 Additional Information
Extensive information can be found at the System i Information Center web site at: http://www.ibm.com/eserver/iseries/infocenter.
• For network information select “Networking”:
  • See “TCP/IP setup” → “Internet Protocol version 6” for IPv6 information
  • See “Network communications” → “Ethernet” for Ethernet information.
• For application development select “Programming”:
  • See “Communications” → “Socket Programming” for the Sockets Programming guide.
PAGE 78
Chapter 6. Web Server and WebSphere Performance
This section discusses System i performance information in Web serving and WebSphere environments. Specific products that are discussed include: HTTP Server (powered by Apache) (in section 6.1), PHP Zend Core for i (6.2), WebSphere Application Server and WebSphere Application Server - Express (6.3), Web Facing (6.4), Host Access Transformation Services (6.5), System Application Server Instance (6.6), WebSphere Portal Server (6.7), WebSphere Commerce (6.
PAGE 79
Information source and disclaimer: The information in the sections that follow is based on performance measurements and analysis done in the internal IBM performance lab. The raw data is not provided here, but the highlights, general conclusions, and recommendations are included. Results listed here do not represent any particular customer environment. Actual performance may vary significantly from what is provided here. Note that these workloads are measured in best-case environments (e.g.
PAGE 80
• CGI: HTTP invokes a CGI program which builds a simple HTML page and serves it via the HTTP server. This CGI program can run in either a new or a named activation group. The CGI programs were compiled using a "named" activation group unless specified otherwise.

Web Server Capacity Planning: Please use the IBM Systems Workload Estimator to do capacity planning for Web environments using the following workloads: Web Serving, WebSphere, WebFacing, WebSphere Portal Server, WebSphere Commerce.
PAGE 81
Table 6.1 i5/OS V5R4 Web Serving Relative Capacity - Static Page
Relative Capacity Metrics

Transaction Type:   Static Page - IFS   Static Page - Local Cache   Static Page - FRCA
Non-secure          2.016               3.538                       34.730
Secure              1.481               2.
PAGE 82
Table 6.2 i5/OS V5R4 Web Serving Relative Capacity - CGI
Relative Capacity Metrics

Transaction Type:   CGI - New Activation   CGI - Named Activation
Non-secure          0.092                  0.475
Secure              0.090                  0.
PAGE 83
Table 6.3 i5/OS V5R4 Web Serving Relative Capacity for Static (varied sizes)
Relative Capacity Metrics

Transaction Type:   KeepAlive   Static Page - IFS   Static Page - Local Cache   Static Page - FRCA
1K Bytes            Off         1.558               2.407                       11.564
1K Bytes            On          2.016               3.538                       34.730
10K Bytes           Off         1.347               2.095                       7.691
10K Bytes           On          1.793               3.044                       13.539
100K Bytes          Off         0.830               0.958                       1.873
100K Bytes          On          1.068               1.243                       2.622

Notes/Disclaimers:
• These results are relative to each other and do not scale with other environments.
PAGE 84
a. V5R4 provides similar Web server performance compared with V5R3 for most transactions (with similar hardware). In V5R4 there are opportunities to exploit improved CGI performance. More information can be found in the FAQ section of the HTTP server website http://www.ibm.com/servers/eserver/iseries/software/http/services/faq.html under “How can I improve the performance of my CGI program?” b. V5R3 provided similar Web server performance compared with V5R2 for most transactions (with similar hardware). c.
PAGE 85
variable overhead of encryption/decryption, which is proportional to the number of bytes in the transaction. Note the capacity factors in the tables above comparing non-secure and secure serving. From Table 6.1, note that for simple transactions (e.g., static page serving), the impact of secure serving is around 20%. For complex transactions (e.g., CGI, servlets), the overhead is a proportionally smaller share of the total work. This relationship assumes that KeepAlive is used, and therefore the overhead of key processing can be minimized.
PAGE 86
11. HTTP and TCP/IP Configuration Tips: Information to assist with the configuration for TCP/IP and HTTP can be viewed at http://publib.boulder.ibm.com/infocenter/iseries/v5r4/index.jsp and http://www.ibm.com/servers/eserver/iseries/software/http/ a. The number of HTTP server threads: The reason for having multiple server threads is that when one server is waiting for a disk or communications I/O to complete, a different server job can process another user's request.
PAGE 87
13. File System Considerations: Web serving performance varies significantly based on which file system is used. Each file system has different overheads and performance characteristics. Note that serving from the ROOT or QOPENSYS directories provides the best system capacity. If Web page development is done from another directory, consider copying the data to a higher-performing file system for production use.
PAGE 88
6.2 PHP - Zend Core for i
This section discusses the different performance aspects of running PHP transaction based applications using Zend Core for i, including DB access considerations, utilization of RPG program call, and the benefits of using Zend Platform.

Zend Core for i
Zend Core for i delivers a rapid development and production PHP foundation for applications using PHP running on i with IBM DB2 for i or MySQL databases.
PAGE 89
• Throughput - Orders Per Minute (OPM). Each order actually consists of 10 web requests to complete the order.
• Order response time (RT) in milliseconds
• Total CPU - Total system processor utilization
• CPU Zend/AP - CPU for the Zend Core / Apache component.
• CPU DB - CPU for the DB component

Database Access
The following four methods were used to access the backend database for the DVD Store application. In the first three cases, SQL requests were issued directly from the PHP pages.
PAGE 90
Conclusions: 1. Each DB connection interface provides exceptional response time at very high throughput. Each order processed consisted of ten web requests. As a result, the capacity ranges from about 650 transactions per second up to about 870 transactions per second. Using Zend Platform will provide even higher performance (refer to the section on Zend Platform). 2.
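The transactions-per-second range above follows from the orders-per-minute metric and the ten web requests that make up each order. A minimal sketch of the conversion is shown below; the class name is hypothetical, and the OPM inputs are back-computed illustrations of the 650 and 870 figures rather than measured values.

```java
// Illustrative sketch: ThroughputConversion is a hypothetical helper, not an IBM or Zend API.
class ThroughputConversion {
    // Converts orders per minute (OPM) to web requests (transactions) per second.
    static double webRequestsPerSecond(double ordersPerMinute, int requestsPerOrder) {
        return ordersPerMinute * requestsPerOrder / 60.0;
    }

    public static void main(String[] args) {
        int requestsPerOrder = 10;  // stated in the workload description
        // Assumed OPM values chosen to reproduce the quoted range.
        System.out.println(webRequestsPerSecond(3900, requestsPerOrder));
        System.out.println(webRequestsPerSecond(5220, requestsPerOrder));
    }
}
```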
PAGE 91
Conclusions: 1. As stated earlier, persistent connections can dramatically improve overall performance. When using persistent connections for all transactions, the DB CPU utilization is significantly less than when using non-persistent connections. 2. For any transactions that run with autocommit turned on, use persistent connections. If the transaction requires that autocommit be turned off, use of non-persistent connections may be sufficient for pages that don’t have heavy usage.
PAGE 92
OS / DB             Zend Version      Connect        OPM    RT (ms)   Total CPU   CPU - Zend/AP   CPU - DB
i 6.1 / DB2         V2.5.2            db2_pconnect   5041   176       98          62              31
i 6.1 / DB2         V2.5.2/Platform   db2_pconnect   6795   129       95          44              46
i 6.1 / MySQL 5.0   V2.5.2            mysqli         3974   224       98          49              47
i 6.1 / MySQL 5.0   V2.5.2/Platform   mysqli         4610   191       96          31              62

Conclusions: 1. In both cases above, the overall system capacity improved significantly when using Zend Platform, by about 15-35% for this workload.
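The 15-35% range can be checked directly against the OPM column. The sketch below computes the percentage gain for the DB2 and MySQL rows; the class and method names are illustrative assumptions.

```java
// Illustrative sketch: ZendPlatformGain is a hypothetical name used only for this example.
class ZendPlatformGain {
    // Percentage throughput gain of the Zend Platform run over the base Zend Core run.
    static double percentGain(double baseOpm, double platformOpm) {
        if (baseOpm <= 0) {
            throw new IllegalArgumentException("baseOpm must be positive");
        }
        return (platformOpm - baseOpm) / baseOpm * 100.0;
    }

    public static void main(String[] args) {
        // OPM values taken from the table above.
        System.out.printf("DB2:   %.1f%%%n", percentGain(5041, 6795));
        System.out.printf("MySQL: %.1f%%%n", percentGain(3974, 4610));
    }
}
```

The DB2 case lands near the top of the quoted range and the MySQL case near the bottom.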
PAGE 93
6.3 WebSphere Application Server
This section discusses System i performance information for the WebSphere Application Server, including WebSphere Application Server V6.1, WebSphere Application Server V6.0, WebSphere Application Server V5.0 and V5.1, and WebSphere Application Server Express V5.1. Historically, both WebSphere and i5/OS Java performance improve with each version.
PAGE 94
because the improvements largely resulted from significant reductions in pathlength and CPU, environments that are constrained by other resources such as IO or memory may not show the same level of improvements seen here. Tuning changes in V6R1 As indicated above, most improvements will require no changes to an application.
PAGE 95
For WebSphere 5.1 and earlier refer to the Performance Considerations guide at: www.ibm.com/servers/eserver/iseries/software/websphere/wsappserver/product/PerformanceConsiderations.html
For WebSphere 5.1, 6.0 and 6.1 please refer to the following page and follow the appropriate link: www.ibm.com/software/webservers/appserv/was/library/
Although some capacity planning information is included in these documents, please use the IBM Systems Workload Estimator as the primary tool to size WebSphere environments.
PAGE 96
Trade 6 Benchmark (IBM Trade Performance Benchmark Sample for WebSphere Application Server) Description: Trade 6 is the fourth generation of the WebSphere end-to-end benchmark and performance sample application. The Trade benchmark is designed and developed to cover the significantly expanding programming model and performance technologies associated with WebSphere Application Server.
PAGE 97
The Trade 6 application allows a user, typically using a Web browser, to perform the following actions:
• Register to create a user profile, user ID/password and initial account balance
• Login to validate an already registered user
• Browse current stock price for a ticker symbol
• Purchase shares
• Sell shares from holdings
• Browse portfolio
• Logout to terminate the user's active interval
Each action is comprised of many primitive operations running within the context of a single HTTP request/response.
PAGE 98
WebSphere Application Server V6.1 Historically, new releases of WebSphere Application Server have offered improved performance and functionality over prior releases of WebSphere. WebSphere Application Server V6.1 is no exception. Furthermore, the availability of WebSphere Application Server V6.1 offers an entirely new opportunity for WebSphere customers. Applications running on V6.
PAGE 99
Trade3 Measurement Results:
[Figure: Trade on System i - Historical View. Capacity of Trade3/6 on model 825 2-Way LPAR, in Transactions/Second, for Trade3-EJB and Trade3-JDBC across V5R2 WAS 5.0, V5R3 WAS 5.0, V5R3 WAS 5.1, V5R3 WAS 6.0 (Trade6), V5R4 WAS 6.0 (Trade6), V5R4 WAS 6.1 Classic (Trade 6.1), and V5R4 WAS 6.1 IBM Tech For Java (Trade 6.1)]
Figure 6.
PAGE 100
Trade Scalability Results:
[Figure: Trade on System i - Scaling of Hardware and Software. Panels: Trade 3, Power 5, Power 6. Each panel shows Transactions/Second for EJB and JDBC. Configurations shown: V5R2 WAS 5.0, V5R2 WAS 5.1, V5R3 WAS 5.1; Power4 2 Way (LPAR) 1.1 GHz, Power5 2 Way 1.65 GHz, Power5 2 way (LPAR) 2.2 GHz, Power6 2 way (LPAR) 4.6 GHz]
Figure 6.
PAGE 101
Primitive Name
PingHtml
PingServlet
PingServletWriter
PingServlet2Include
PingServlet2Servlet
PingJSP
PingJSPEL
PingServlet2JSP
PingHTTPSession1
PingHTTPSession2
PingHTTPSession3
PingJDBCRead
PingJDBCWrite
PingServlet2JNDI
PingServlet2SessionEJB
PingServlet2EntityEJBLocal
PingServlet2EntityEJBRemote
PingServlet2Session2Entity
PingServlet2Session2EntityCollection
PingServlet2Session2CMROne2One
PingServlet2Session2CMROne2Many
PingServlet2MDBQueue
PingServlet2MDBTopic
PingServlet2TwoPhase
Descriptio
PAGE 102
[Chart: per-primitive results for the Trade primitives, PingHtml through PingServlet2TwoPhase; plotted values not recoverable]
PAGE 103
Accelerator for System i
Coinciding with the release of i5/OS V5R4, IBM introduces new entry IBM System i models. The models introduce accelerator technologies and/or L3 cache in order to improve options for clients in the low-end server space. As an overview, the Accelerator for System i affects two 520 Models: (1) 600 CPW with no L3 cache and (2) 1200 CPW with L3 cache.
PAGE 104
Figure 6.6 provides insight into response time information regarding low-end System i models. There are two key concepts that are displayed in the data in Figure 6.6. The first is that Accelerator for System i models can provide substantially better response times than previous models for a single or many users. The 600 CPW accelerated to 3100 CPW reduces the response time by 5 times while the 1200 CPW accelerated to 3800 CPW reduces the response time by 2.5 times.
PAGE 105
Performance Considerations When Using WebSphere Transaction Processing (XA)
In a general sense, a transaction is the execution of a set of related operations that must be completed together. This set of operations is referred to as a unit-of-work. A transaction is said to commit when it completes successfully. Otherwise it is said to roll back.
PAGE 106
Restriction: You cannot benefit from the one-phase commit optimization in the following circumstances:
• If your application uses a reliability attribute other than assured persistent for its JMS messages.
• If your application uses Bean Managed Persistence (BMP) entity beans, or JDBC clients.
Before you configure your system, ensure that you consider all of the components of your J2EE application that might be affected by one-phase commits.
PAGE 107
6.4 IBM WebFacing
The IBM WebFacing tool converts your 5250 application DDS display files, menu source, and help files into Java Servlets, JSPs, JavaBeans, and JavaScript to allow your application to run in either WebSphere Application Server V5 or V4. This is an easy way to bring your application to either the Internet, or the Intranet, both quickly and inexpensively.
PAGE 108
details on the number of I/O fields for each of these workloads. We ran the workloads on three separate machines (see table 6.5) to validate the performance characteristics with regard to CPW. In our running of the workloads, we tolerated only a 1.5 second server response time per panel. This value does not include the time it takes to render the image on the client system, but only the time it took the server to get the information to the client system. The machines that we used are in Table 6.
PAGE 109
• (Advanced Edition Only) Struts-compliant code generated by the WebFacing Tool conversion process which sets the foundation for extending your Webfaced applications using struts-compliant action architecture • Automatic configuration for UTF-8 support when you deploy to WebSphere Application Server version 5.0 • Support for function keys within window records • Enhanced hyperlink support • Improved memory optimization for record I/O processing.
PAGE 110
When set to an appropriate level for the Webfaced application, the Record Definition Cache can provide a decrease in memory usage, and slightly decreased processor usage. The number of record definitions that the cache will retain is set by an initialization parameter in the Webfaced application’s deployment descriptor (web.xml). By changing the cache size, the Webfaced application can be tuned for best performance and minimum memory requirements.
PAGE 111
To enable the servlet that will display the contents of the cache, first add the following segments to the Webfaced application’s web.xml. CacheDumper CacheDumper com.ibm.etools.iseries.webfacing.diags.
PAGE 112
Button           Operation
Reset Counters   Resets the cache hit and miss counters back to 0.
Set Limit        Temporarily sets the cache limit to a new value. Setting the value lower than the current value will cause the cache to be cleared as well.
Refresh          Refresh the display of cache elements.
Clear Cache      Drop all the cached definitions.
Save List        Save a list of all the cached record data definitions. This list is saved in the RecordJSPs directory of the Webfaced application.
PAGE 113
Refer to the following table for the functionality provided by the Record Definition Loader servlet.

Record Definition Loader Button operations
Button                 Operation
Infer from JSP Names   This will cause the loader servlet to infer record definition names from the names of the JSPs contained in the RecordJsps directory. It will not find all the record definitions but it will get most of them.
Load from File         This option will load the record definitions listed in a file in the RecordJSPs directory.
PAGE 114
WebSphere Application Server. On System i servers, the recommended WebSphere application configuration is to run Apache as the web server and WebSphere Application Server as the application server. Therefore, it is recommended that you configure HTTP compression support in Apache. However, in certain instances HTTP compression configuration may be necessary using the Webfacing/WebSphere Application Server support. This is discussed below. The overall performance in both cases is essentially equivalent.
PAGE 115
You also need to add the directive: SetOutputFilter DEFLATE to the container to be compressed, or globally if the compression can always be done. There is documentation on the Apache website on mod_deflate (http://httpd.apache.org/docs-2.0/mod/mod_deflate.html) that has information specific to setting up for compression. That is the best place to look for details. The LoadModule and SetOutputFilter directives are required for mod_deflate to work.
PAGE 116
PartnerWorld for Developers Webfacing website: http://www.ibm.com/servers/enable/site/ebiz/webfacing/index.html
IBM WebFacing Tool Performance Update - This white paper explains how to help optimize WebFaced Applications on IBM System i servers. Requests for the paper require user registration; there are no charges. http://www-919.ibm.com/servers/eserver/iseries/developer/ebiz/documents/webfacing/
PAGE 117
6.5 WebSphere Host Access Transformation Services (HATS)
WebSphere Host Access Transformation Services (HATS) gives you all the tools you need to quickly and easily extend your legacy applications to business partners, customers, and employees. HATS makes your 5250 applications available as HTML through the most popular Web browsers, while converting your host screens to a Web look and feel.
PAGE 118
customization requires development effort, while Default Rendering requires minimal development resources.
Default: The screens in the application’s main path are unchanged.
Moderate: An average of 30% of the screens have been customized.
Advanced: All screens have been customized.
[Figure: HATS Customization (CPW/User), comparing Default, Moderate, and Advanced customization for Application1 and Application2]
PAGE 119
IBM Systems Workload Estimator for HATS The purpose of the IBM Systems Workload Estimator (WLE) is to provide a comprehensive System i sizing tool for new and existing customers interested in deploying new emerging workloads standalone or in combination with their current workloads. The Estimator recommends the model, processor, interactive feature, memory, and disk resources necessary for a mixed set of workloads. WLE was enhanced to support sizing a System i server to meet your HATS workload requirements.
PAGE 120
requirements do not take into account the requirement for other web applications, such as customer applications. You should use IBM Systems Workload Estimator (http://www-912.ibm.com/wle/EstimatorServlet) to determine the system requirements for additional web applications.
PAGE 121
6.7 WebSphere Portal
The IBM WebSphere Portal suite of products enables companies to build a portal website serving the individual needs of their employees, business partners and customers. Users can sign on to the portal and view personalized web pages that provide access to the information, people and applications they need. This personalized, single point of access to resources reduces information overload, accelerates productivity and increases website usage.
PAGE 122
6.9 WebSphere Commerce Payments
Use the IBM Systems Workload Estimator to predict the capacities and resource requirements for WebSphere Commerce Payments. The Estimator allows you to predict a standalone WCP environment or a WCP environment associated with the buy visits from a WebSphere Commerce estimation. Work with your marketing representative to utilize this tool. You’ll find the tool at: http://www.ibm.com/eserver/iseries/support/estimator.
PAGE 123
of access mechanisms. Please see the Connect for iSeries white paper located at the following URL for more information on Connect for iSeries. http://www-1.ibm.com/servers/eserver/iseries/btob/connect/pdf/whtpaperv11.pdf “B2B New Order Request” Workload Description: This workload is driven by a program that runs on a client work station that simulates multiple Web users.
PAGE 124
1. Connector relative capacity: The different back-end connector types are meant to allow users a simple way to connect the Connect for iSeries product to their back-end application. Your choice of connector type may be dictated by several factors. Clearly, one of these factors relates to your existing back-end application and the programming language it is written in. This, in itself, may limit your choice of back-end connector type.
PAGE 125
Chapter 7. Java Performance
Highlights:
• Introduction
• What’s new in V6R1
• IBM Technology for Java (32-bit and 64-bit)
• Classic VM (64-bit)
• Determining Which JVM to Use
• Capacity Planning
• Tips and Techniques
• Resources

7.1 Introduction
Beginning in V5R4, IBM began a transition to a new VM implementation for i5/OS, IBM Technology for Java, to replace the Classic VM.
PAGE 126
option for Java applications which require large amounts of memory. The Classic VM remains available in V6R1, but future i5/OS releases are expected to support only IBM Technology for Java. The default VM in V6R1 is IBM Technology for Java 5.0, 32-bit. Other supported versions of IBM Technology for Java include 5.0 64-bit, 6.0 32-bit, and 6.0 64-bit. (6.0 versions will require the latest PTFs to be loaded.) The Classic VM supports Java versions 1.4, 5.0, and 6.0. In V5R4, the default VM is Classic 1.4.
PAGE 127
On i5/OS, IBM Technology for Java runs in i5/OS Portable Application Solutions Environment (i5/OS PASE) with either a 32-bit (for the 32-bit VM) or 64-bit (for the 64-bit VM) environment. Due to sophisticated memory management, both the 32-bit and 64-bit VMs provide a significant reduction in memory requirements over the Classic VM for most applications.
PAGE 128
Fortunately, it is not too difficult to come up with parameter values which will provide good performance. If you are moving an application from the Classic VM to IBM Technology for Java, you can use a tool like DMPJVM or verbose GC to determine how large the heap grows when running your application. This value can be used as the maximum heap size for 64-bit IBM Technology for Java; in 32-bit IBM Technology for Java, about 75% of this value is a reasonable starting point.
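The sizing rule above can be written down as a small calculation. The sketch below assumes the peak heap size under the Classic VM has already been read by hand from DMPJVM or verbose GC output; the class and method names are hypothetical, and the 75% factor is only the starting point suggested in the text, to be refined by measurement.

```java
// Illustrative sketch: HeapSizingSketch is a hypothetical helper, not an IBM tool.
class HeapSizingSketch {
    // Suggested starting maximum heap (in MB) for IBM Technology for Java,
    // given the peak heap size observed under the Classic VM.
    static long suggestedMaxHeapMb(long classicPeakHeapMb, boolean is64Bit) {
        if (classicPeakHeapMb <= 0) {
            throw new IllegalArgumentException("classicPeakHeapMb must be positive");
        }
        // 64-bit: use the observed value; 32-bit: start at about 75% of it.
        return is64Bit ? classicPeakHeapMb : Math.round(classicPeakHeapMb * 0.75);
    }

    // Formats the value as a maximum-heap command-line option.
    static String xmxOption(long heapMb) {
        return "-Xmx" + heapMb + "m";
    }

    public static void main(String[] args) {
        long observedPeakMb = 1024;  // assumed example value
        System.out.println(xmxOption(suggestedMaxHeapMb(observedPeakMb, true)));
        System.out.println(xmxOption(suggestedMaxHeapMb(observedPeakMb, false)));
    }
}
```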
PAGE 129
performance, it pays to apply analysis and optimizations to the Java bytecodes, and the resulting machine code. One approach to optimizing Java bytecode involves analyzing the object code “ahead of time” – before it is actually running. This “ahead-of-time” (AOT) compiler technology was used exclusively by the original AS/400 Java Virtual Machine, whose success proved the power of such an approach.
PAGE 130
applications with a large number of classes. Running CRTJVAPGM with OPTIMIZE(*INTERPRET) will create this program ahead of time, making the first startup faster.

Garbage Collection
Java uses Garbage Collection (GC) to automatically manage memory by cleaning up objects and memory when they are no longer in use. This eliminates certain types of memory leaks which can be caused by application bugs for applications written in other languages.
PAGE 131
display; rates of 20 to 30 faults per second are usually acceptable, but larger values may indicate a performance problem. In this case, the size of the memory pool should be increased, or the collection threshold value (GCHINL or -Xms) should be decreased so the heap isn’t allowed to grow as large. In many cases the scenario may be complicated by the fact that multiple applications may be running in the same memory pool.
PAGE 132
later releases the cache is enabled and the maxpgms set to 20000 by default, so no adjustment is usually necessary. The verification cache operates by caching JVAPGMs that have been dynamically created for dynamically loaded classes. When the verification cache is not operating, these JVAPGMs are created as temporary objects, and are deleted as the JVM shuts down.
PAGE 133
libraries and environments may require a particular version. The Classic VM continues to support JDK 1.3, 1.4, 1.5 (5.0), and 1.6 (6.0) in V5R4, and JDK 1.4, 1.5 (5.0), and 1.6 (6.0) in V6R1. 3. The Classic VM supported an i5/OS-specific feature called Adopted Authority. IBM Technology for Java does not support this feature, so applications which require Adopted Authority must run in the Classic VM. This will not affect most applications.
PAGE 134
application itself or a reasonably complete subset of the application, using a load generating tool to simulate a load representative of your planned deployment environment. WebSphere applications running with IBM Technology for Java will be subject to the same constraints as plain Java applications; however, there are some considerations which are specific to WebSphere, as described in Chapter 6 (Web Server and WebSphere Performance). 7.
PAGE 135
• Beware of misleading benchmarks. Many benchmarks are available to test Java performance, but most of these are not good predictors of server-side Java performance. Some of these benchmarks are single-threaded, or run for a very short period of time. Others will stress certain components of the JVM heavily, while avoiding other functionality that is more typical of real applications. Even the best benchmarks will exercise the JVM differently than real applications with real data.
PAGE 136
4. Database Specific. Use of database can invoke significant path length in i5/OS. Invoking it efficiently can maximize the performance and value of a Java application.

i5/OS Specific Java Tips and Techniques
• Load the latest CUM package and PTFs
To ensure that you have the best performing code, load the latest CUM packages and PTFs for all products that you are using.
PAGE 137
does take advantage of programs created at optimization *INTERPRET. These programs require significantly less space and do not need to be deleted. Program objects (even at *INTERPRET) are not used by IBM Technology for Java.
• Consider the special property os400.jit.mmi.threshold. This property sets the threshold for the MMI of the JIT. Setting this to a small value will result in compilation of the classes at startup time and will increase the startup time.
PAGE 138
• The I/O method readLine( ) (e.g. in java.io.BufferedReader) will create a new String.
• String concatenation (e.g.: "The value is: " + value) will generally result in creation of a StringBuffer, a String, and a character array.
• Putting primitive values (like int or long) into a collection (like List or Map) requires wrapping them in a new object (e.g. java.lang.Integer). This is usually obvious in the code, but Java 5.
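Two of the object-creation points listed above can be illustrated side by side. The sketch below is an assumption-laden example rather than code from the measured workloads: it builds a string with a single StringBuilder instead of repeated concatenation in a loop, and contrasts summing a primitive array with summing a List of boxed Integer values. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: ObjectCreationSketch is a hypothetical example class.
class ObjectCreationSketch {
    // Uses one StringBuilder for the whole loop rather than "+" on each pass,
    // which would create a new builder and String every iteration.
    static String joinValues(int[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }

    // Sums a primitive array: no wrapper objects are involved.
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Sums a List<Integer>: each element is unboxed here, and was boxed
    // into an Integer when added (small values may come from a cache).
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (Integer v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        List<Integer> boxed = new ArrayList<>();
        for (int v : data) {
            boxed.add(v);  // autoboxing: int -> Integer
        }
        System.out.println(joinValues(data));
        System.out.println(sumPrimitive(data) == sumBoxed(boxed));
    }
}
```

Both sums produce the same result; the difference is in how many short-lived objects the garbage collector has to handle along the way.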
PAGE 139
        int i = 0;
        try {
            while (true) {
                System.out.println (arr[i++]);
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // Reached the end of the array....exit
        }
    }

Instead, the above procedure should be written as:

    public void goodPrintArray (int arr[]) {
        int len = arr.length;
        for (int i = 0; i < len; i++) {
            System.out.println (arr[i]);
        }
    }

In the “bad” version of this code, an exception will always be thrown (and caught) in every execution of the method.
PAGE 140
applications. The Toolbox driver supports remote access, and should be used when accessing the database on a separate system. This recommendation is true for both the 64-bit Classic VM and the new 32-bit VM.
• Pool Database Connections
Connection pooling is a technique for sharing a small number of database connections among a number of threads.
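The idea behind connection pooling can be sketched with a small generic pool. This is a simplified illustration and not the pooling provided by the JDBC drivers or WebSphere: the SimplePool class and its methods are hypothetical, and in practice the pooled type would be java.sql.Connection obtained from a pooling data source.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Illustrative sketch: SimplePool is a hypothetical class, not a driver or WebSphere API.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    // Creates all pooled resources up front so threads never pay the creation cost later.
    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Blocks until a resource is free, so many threads can share a few resources.
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled resource", e);
        }
    }

    // Returns a resource to the pool; add() throws IllegalStateException
    // if more resources are returned than the pool can hold.
    void giveBack(T resource) {
        idle.add(resource);
    }

    int available() {
        return idle.size();
    }
}
```

A thread would call borrow(), run its statements, and call giveBack() in a finally block so the resource is returned even when an exception occurs.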
PAGE 141
Resources
The i5/OS Java and WebSphere performance team maintains a list of performance-related documents at http://www.ibm.com/systems/i/solutions/perfmgmt/webjtune.html.
The Java Diagnostics Guide provides detailed information on performance tuning and analysis when using IBM Technology for Java. Most of the document applies to all platforms using IBM’s Java VM; in addition, one chapter is written specifically for i5/OS information. The Diagnostics Guide is available at http://www.ibm.
PAGE 142
Chapter 8. Cryptography Performance
With an increasing demand for security in today’s information society, cryptography enables us to encrypt the communication and storage of secret or confidential data. This also requires data integrity, authentication and transaction non-repudiation. Together, cryptographic algorithms, shared/symmetric keys and public/private keys provide the mechanisms to support all of these requirements.
PAGE 143
CSP API Sets
User applications can utilize cryptographic services indirectly via i5/OS functions (SSL/TLS, VPN IPSec) or directly via the following APIs:
• The Common Cryptographic Architecture (CCA) API set is provided for running cryptographic operations on a Cryptographic Coprocessor.
• The i5/OS Cryptographic Services API set is provided for running cryptographic operations within the Licensed Internal Code.
PAGE 144
8.3 Software Cryptographic API Performance
This section provides performance information for System i systems using the following cryptographic services: i5/OS Cryptographic Services API and IBM JCE 1.2.1, an extension of JDK 1.4.2. Cryptographic performance is an important aspect of capacity planning, particularly for applications using secure network communications. The information in this section may be used to assist in capacity planning for this complex environment.
PAGE 145
Table 8.2 Signing Performance

Encryption Algorithm   Threads   RSA Key Length (Bits)   i5/OS (Transactions/Second)   JCE (Transactions/Second)
SHA-1 / RSA            1         1024                    901                           197
SHA-1 / RSA            10        1024                    1,155                         240
SHA-1 / RSA            1         2048                    129                           30
SHA-1 / RSA            10        2048                    163                           35

Notes:
• Transaction Length set at 1024 bytes
• See section 8.2 for Test Environment Information

Table 8.
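For readers unfamiliar with the operation being timed in the JCE column, the following sketch signs and verifies a 1024-byte buffer with SHA-1 / RSA using the standard java.security classes. It is an illustration only and not the benchmark program used for Table 8.2; the class and method names are hypothetical, and the provider in use on a given system is whatever the JVM selects by default.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

// Illustrative sketch: SigningSketch is a hypothetical class, not the measured workload.
class SigningSketch {
    // Generates an RSA key pair, signs the data with SHA-1 / RSA, and verifies the signature.
    static boolean signAndVerify(int keyBits, byte[] data) {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(keyBits);
            KeyPair pair = generator.generateKeyPair();

            Signature signer = Signature.getInstance("SHA1withRSA");
            signer.initSign(pair.getPrivate());
            signer.update(data);
            byte[] signature = signer.sign();

            Signature verifier = Signature.getInstance("SHA1withRSA");
            verifier.initVerify(pair.getPublic());
            verifier.update(data);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("signing sketch failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] transaction = new byte[1024];  // matches the 1024-byte transaction length in the notes
        System.out.println(signAndVerify(1024, transaction));
    }
}
```

A throughput measurement would generate the key pair once and repeat only the sign step, since key generation is far more expensive than signing.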
PAGE 146
which is designed to meet FIPS 140-2 Level 4 security requirements. This new cryptographic card offers the security and performance required to support e-Business and emerging digital signature applications. For banking and finance applications the 4764 Cryptographic Coprocessor delivers improved performance for T-DES, RSA, and financial PIN processing.
PAGE 147
Table 8.5 Signing Performance CCA CSP
Encryption Algorithm   Threads   RSA Key Length (Bits)   4764 (Transactions/second)
SHA-1 / RSA            1         1024                    794
SHA-1 / RSA            10        1024                    1,074
SHA-1 / RSA            1         2048                    308
SHA-1 / RSA            10        2048                    465
Notes:
• Transaction Length set at 1024 bytes
• See section 8.2 for Test Environment information
Table 8.6 Financial PINs Performance CCA CSP
Threads   Total Repetitions   4764 (Transactions/second)
1         10000               945
10        100000              966
Notes:
• See section 8.2 for Test Environment information
8.
PAGE 148
• Supported number of 4764 Cryptographic Coprocessors:
Table 8.8
Server models                       Maximum per server   Maximum per partition
IBM System i5 570 8/12/16W, 595     32                   8
IBM System i5 520, 550, 570 2/4W    8                    8
• Applications requiring a FIPS 140-2 Level 4 certified, tamper resistant module for storing cryptographic keys should use the IBM 4764 Cryptographic Coprocessor.
• Cryptographic functions place a heavy demand on system CPU, but performance scales well when you add a CPU to your system.
PAGE 149
Chapter 9. iSeries NetServer File Serving Performance
This chapter will focus on iSeries NetServer File Serving Performance.
9.1 iSeries NetServer File Serving Performance
iSeries Support for Windows Network Neighborhood (iSeries NetServer) supports the Server Message Block (SMB) protocol through the use of Transmission Control Protocol/Internet Protocol (TCP/IP) on iSeries. This communication allows clients to access iSeries shared directory paths and shared output queues.
PAGE 150
Measurement Results:
[Chart: Throughput (MBits/second) versus number of clients (1 to 60) for V5R2, V5R3, and V5R4]
[Chart: Response Time (milliseconds) versus number of clients (1 to 60) for V5R2, V5R3, and V5R4]
Conclusion/Explanations:
environment can be obtained by sending an email to llhirsch@us.ibm.com.
PAGE 151
From the charts above in the Measurement Results section, it is evident that when customers upgrade to V5R4 they can expect to see an improvement in throughput and response time when using iSeries NetServer.
PAGE 152
Chapter 10. DB2 for i5/OS JDBC and ODBC Performance
DB2 for i5/OS can be accessed through many different interfaces. Among these interfaces are: Windows .NET, OLE DB, Windows database APIs, ODBC and JDBC. This chapter will focus on access through JDBC and ODBC by providing programming and tuning hints as well as links to detailed information.
10.
PAGE 153
• Use the lowest isolation level required by the application. Higher isolation levels can reduce performance because more locking and synchronization are required. Isolation levels, in increasing order, are: TRANSACTION_NONE, TRANSACTION_READ_UNCOMMITTED, TRANSACTION_READ_COMMITTED, TRANSACTION_REPEATABLE_READ, TRANSACTION_SERIALIZABLE
• Reuse connections. Minimize the opening and closing of connections where possible. These operations are very expensive.
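The "lowest isolation level required" guidance above can be sketched as a small selection routine. This is a Python illustration rather than JDBC code: the level names come from the list above, the mapping of each level to the read phenomena it prevents follows the standard JDBC definitions, and the function name `lowest_isolation_level` is hypothetical. TRANSACTION_NONE is left out of the candidates because it means no transaction support at all.

```python
# Candidate isolation levels in increasing order, each paired with the
# read phenomena it prevents (standard JDBC definitions).
# TRANSACTION_NONE is omitted: it provides no transaction support.
LEVELS = [
    ("TRANSACTION_READ_UNCOMMITTED", set()),
    ("TRANSACTION_READ_COMMITTED", {"dirty_read"}),
    ("TRANSACTION_REPEATABLE_READ", {"dirty_read", "non_repeatable_read"}),
    ("TRANSACTION_SERIALIZABLE",
     {"dirty_read", "non_repeatable_read", "phantom_read"}),
]


def lowest_isolation_level(must_prevent):
    """Return the lowest level that prevents every listed phenomenon."""
    needed = set(must_prevent)
    for name, prevented in LEVELS:
        if needed <= prevented:
            return name
    raise ValueError(f"unknown phenomenon in {sorted(needed)}")


# A report that only needs to avoid reading uncommitted data:
print(lowest_isolation_level({"dirty_read"}))
```

In a JDBC application the chosen name would correspond to the matching constant passed to the connection's isolation setting; that mapping is left to the reader's environment.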
PAGE 154
• Employ efficient SQL programming techniques to minimize the amount of data processed
• Reuse prepared statements to minimize parsing and optimization overhead for frequently run queries
• Use stored procedures when appropriate to bundle processing into fewer database requests
• Consider extended dynamic package support for SQL statement and package caching
• Process data in blocks of multiple rows rather than single records when possible (e.g.
PAGE 155
Packages may be shared by several clients to reduce the number of packages on the System i server. To enable sharing, the default libraries of the clients must be the same and the clients must be running the same application. Extended dynamic support will be deactivated if two clients try to use the same package but have different default libraries.
PAGE 156
‘All libraries on the system’ will cause all libraries on the system to be used for catalog requests and may cause significant degradation in response times due to the potential volume of libraries to process.
References for ODBC
• DB2 Universal Database for System i SQL Call Level Interface (ODBC) is found under the System i Information Center under Printable PDFs and Manuals
• The System i Information Center Http://publib.boulder.ibm.com/iseries/
• Microsoft ODBC webpage http://msdn2.microsoft.
PAGE 157
Chapter 11. Domino on i
This chapter includes performance information for Lotus Domino on the IBM i operating system. Some of the information previously included in this section has been removed. Earlier versions of the document can be accessed at http://www.ibm.com/systems/i/solutions/perfmgmt/resource.html
April 2008 Update:
• Workload Estimator 2008.2
January 2008 Updates:
• V6R1
• Domino 8 white papers
• Workload Estimator 2008.
PAGE 158
• IBM Lotus Domino V8 server with the IBM Lotus Notes V8 client: Performance, October 2007 http://www.ibm.com/developerworks/lotus/library/domino8-performance/index.html
• Lotus Domino 7 Server Performance, Part 2, November 2005 http://www.ibm.com/developerworks/lotus/library/domino7-internet-performance/index.html
• Lotus Domino 7 Server Performance, Part 3, November 2005 http://www.ibm.
PAGE 159
• Delete documents marked for deletion
• Create 1 appointment (every 90 minutes)
• Schedule 1 meeting invitation (every 90 minutes)
• Close the view
Domino Web Access (formerly known as iNotes Web Access)
Each user completes the following actions an average of every 15 minutes except where noted:
• Open mail database which contains documents that are 10Kbytes in size.
PAGE 160
optimal performance but of course without the function provided in the Domino 7 templates. The following links refer to these articles:
• Lotus Domino 7 Server Performance, Part 1, September 2005 http://www.ibm.com/developerworks/lotus/library/nd7-perform/index.html
• Lotus Domino 7 Server Performance, Part 2, November 2005 http://www.ibm.com/developerworks/lotus/library/domino7-internet-performance/index.html
• Lotus Domino 7 Server Performance, Part 3, November 2005 http://www.ibm.
PAGE 161
Domino Version   Number of Domino Web Access users   Average CPU Utilization   Average Response Time   Average Disk Utilization
Domino 5.0.11    2,000                               41.5%                     96ms                    <1%
Domino 6         2,000                               24.0%                     64ms                    <1%
Domino 5.0.11    3,800                               19.4%                     119ms                   <1%
Domino 6         3,800                               11.0%                     65ms                    <1%
Domino 5.0.11    20,000                              96.2%                     >5sec                   <1%
Domino 6         20,000                              51.5%                     72ms                    <1%
The 3000 user comparison above was done on an iSeries model i270-2253 which has a 2-way 450MHz processor.
PAGE 162
The 2000 user comparison was done on a model i825-2473 with six 1.1GHz POWER4 processors, 45GB of memory, and sixty 18GB disk drives configured with RAID5, in a single Domino partition. The 3800 user comparison used a single Domino partition on a model i890-0198 with thirty-two 1.3GHz POWER4 processors. This system had 64GB of memory and eighty-nine 18GB disk drives configured with RAID5 protection. The 20,000 user comparison used ten Domino partitions, also on an i890-0198 32-way system with 1.3GHz POWER4 processors.
PAGE 163
shopping application, but would provide even better response times than the 270-2423 as projected in Figure 11.3. When using MHz alone to compare performance capabilities between models, it is necessary for those models to have the same processor technology and configuration. Factors such as L2 cache and type and speed of memory controllers also influence performance behavior and must be considered.
PAGE 164
The eServer i5 Domino Edition builds on the tradition of the DSD (Dedicated Server for Domino) and the iSeries for Domino offering - providing great price/performance for Lotus software on System i5 and i5/OS. Please visit the following sites for the latest information on Domino Edition solutions:
• http://www.ibm.com/servers/eserver/iseries/domino/
• http://www.ibm.com/servers/eserver/iseries/domino/edition.html
11.7 Performance Tips / Techniques
1.
PAGE 165
that the larger the buffer pool size, the higher the fault rate, but the lower the CPU cost. If the faulting rate looks high, decrease the buffer pool size. If the faulting rate is low but your CPU utilization is high, try increasing the buffer pool size. Increasing the buffer pool size allocates larger objects specifically for Domino buffers, thus increasing storage pool contention and making less storage available for the paging/faulting of other objects on the system.
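The tuning rule above can be written down as a small decision function. This is only a sketch of the rule as stated in the text: the function name and both threshold values are hypothetical placeholders, since the text does not say what counts as a "high" faulting rate or "high" CPU utilization for a given system.

```python
def buffer_pool_advice(fault_rate, cpu_utilization,
                       high_fault_rate=100.0, high_cpu=0.80):
    """Suggest a Domino buffer pool size change from observed metrics.

    fault_rate       -- observed faults per second in the Domino pool
    cpu_utilization  -- observed CPU utilization as a fraction (0.0 to 1.0)
    high_fault_rate  -- hypothetical threshold; choose one for your system
    high_cpu         -- hypothetical threshold; choose one for your system
    """
    # Per the text: a high faulting rate calls for a smaller buffer pool.
    if fault_rate >= high_fault_rate:
        return "decrease"
    # Per the text: low faulting but high CPU calls for a larger buffer pool.
    if cpu_utilization >= high_cpu:
        return "increase"
    return "keep"


print(buffer_pool_advice(fault_rate=250.0, cpu_utilization=0.50))
print(buffer_pool_advice(fault_rate=10.0, cpu_utilization=0.90))
```

The faulting check comes first because the text treats a high faulting rate as the reason to shrink the pool regardless of CPU cost.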
PAGE 166
7. Full text indexes Consider whether to allow users to create full text indexes for their mail files, and avoid the use of them whenever possible. These indexes are expensive to maintain since they take up CPU processing time and disk space. 8. Replication.
PAGE 167
11.8 Domino Web Access The following recommendations help optimize your Domino Web Access environment: 1. Refer to the redbooks listed at the beginning of this chapter. The redbook, “iNotes Web Access on the IBM eServer iSeries server,” contains performance information on Domino Web Access including the impact of running with SSL. 2. Use the default number of 40 HTTP threads. However, if you find that the Domino.Threads.Active.Peak is equal to Domino.Threads.
PAGE 168
11.10 Performance Monitoring Statistics
A function to monitor performance statistics was added in Domino Release 5.0.3. Domino will track performance metrics of the operating system and output the results to the server. Type "show stat platform" at the server console to display them. This feature can be enabled by setting the parameter PLATFORM_STATISTICS_ENABLED=1 in the NOTES.INI file and restarting your server. It is automatically enabled in some versions of Domino.
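Before restarting a server, it can be useful to confirm that the setting is actually present. The sketch below checks the text of a NOTES.INI file for PLATFORM_STATISTICS_ENABLED=1. The function name, the sample file contents, and the case-insensitive key comparison are assumptions made for illustration; only the parameter name and value come from the text above.

```python
def platform_stats_enabled(ini_text):
    """Return True if the NOTES.INI text sets PLATFORM_STATISTICS_ENABLED=1."""
    for raw_line in ini_text.splitlines():
        line = raw_line.strip()
        if "=" not in line:
            continue  # skip blank lines and section headers such as [Notes]
        key, _, value = line.partition("=")
        # Assumption: keys are compared without regard to case.
        if key.strip().upper() == "PLATFORM_STATISTICS_ENABLED":
            return value.strip() == "1"
    return False


# Hypothetical file contents, for illustration only.
sample = "[Notes]\nDirectory=/domino/data\nPLATFORM_STATISTICS_ENABLED=1\n"
print(platform_stats_enabled(sample))
```

In practice the text would be read from the server's NOTES.INI file; the first matching line decides the result.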
PAGE 169
2. *MINIMIZE The main storage will be allocated to minimize the space used by the object. That is, as little main storage as possible will be allocated and used. This minimizes main storage usage while increasing the number of disk I/O operations since less information is cached in main storage. 3. *DYNAMIC The system will dynamically determine the optimum main storage allocation for the object depending on other system activity and main storage contention.
PAGE 170
The following is an example of how to issue the command:
CHGATR OBJ(name of object) ATR(*MAINSTGOPT) VALUE(*NORMAL, *MINIMIZE, or *DYNAMIC)
The chart below depicts V5R3-based paging curve measurements performed with the following settings for the mail databases: *NORMAL, *MINIMIZE, and *DYNAMIC.
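When the attribute has to be set on many mail databases, the command string can be generated rather than typed. The sketch below only builds the CL command text shown above for one object and one of the three values; it does not run anything on the system. The helper name and the example path are hypothetical.

```python
ALLOWED_VALUES = ("*NORMAL", "*MINIMIZE", "*DYNAMIC")


def chgatr_command(object_path, value):
    """Build the CHGATR command text for the *MAINSTGOPT attribute."""
    if value not in ALLOWED_VALUES:
        raise ValueError(f"VALUE must be one of {ALLOWED_VALUES}")
    # Inside a quoted CL string, a single quote is written as two quotes.
    quoted = object_path.replace("'", "''")
    return f"CHGATR OBJ('{quoted}') ATR(*MAINSTGOPT) VALUE({value})"


# Hypothetical mail database path, for illustration only.
print(chgatr_command("/domino/data/mail/user1.nsf", "*DYNAMIC"))
```

Rejecting anything other than the three documented values keeps a typing mistake from reaching the command line.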
PAGE 171
During the tests, the *DYNAMIC and *MINIMIZE settings used up to 5% more CPU resource than *NORMAL. Figure 11.5 below shows the response time data rather than fault rates for the same test shown in Figure 11.4 for the attributes *NORMAL, *DYNAMIC, and *MINIMIZE.
[Chart: V5R3 Main Storage Options Response Times. Average response time (ms, 0 to 80) versus base pool size (KB: 60775040, 47026568, 36388264, 28156548, 21787000) for V5R3 *DYNAMIC, V5R3 *NORMAL, and V5R3 *MINIMIZE]
Figure 11.
PAGE 172
NOTE: MCU ratings should NOT be used directly as a sizing guideline for the number of supported users. MCU ratings provide a relative comparison metric which enables System i models to be compared with each other based on their Domino processing capability. MCU ratings are based on an industry standard workload and the simulated users do not necessarily represent a typical load exerted by “real life” Domino users.
PAGE 173
users or relatively low transaction rates, response times may be significantly higher for a small LPAR (such as 0.2 processor) or partial processor model as compared to a full processor allocation of the same technology. The IBM Systems Workload Estimator will not recommend the 500 CPW or 600 CPW models for Domino processing. Be sure to read the section “Accelerator for System i5” in Chapter 6, Web Server and WebSphere Performance.
PAGE 174
Chapter 12. WebSphere MQ for iSeries
12.1 Introduction
The WebSphere MQ for iSeries product allows application programs to communicate with each other using messages and message queuing. The applications can reside either on the same machine or on different machines or platforms that are separated by one or more networks.
PAGE 175
enhancement should allow customers to run with smaller, more manageable, receivers with less concern about the checkpoint taken following a receiver roll-over during business hours. 12.3 Test Description and Results Version 5.3 of WebSphere MQ for iSeries includes several performance enhancements designed to significantly improve queue manager throughput and application response time, as well as improve the overall throughput capacity of MQ Series.
PAGE 176
applications using MQ Series are running, you may need to consider adding memory to these pools to help performance.
• Nonpersistent messages use significantly less CPU and IO resource than persistent messages do because persistent messages use native journaling support on the iSeries to ensure that messages are recoverable. Because of this, persistent messages should not be used where nonpersistent messages will be sufficient.
PAGE 177
Chapter 13. Linux on iSeries Performance
13.1 Summary
Linux on iSeries expands the iSeries platform solutions portfolio by allowing customers and software vendors to port existing Linux applications to the iSeries with minimal effort. But, how does it shape up in terms of performance? What does it look like generally and from a performance perspective? How can one best configure an iSeries machine to run Linux?
Key Ideas
• "Linux is Linux.
PAGE 178
• Shared Processors. This variation of LPAR allows the Hypervisor to use a given processor in multiple partitions. Thus, a uni-processor might be divided in various fractions between (say) three LPAR partitions. A four-way SMP might give 3.9 CPUs to one partition and 0.1 CPUs to another. This is a large and potentially profitable subject, suitable for its own future paper. Imagine consolidating racks of old, underutilized servers to several partitions, each with a fraction of an iSeries CPU driving it.
PAGE 179
iSeries Linux is a program-execution environment on the iSeries system that provides a traditional memory model (not single-level store) and allows direct access to machine instructions (without the mapping of MI architecture). Because they run in their own partition on a Linux Operating System, programs running in iSeries Linux do have direct access to the full capabilities of the user-state and even most supervisor state architecture of the original PowerPC architecture.
PAGE 180
13.4 Basic Configuration and Performance Questions
Since, by definition, iSeries Linux means at least two independent partitions, questions of configuration and performance get surprisingly complicated, at least in the sense that not everything runs on one operating system and the overall performance is not visible to a single set of tools. Consider the following environments:
• A machine with a Linux and an OS/400 partition, both running CPU-bound work with little I/O.
PAGE 181
13.5 General Performance Information and Results
A limited number of performance related tests have been conducted to date, comparing the performance of iSeries Linux to other environments on iSeries and to similarly configured (especially CPU MHz) pSeries running the application in an AIX environment.
Computational Performance -- C-based code
A factor not immediately obvious is that most Linux and Open Source code is built with a single compiler, the GNU (gcc or g++) compiler.
PAGE 182
[Chart: Relative Performance (Bigger Better). Fraction of ILE performance (0 to 1.2) for Integer and Floating Point work in the Linux, ILE, and PASE computational environments]
One virtue of the i870, i890, and i825 machines is that the hardware floating point unit can make up for some of the code generation deficit due to its superior hardware scheduling capabilities.
Computational Performance -- Java
Generally, Java computational performance will be dictated by the quality of the JVM used.
PAGE 183
Here, a model 840 was subdivided into the partition sizes shown and a typical web serving load was used. A "hit" is one web page or one image. The khttpd is a kernel-based daemon available on Linux which serves only static web pages or images. It can be cascaded with ordinary Apache to provide dynamic content as well. The other is a standard Apache 1.3 installation. The 820 or 830 would be a bit less, by about 10 percent, than the above numbers.
PAGE 184
As noted above, many distributions are based on the 2.95 gcc compiler. The more recent 3.2 gcc is also used by some distributions. Results there show some variability and not much net improvement. To the extent it improves, the gap with ILE should close somewhat. Floating point performance is improved, but proportionately.
PAGE 185
• Cost. Because the disk is virtual, it can be created to any size desired. For some kinds of Linux partitions, a single modern physical disk is overkill -- providing far more data than required. These requirements only increase if RAID, in particular, is specified. Here, the Network Storage object can be created to any desired size, which helps keep down the cost of the partition.
PAGE 186
typically recommended because it allows the Linux partitions to leverage the storage subsystem the customer has in the OS/400 hosting partition. 2. As the application gains in complexity, it is probably less likely that the application should switch from one product to the other. Such applications tend to implicitly play to particular design choices of their current product and there is probably not much to gain from moving them between products. 3.
PAGE 187
do so, you may wish to compare with the immediately preceding version. This would be especially important if you have one key piece of open source code largely responsible for the performance of a given partition. There is no way of ensuring that a new distribution is actually faster than the predecessor except to test it out.
PAGE 188
substantial amount of Virtual I/O. This is probably on the high side, but can be important to have something left over. If the hosting partition uses all its CPU, Virtual I/O may slow substantially.
• Use Virtual LAN for connections between iSeries partitions whether OS/400 or Linux. If your OS/400 PTFs are up to date, it performs roughly on a par with gigabit ethernet and has zero hardware cost, no switches and wires, etc.
• Use Virtual Disk for disk function.
PAGE 189
Native and Virtual LAN (e.g. from outside the box on Native LAN, through the partition with the Native LAN, and then moving to a second partition via Virtual LAN then to another).
PAGE 190
Chapter 14. DASD Performance
This chapter discusses DASD subsystems available for the System i platform. There are two separate considerations. Before IBM i operating system V6R1, one only had to consider particular devices, IOAs, IOPs, and SAN devices. All were attached directly to IBM i operating system through similar strategies and were supported natively. Starting in IBM iV6R1, however, IBM i operating system will be permitted to become a virtual client of an IBM product known as VIOS.
PAGE 191
14.1.0 Direct Attach (Native)
14.1.1 Hardware Characteristics
14.1.1.1 Devices & Controllers
CCIN Codes   Approximate Size (GB)   RPM   Read
6718         18                      10K   4.9
6719         35                      10K   4.7
4326         35                      15K   3.6
4327         70                      15K   3.6
4328         140                     15K   3.6
4329         280                     15K   3.6
433B         70                      15K   3.5
433C         140                     15K   3.5
433D         280                     15K   3.
CCIN Codes (IOA) Feature Codes
PAGE 192
14.1.2 iV5R2 Direct Attach DASD This section discusses the direct attach DASD subsystem performance improvements that were new with the iV5R2 release.
PAGE 193
14.1.2.
PAGE 194
14.1.3 571B
iV5R4 offers two new options on DASD configuration.
• RAID6 which offers improved system protection on supported IOAs.
• NOTE: RAID6 is supported under iV5R3 but we have chosen to look at performance data on an iV5R4 system.
• IOPLess operation on supported IOAs.
14.1.3.1 571B RAID5 vs RAID6 - 10 15K 35GB DASD
[Chart: RAID-5 vs RAID-6. System Response Time (sec, 0 to 0.25) versus Workload Throughput (0 to 3500) for 571B RAID-5 and 571B RAID-6]
14.1.3.
PAGE 195
14.1.4 571B, 5709, 573D, 5703, 2780 IOA Comparison Chart
In the following two charts we are modeling a System i 520 with a 573D IOA using RAID5, comparing three 70GB 15K RPM DASD to four 70GB 15K RPM DASD. The 520 is capable of holding up to 8 DASD but many of our smaller customers do not need the storage. The charts try to point out that there may be performance considerations even when the space isn’t needed.
14.1.4.1
[Chart: System Response Time (sec) for 573D 3 RAID5 DASD and 573D 4 RAID5 DASD]
PAGE 196
The charts below are an attempt to allow the different IOAs available to be compared on a single chart. An I/O Intensive Workload was used for our throughput measurements. The system used was a 520 model with a single 5094 attached which contained the IOAs for the measurements. Note: the 5709 and 573D are cache cards for the built in IOA in the 520/550/570 CECs, even though the following chart shows them as if they were the IOA.
PAGE 197
14.1.5 Comparing Current 2780/574F with the new 571E/574F and 571F/575B NOTE: iV5R3 has support for the features in this section but all of our performance measurements were done on iV5R4 systems. For information on the supported features see the IBM Product Announcement Letters. A model 570 4 way system with 48 GB of mainstore memory was used for the following.
PAGE 198
14.1.6 Comparing 571E/574F and 571F/575B IOP and IOPLess
In comparing IOP and IOPLess runs we did not see any significant differences, including the system CPU used. The system we used was a model 570 4 way. On the IOP run the system CPU was 11.6% and on the IOPLess run the system CPU was 11.5%. The 571E/574F and 571F/575B display similar characteristics when comparing IOP and IOPLess environments, so we have chosen to display results from only the 571E/574F.
IOP compared to IOPLess 14.1.6.
PAGE 199
14.1.7 Comparing 571E/574F and 571F/575B RAID5 and RAID6 and Mirroring System i protection information can be found at http://www.redbooks.ibm.com/ in the current System i Handbook or the Info Center http://publib.boulder.ibm.com/iseries/ . When comparing RAID5, RAID6 and Mirroring we are interested in looking at the strength of failure protection vs storage capacity vs the performance impacts to the system workloads. A model 570 4 way system with 48 GB of mainstore memory was used for the following.
PAGE 200
In comparing Mirroring and RAID one of the concerns is capacity differences and the hardware needed. We tried to create an environment where the capacity was the same in both environments. To do this we built the same size database on “15 35GB DASD using RAID5” and “14 70GB DASD using Mirroring spread across 2 IOAs”. The protection in the Mirrored environment is better but it also has the cost of an extra IOA in this low number DASD environment.
PAGE 201
14.1.8 Performance Limits on the 571F/575B
In the following charts we try to characterize the 571F/575B in different DASD configurations. The 15 DASD experiment is used to give a comparison point with DASD experiments from charts 14.1.5.1 and 14.1.5.2. The 18, 24 and 36 DASD configurations are used to help in the discussion of performance vs capacity. Our DASD IO workload scaled well from 15 DASD to 36 DASD on a single 571F/575B.
14.1.8.
PAGE 202
14.1.9 Investigating 571E/574F and 571F/575B IOA, Bus and HSL limitations. With the new DASD controllers and IOPLess capabilities, IBM has created many new options for our customers. Customers who needed more storage in their smaller configurations can now grow. With the ability to add more storage into an HSL loop the capacity and performance have the potential to grow.
PAGE 203
[Chart legend: 6 Towers, 18 571E and 18 571F, 918 DASD; 5 Towers, 15 571E and 15 571F, 765 DASD; 4 Towers, 12 571E and 12 571F, 612 DASD; 3 Towers, 9 571E and 9 571F, 459 DASD; 2 Towers, 6 571E and 6 571F, 306 DASD; 1 Tower, 3 571E and 3 571F, 153 DASD]
[Chart legend: 1 Tower, 153 DASD, 3 571E and 3 571F; 1 Tower, 117 DASD, 3 571E and 2 571F; 1 Tower, 81 DASD, 3 571E and 1 571F]
Large Block READs on a Single 5094 Tower in an HSL Loop
14.1.9.1 14.1.9.
PAGE 204
14.1.10 Direct Attach 571E/574F and 571F/575B Observations We did some simple comparison measurements to provide graphical examples for customers to observe characteristics of new hardware. We collected performance data using Collection Services and Performance Explorer to create our graphs after running our DASD IO workload (small block reads and writes). IOP vs IOPLess: no measurable difference in CPU or throughput.
PAGE 205
14.2 New in iV5R4M5
14.2.1 9406-MMA CEC vs 9406-570 CEC DASD
[Chart: System Response Time (sec) for three configurations: 9406-MMA 4 way, 6 433B 70 GB DASD, Mirrored, "No Cache"; 9406-570 4 way, 6 4327 70 GB DASD, Mirrored, "No Cache"; 9406-570 4 way, 6 4327 70 GB DASD, Mirrored, "With Cache"]
PAGE 206
14.2.2 RAID Hot Spare
[Chart: System Response Time (sec, 0.02 to 0.14) versus Workload Throughput (5000 to 13000) for a 9406-570 4 way with 24 4328 140 GB DASD in four configurations: RAID5 with 24 active; RAID5 with 22 active and 2 Hot Spares; RAID6 with 24 active; RAID6 with 22 active and 2 Hot Spares]
For the following test, the IO workload was set up to run for 14 hours. About 5 hours after starting, a DASD was pulled from the configurations.
PAGE 207
14.2.3 12X Loop Testing
[Chart: 12X Loop testing from 1 571F to 8 571F IOAs with 36 DASD off each 571F. GB/HR (0 to 1800) versus Number of 571F IOAs (1 to 8)]
A 9406-MMA 8 Way system with 96 GB of mainstore and 396 DASD in #5786 EXP24 Disk Drawer on 3 12X loops for the system ASP was used. ASP 2 was created on a 4th 12X loop by adding 5796 system expansion units with 571F IOAs attaching 36 4327 70 GB DASD in #5786 EXP24 Disk Drawer with RAID5 turned on.
PAGE 208
14.3 New in iV6R1M0
14.3.1 Encrypted ASP
More CPU and memory may be needed to achieve the same performance once encryption is enabled.
[Chart: Non Encrypted ASP vs Encrypted ASP. System Response Time (sec) for a 9406 MMA 4 Way 571F with 24 DASD, Non Encrypted ASP and Encrypted ASP]
PAGE 209
[Chart: Non Encrypted ASP vs Encrypted ASP. CPU (0 to 25) versus Workload Throughput (6000 to 9800) for a 9406 MMA 4 Way 571F with 24 DASD, Non Encrypted ASP and Encrypted ASP]
PAGE 210
14.3.2 57B8/57B7 IOA
With the addition of the POWER6 520 and 550 systems comes the new 57B8/57B7 SAS RAID Enablement Controller with Auxiliary Write Cache. This controller is only available in the POWER6 520 and 550 systems and provides RAID5/6 capabilities, with 175MB redundant write cache. Below are some charts comparing the Storage Controllers for the POWER5 570 (573D), which can be either mirrored or RAID5 protected.
PAGE 211
The POWER6 520 and 550 also have an external SAS port, that is controlled by the 57B8/57B7, used to connect a single #5886 - EXP 12S SAS Disk Drawer which can contain up to 12 SAS DASD. Below is a chart showing the addition of the #5886 - EXP 12S SAS Disk Drawer.
[Chart: System Response Time (sec) for two configurations: POWER6 520 57B8/57B7 with 6 RAID5 DASD in CEC and 12 RAID5 DASD in EXP 12S SAS Disk Drawer; POWER6 520 57B8/57B7 with 6 RAID5 DASD in CEC]
PAGE 212
14.3.3 572A IOA The 572A IOA is a SAS IOA that is mainly used for SAS tape attachment but the 5886 EXP 12S SAS Disk Drawer can also be attached. Performance will be poor as the IOA does not have any cache. The following charts help to show the performance characteristics that resulted during experiments in the Rochester lab.
PAGE 214
14.4 SAN - Storage Area Network (External)
There are many factors to consider when looking at external storage options. You can get more information through your IBM representative and the white papers that are available at the following location: https://www-304.ibm.com/systems/support/
PAGE 215
14.5 iV6R1M0 -- VIOS and IVM Considerations
Beginning in iV6R1M0, IBM i operating system will participate in a new virtualization strategy by becoming a client of the VIOS product. Customers will view the VIOS product two different ways:
• On blade products, through the regular configuration tool IVM (which includes an easy to use interface to VIOS).
• On traditional (non-blade) products, through a combination of HMC and the VIOS command line.
PAGE 216
14.5.1 General VIOS Considerations
14.5.1.1 Generic Concepts
520 versus 512. Long-time IBM i operating system users know that IBM i operating system disks are traditionally configured with 520 byte sectors. The extra eight bytes beyond the 512 used for data are used for various purposes by Single Level Store. For a variety of reasons, VIOS will always surface 512 byte sectors to IBM i operating system whatever the actual sector size of the disk may be.
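The arithmetic behind "520 versus 512" can be made concrete with a short calculation. This sketch only counts bytes: it shows how many extra bytes a 520 byte sector format carries for a given amount of data. It does not describe how IBM i or VIOS actually place that information on 512 byte sectors, and the 4096 byte example size is an illustrative assumption.

```python
SECTOR_BYTES = 520   # traditional IBM i sector size, from the text
DATA_BYTES = 512     # portion of each sector used for data, from the text
EXTRA_BYTES = SECTOR_BYTES - DATA_BYTES  # the "extra eight bytes"


def sectors_for(data_len):
    """Number of sectors needed to hold data_len bytes of data."""
    return -(-data_len // DATA_BYTES)  # ceiling division


def extra_bytes_for(data_len):
    """Extra (non-data) bytes carried by those sectors in 520 byte format."""
    return sectors_for(data_len) * EXTRA_BYTES


# Illustrative example: a 4096 byte block of data.
print(sectors_for(4096), extra_bytes_for(4096))
```

The point of the calculation is that the extra bytes are per sector, so a disk that surfaces only 512 byte sectors has no built-in place for them.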
PAGE 217
14.5.1.2 Generic Configuration Concepts There are several important principles to keep track of in terms of getting good performance. Most of the following are issues when the disks are configured. A great many problems can be eliminated (or, created) when the drives are originally configured. The exact nature of some of these difficulties might not be easily predicted. But, much of what follows will simply avoid trouble at no other cost. 1.
PAGE 218
3. Prefer external disks attached directly to IBM i operating system over those attached via VIOS.
This is basically a question of who owns the Fibre Channel adapter. In some cases, it affects which adapter is purchased. If you do not need to share a given external disk's resources with non-IBM i operating system partitions, and the support is available, avoiding VIOS altogether will give better performance. First, the disks will usually have 520 byte support.
PAGE 219
8. Ensure that a reasonable number of virtual disks are created and made available to IBM i operating system. One is tempted to simply lump all the storage one has in a virtual environment into a couple of large virtual disks (or even one). Avoid this if at all possible. For traditional (non-blade) systems: There is a great deal of variability here, so generalizations are difficult. However, in the end, favor virtual disks that are within a binary order of magnitude or two of the physical disk sizes.
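The sizing guideline above can be checked numerically. In this sketch, "a binary order of magnitude or two" is interpreted as a size ratio of at most 2 to the power of 2 in either direction; that interpretation, the function name, and the example sizes are assumptions made for illustration.

```python
import math


def within_binary_orders(virtual_gb, physical_gb, orders=2):
    """True if the virtual disk size is within `orders` powers of two
    (larger or smaller) of the physical disk size."""
    if virtual_gb <= 0 or physical_gb <= 0:
        raise ValueError("sizes must be positive")
    return abs(math.log2(virtual_gb / physical_gb)) <= orders


# Illustrative sizes in GB against a 70 GB physical disk.
print(within_binary_orders(140, 70))    # one binary order larger
print(within_binary_orders(1000, 70))   # one very large virtual disk
```

A single virtual disk many times larger than the physical disks behind it fails the check, which is the situation the guideline warns against.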
PAGE 220
14.5.1.3 Specific VIOS Configuration Recommendations -- Traditional (non-blade) Machines 1. Avoid volume groups if possible. VIOS "hdisks" must have a volume identifier (PVID). Creating a volume group is an easy way to assign one and some literature will lead you to do it that way.
PAGE 221
3. Limited number of virtual devices per virtual SCSI adapter. You will have to configure some number of virtual SCSI adapters so that VIOS can provide a path for IBM i operating system to talk to VIOS as if these were really physical SCSI devices. These adapters, in turn, implement some existing rules, so that only 16 virtual disks can be made part of a given virtual adapter. You probably would not want to exceed this limit anyway.
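The limit of 16 virtual disks per virtual SCSI adapter translates directly into a minimum adapter count. The following sketch performs that calculation; the function name and the example disk counts are illustrative, and the result is a lower bound rather than a configuration recommendation.

```python
MAX_DISKS_PER_ADAPTER = 16  # limit stated in the text above


def min_virtual_scsi_adapters(virtual_disks):
    """Smallest number of virtual SCSI adapters that can hold the disks."""
    if virtual_disks < 0:
        raise ValueError("virtual_disks must not be negative")
    # Ceiling division without floating point.
    return -(-virtual_disks // MAX_DISKS_PER_ADAPTER)


for count in (12, 16, 17, 40):
    print(count, "virtual disks ->", min_virtual_scsi_adapters(count), "adapter(s)")
```

Spreading disks across more adapters than the minimum is also possible; the calculation only shows where the limit forces an additional adapter.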
PAGE 222
14.5.1.3 VIOS and JS12 Express and JS22 Express Considerations Most of our work consisted of measurements with the JS22 offering and external disks using the DS4800 product. The following are results obtained in various measurements and then a few general comments about configuration will follow. 14.5.1.3.
PAGE 223
[Chart: VIOS/IBM i operating system, JS22 Express, DS4800 (90 DDMs), Commercial Performance Workload. Response Time (sec, 0.001 to 10) versus Transactions/Minute (0 to 60000) for four configurations: IBM i operating system .8 Processor with VIOS .2 Processor; IBM i operating system 1.7 Processor with VIOS .3 Processor; IBM i operating system 2.6 Processors with VIOS .4 Processors; IBM i operating system 3.5 Processors with VIOS .5 Processors]
The chart above shows some basic performance scaling for 1, 2, 3 and 4 processors.
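The relationship between the processor splits in the chart legend and the "1, 2, 3 and 4 processors" in the sentence above is simple addition, sketched below. The allocation values are taken from the legend; the function name and the VIOS share calculation are added here for illustration only.

```python
from decimal import Decimal

# (IBM i processors, VIOS processors) for each measured configuration,
# as listed in the chart legend.
SPLITS = [("0.8", "0.2"), ("1.7", "0.3"), ("2.6", "0.4"), ("3.5", "0.5")]


def summarize(splits):
    """Return (total processors, VIOS fraction of total) for each split."""
    rows = []
    for ibmi, vios in splits:
        ibmi_d, vios_d = Decimal(ibmi), Decimal(vios)
        total = ibmi_d + vios_d
        rows.append((total, vios_d / total))
    return rows


for total, share in summarize(SPLITS):
    print(f"{total} processors total, VIOS share {float(share):.1%}")
```

Decimal arithmetic is used so that the totals come out as exact whole numbers. The VIOS allocation grows more slowly than the IBM i allocation, so its share of the total falls as processors are added.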
PAGE 224
The following charts are a view of the characteristics we observed during our Commercial Performance Workload testing on our JS22 Express. The first chart shows the effect on the Commercial Performance Workload when we apply 3 dedicated processors and then switch to 3 shared processors. We then incremented the number of virtual processors available. The “red line” is our dedicated processor setup, which is our baseline.
PAGE 225
In the following single-partition Commercial Performance Workload runs, the average VIOS CPU utilization stayed under 40%. VIOS therefore appears to have resources to spare, but in many customer environments communications and other resources are also running, and these will also be routed through VIOS.
PAGE 226
The following chart shows two IBM i operating system partitions, each using 14GB of memory and 1.7 processors, served by 1 VIOS partition using 2GB of memory and .6 processors. The Commercial Performance Workload was running the same number of transactions on each of the partitions for the same time intervals. Although there is an observed cost for VIOS to manage multiple partitions, VIOS was able to balance services to the two partitions.
PAGE 227
14.5.1.3.2 BladeCenter S and JS12 Express The IBM i operating system is now supported on a JS12 Express in a BladeCenter S. The system is limited to 12 SAS DASD and the following charts try to characterize the performance we achieved during experiments with the Commercial Performance Workload in the IBM lab. Using a JS22 Express in a BladeCenter H connected to a DS4800, we limited the resources in order to get a comparison to the SAS DASD used in the BladeCenter S.
PAGE 228
14.5.1.3.3 JS12 Express and JS22 Express Configuration Considerations 1. The aggregate total of virtual disks (LUNs) will be sixteen at most. Many customers will want to deploy between 12 and 16 LUNs and maximize symmetry. Consult carefully with your support team on the choices here. This is the most important consideration as it is difficult to change later. Consult also any available Best Practices manuals for a given SAN attached storage server. 2.
PAGE 229
14.5.1.3.4 DS3000/DS4000 Storage Subsystem Performance Tips Physical disks can be configured various ways with RAID levels, number of disks in each array and number of LUNs created over those arrays. There are also various reasons for the configurations that are chosen. One end user might be looking for ease of use and choose to create one array with multiple LUNs, where another end user might consider performance to be a more critical issue and select to create multiple arrays.
PAGE 230
[Chart: BladeCenter H with a JS22 4 Way, Commercial Performance Workload. Y axis: System Workload Response Time (Seconds), logarithmic.]
PAGE 231
14.6 IBM i operating system 5.4 Virtual SCSI Performance The primary goal of virtualization is to lower the total cost of ownership of equipment by improving utilization of the overall system resources and reducing the labor requirements to operate and manage many servers. With virtualization, the IBM Power Systems can now be used similar to the way mainframes have been used for decades, sharing the hardware between many programs, services, applications, or users.
PAGE 232
The test results that follow show the CPU required for the IBM i operating system Virtual SCSI server. The benefits of the IBM i operating system Virtual SCSI implementation should be assessed for a given environment. Simultaneous multithreading should be enabled in a virtual hosted disk environment. For the most efficient virtual hosted disk implementation with larger IO loads, it may be advantageous to keep the IBM i operating system Virtual SCSI Server partition as a dedicated processor partition.
PAGE 233
14.6.1 Introduction In general, applications are functionally isolated from the exact nature of their storage subsystems by the operating system. An application does not have to be aware of whether its storage is contained on one type of disk or another when performing I/O. But different I/O subsystems have subtly different performance qualities, and virtual SCSI is no exception.
PAGE 234
All measurements were completed on a POWER5 570+ 4-Way (2.2 GHz). Each system is configured as an LPAR, and each virtual SCSI test was performed between two partitions on the same system with one CPU for each partition. IBM i operating system 5.4 was used on the virtual SCSI server and AIX 5.3 was used on the client partitions.
PAGE 235
14.6.2.1 Native vs. Virtual Performance Figure 1 shows a comparison of measured bandwidth using virtual SCSI and local attached DASD for reads with varying block sizes of operations. The difference in the reads between virtual I/O and native I/O in these tests is attributable to the increased latency using virtual I/O. The difference in writes is caused by misalignment, which causes a read for every write.
PAGE 236
14.6.2.3 Virtual SCSI Bandwidth-Network Storage Description (NWSD) Scaling Figure 3 shows a comparison of measured bandwidth while scaling network storage descriptions with varying block sizes of operations. Each of the network storage descriptions has a single network storage space attached to it. The difference in the scaling of these tests is attributable to the performance gain which can be achieved by adding multiple network storage descriptions.
PAGE 237
14.6.2.4 Virtual SCSI Bandwidth-Disk Scaling Figure 4 shows a comparison of measured bandwidth while scaling disk drives with varying block sizes of operations. Each of the network storage descriptions has a single network storage space attached to it. The difference in the scaling of these tests is attributable to the performance gain which can be achieved by adding disk drives and IO adapters. The figures below include small (4k-64k) transactions and larger (128k) transactions.
PAGE 238
14.6.3 Sizing Sizing methodology is based on the observation that processor time required to perform an I/O on the IBM i operating system Virtual SCSI server is fairly constant for a given I/O size. The I/O devices supported by the Virtual SCSI server are sufficiently similar to provide good recommendations. These numbers are measured at the physical processor. There are considerations to address when designing and implementing a Virtual SCSI environment.
PAGE 239
To calculate IBM i operating system Virtual SCSI CPU requirements the following formula is provided. The number of transactions per second could be collected by the IBM i operating system command WRKDSKSTS. Based on the average transaction size in WRKDSKSTS, select a number from the table.
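The formula and table themselves are not reproduced in this excerpt. As a rough sketch only, and assuming the per-I/O processor time really is constant for a given I/O size (as the sizing methodology states), the processor demand is the I/O rate multiplied by the per-I/O cost. The function name and the input values below are hypothetical placeholders, not figures from the manual's table.

```python
def vscsi_cpu_fraction(io_per_second: float, cpu_ms_per_io: float) -> float:
    """Estimate processors consumed by a Virtual SCSI server.

    io_per_second: transactions per second (for example, as observed in WRKDSKSTS).
    cpu_ms_per_io: processor milliseconds per I/O for the average transaction
                   size. This is a PLACEHOLDER input; take the real value
                   from the table in the full manual.
    """
    # rate (I/O per second) * cost (ms per I/O) = ms of CPU per second;
    # dividing by 1000 ms per second gives a fraction of one processor.
    return io_per_second * cpu_ms_per_io / 1000.0


# Placeholder inputs for illustration only.
estimate = vscsi_cpu_fraction(io_per_second=5000, cpu_ms_per_io=0.05)
print(f"Estimated processors needed: {estimate:.2f}")
```

Because the numbers are measured at the physical processor, an estimate like this would still need headroom for other work in the server partition.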
PAGE 240
14.6.3.2 Sizing when using Micro-Partitioning Defining Virtual SCSI servers in micro-partitions enables much better granularity of processor resource sizing and potential recovery of unused processor time by uncapped partitions. Tempering those benefits, use of micro-partitions for Virtual SCSI servers slightly increases I/O response time and creates somewhat more complex processor entitlement sizing.
PAGE 241
14.6.3.3 Sizing memory The IBM i operating system Virtual SCSI server supports data read caching on the virtual hosted disk server partition. Thus all I/Os that it services could benefit from effects of caching heavily used data. Read performance can vary depending upon the amount of memory which is assigned to the server partition. Workloads which have a small memory footprint can improve their performance greatly by increasing the amount of memory in the IBM i operating system Virtual SCSI server.
PAGE 242
14.6.4 AIX Virtual IO Client Performance Guide The following is a link which will direct you to more in-depth performance tuning for AIX virtual SCSI client. Advanced POWER Virtualization on IBM p5 Servers: Architecture and Performance Considerations http://www.redbooks.ibm.com/abstracts/sg247940.html? 14.6.5 Performance Observations and Tips
• In order to achieve best performance 1 network storage description should be used for every 2-4 disks within an ASP.
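The 2-4 disks per network storage description guideline translates into a range of NWSD counts for a given ASP. The following is a minimal sketch of that arithmetic; the function name is hypothetical and the rounding choices are assumptions.

```python
import math


def nwsd_count_range(disks_in_asp: int) -> tuple:
    """Return (fewest, most) NWSDs suggested by the 2-4 disks per NWSD guideline."""
    if disks_in_asp < 1:
        raise ValueError("an ASP needs at least one disk")
    fewest = math.ceil(disks_in_asp / 4)   # 4 disks per NWSD
    most = max(1, disks_in_asp // 2)       # 2 disks per NWSD
    return fewest, most


print(nwsd_count_range(12))  # an ASP with 12 disks
```

For a 12-disk ASP this gives between 3 and 6 network storage descriptions.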
PAGE 243
Chapter 15. Save/Restore Performance This chapter’s focus is on the IBM i operating system platform. For legacy system models, older device attachment cards, and the lower performing backup devices see the V5R3 performance capabilities reference. Many factors influence the observable performance of save and restore operations. These factors include:
• The backup device models, number of DASD units the data is spread across, processors, LPAR configurations, IOA used to attach the devices.
PAGE 244
15.2 Save Command Parameters that Affect Performance Use Optimum Block Size (USEOPTBLK) The USEOPTBLK parameter is used to send a larger block of data to backup devices that can take advantage of the larger block size. Every block of data that is sent has a certain amount of overhead that goes with it. This overhead includes block transfer time, IOA overhead, and backup device overhead. The block size does not change the IOA overhead and backup device overhead, but the number of blocks does.
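To illustrate why a larger block size helps, the sketch below counts the blocks needed to move a fixed amount of data and multiplies by a fixed per-block overhead. The block sizes and the overhead value are hypothetical placeholders chosen for illustration, not device specifications.

```python
KIB = 1024
GIB = 1024 ** 3


def blocks_needed(total_bytes: int, block_bytes: int) -> int:
    """Number of blocks required to transfer total_bytes (ceiling division)."""
    return -(-total_bytes // block_bytes)


def fixed_overhead_seconds(total_bytes: int, block_bytes: int,
                           overhead_per_block_s: float) -> float:
    """Total fixed (IOA plus device) overhead, which scales with block count."""
    return blocks_needed(total_bytes, block_bytes) * overhead_per_block_s


# Hypothetical block sizes and per-block overhead, for illustration only.
small_blocks = blocks_needed(GIB, 32 * KIB)
large_blocks = blocks_needed(GIB, 256 * KIB)
print(small_blocks, large_blocks)
print(fixed_overhead_seconds(GIB, 32 * KIB, 0.0001),
      fixed_overhead_seconds(GIB, 256 * KIB, 0.0001))
```

With these placeholder sizes the larger block needs one eighth as many blocks, so the fixed per-block overhead shrinks by the same factor.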
PAGE 245
15.3 Workloads The following workloads were designed to help evaluate the performance of single, concurrent and parallel save and restore operations for selected devices. Familiarization with these workloads can help in understanding differences in the save and restore rates. Database File related Workloads: The following workloads are designed to show some possible customer environments using database files.
PAGE 246
15.4 Comparing Performance Data When comparing the performance data in this document with the actual performance on your system, remember that the performance of save and restore operations is data dependent. If the same backup device was used on data from three different systems, three different rates may result.
PAGE 247
15.5 Lower Performing Backup Devices With the lower performing backup devices, the devices themselves become the gating factor so the save rates are approximately the same, regardless of system CPU size (DVD-RAM). Table 15.5.
PAGE 248
15.8 The Use of Multiple Backup Devices Concurrent Saves and Restores - The ability to save or restore different objects from a single library/directory to multiple backup devices or different libraries/directories to multiple backup devices at the same time from different jobs. The workloads that were used for the testing were Large Database File and User Mix from libraries. For the tests multiple identical libraries were created, a library for each backup device being used.
PAGE 249
15.9 Parallel and Concurrent Library Measurements This section discusses parallel and concurrent library measurements for tape drives, while sections later in this chapter discuss measurements for virtual tape drives. 15.9.1 Hardware (2757 IOAs, 2844 IOPs, 15K RPM DASD) Hardware Environment. This testing consisted of an 840 24 way system with 128 GB of memory. The model 840 doesn’t support the 15K RPM DASD in the main tower, so only four 18 GB 10K RPM RAID protected DASD units were in the main tower.
PAGE 250
15.9.2 Large File Concurrent For the concurrent testing 16 libraries were built, each containing a single 320 GB file with 80 4 GB members. The file size was chosen to sustain a flow across the HSL, system bus, processors, memory and tape drives for about an hour. We were not interested in peak performance here but sustained performance.
PAGE 251
15.9.3 Large File Parallel For the measurements in this environment, BRMS was used to manage the save and restore, taking advantage of the ability built into BRMS to split an object between multiple tape drives. We started with a 320 GB file in a single library and built it up to 2.1 TB for tape drive tests 1 - 4 and 8. The file was then duplicated in the library for tape drive tests 12 - 16, so a single library with two 2.1 TB files was used. This is not quite the same as having a 4.2 TB file.
PAGE 252
15.9.4 User Mix Concurrent User Mix will generally portray a fair population of customer systems, where the real data is a mixture of programs, menus, commands along with their database files. The new ultra tape drives are in their glory when streaming large file data, but a lot of other factors play a part when saving and restoring multiple smaller objects. Table 15.9.4.1 iV5R2 16 - 3580.002 Fiber Channel Tape Device Measurements (Concurrent) (Save = S, & Restore = R) # 3580.
PAGE 253
15.10 Number of Processors Affect Performance With the Large Database File workload, it is possible to fully feed two backup devices with a single processor, but with the User Mix workload it takes 1+ processors to fully feed a backup device. A recommendation might be 1 and 1/3 processors for each backup device you want to feed with User Mix data.
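The rule of thumb above can be turned into a quick estimate. This is a sketch under stated assumptions: the per-device ratios are read directly from the paragraph (one processor per two devices for Large Database File, 1 and 1/3 processors per device for User Mix), and the workload keys and function name are hypothetical.

```python
import math
from fractions import Fraction

# Processors needed per backup device, taken from the guidance above.
PROCESSORS_PER_DEVICE = {
    "large_database_file": Fraction(1, 2),  # one processor can feed two devices
    "user_mix": Fraction(4, 3),             # 1 and 1/3 processors per device
}


def processors_needed(workload: str, devices: int) -> int:
    """Whole processors needed to keep `devices` backup devices fully fed."""
    return math.ceil(PROCESSORS_PER_DEVICE[workload] * devices)


for workload in PROCESSORS_PER_DEVICE:
    print(workload, [processors_needed(workload, n) for n in (1, 2, 3, 4)])
```

Exact fractions are used so that three User Mix devices come out as exactly four processors rather than a rounded floating-point value.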
PAGE 254
15.11 DASD and Backup Devices Sharing a Tower The system architecture does not require that DASD and backup devices be kept separated. For the 3580 002 testing in the IBM Rochester Lab, we attached one backup device to each tower, and every tower held 45 DASD units. The 3592J has similar characteristics to the 3580 002 but the 3580 003 and 3592E models have greater capacities which create new scenarios.
PAGE 255
15.12 Virtual Tape Virtual tape drives are being introduced in iV5R4 so that customers can take advantage of the speed of saving to DASD and then use DUPTAP to copy the data to tape drives, reducing the backup window during which the system is unavailable to users. There are many pieces to consider when setting up and using virtual tape drives. The block size must match the block capabilities of the physical backup device you will be using.
PAGE 256
The following measurements were done on a system with newer hardware including a 3580 Ultrium 3 4Gb Fiber Channel Tape Drive, 571E storage adapters, and 4327 70GB (U320) DASD. [Chart: Save to Tape vs. Save to Virtual Tape then DUPTAP to Tape; 570, 8 Way, 96GB Memory, 305 DASD units for Virtual Tape Drives. Y axis: Hours to Save 1 TB of Data. Series: Restricted State Save to Tape; Restricted State Save to Virtual Tape; Non Restricted DUPTAP.]
PAGE 257
15.13 Parallel Virtual Tapes NOTE: Virtual tape is reading and writing to the same DASD so the maximum throughput with our concurrent and parallel measurements is different than our tape drive tests where we were reading from DASD and writing to tape.
PAGE 258
15.14 Concurrent Virtual Tapes NOTE: Virtual tape is reading and writing to the same DASD so the maximum throughput with our concurrent and parallel measurements is different than our tape drive tests where we were reading from DASD and writing to tape.
PAGE 259
15.15 Save and Restore Scaling using a Virtual Tape Drive. A 570 8 way System i was used for the following tests. A user ASP was created using up to 3 571F IOAs with up to 36 U320 70 GB DASD on each IOA. The Chart shows the number of DASD in each test and the Virtual tape drive was created using that DASD. The workload data was restored into the system ASP and was then saved to the Virtual tape drive in the user ASP.
PAGE 260
15.16 Save and Restore Scaling using 571E IOAs and U320 15K DASD units to a 3580 Ultrium 3 Tape Drive. A 570 8 way System i was used for the following tests. A user ASP was created with the number of DASD listed in each test. The workload data was then saved to the tape drive, deleted from the system and restored to the user ASP. These charts are very specific to the new IOAs and U320 capable DASD available. For more information on the IOAs and DASD see Chapter 14 of this guide.
PAGE 261
[Chart: User Mix Saves. Y axis: GB/HR (0 to 350). X axis: number of DASD (6 to 90, in steps of 6). Series: RAID5 SAVE, RAID6 SAVE, MIRRORING SAVE.]
[Chart: User Mix Restores. Y axis: GB/HR (0 to 180). X axis: number of DASD (6 to 90, in steps of 6). Series: RAID5 RESTORE, RAID6 RESTORE, MIRRORING RESTORE.]
PAGE 262
15.17 High-End Tape Placement on System i The current high-end tape drives (ULTRIUM-2 / ULTRIUM-3 and 3592-J / 3592-E) need to be placed carefully on the System i buses and HSLs in order to avoid bottlenecking.
PAGE 263
15.18 BRMS-Based Save/Restore Software Encryption and DASD-Based ASP Encryption The Ultrium-3 was used in the following experiments, which attempt to characterize the effects of BRMS-based save/restore software encryption and DASD-based ASP encryption. Some of the newer tape drives offer hardware encryption as an option, but for those who are not looking to upgrade or invest in these tape units at this time, software encryption can be a fair solution.
PAGE 264
[Chart: Tape Backup Performance - Restores. Y axis: GB/HR (0 to 300). Series: 9406-MMA-4way Encrypted ASP RSTLIBBRM NO Software Encryption; 9406-MMA-4way Encrypted ASP RSTLIBBRM With Software Encryption; 9406-MMA-4way NON Encrypted ASP RSTLIBBRM NO Software Encryption; 9406-MMA-4way NON Encrypted ASP RSTLIBBRM With Software Encryption; 9406-570-4way NON Encrypted ASP RSTLIBBRM NO Software Encryption; 9406-570-4way NON Encrypted ASP RSTLIBBRM With Software Encryption.] X axis workloads: 1 GB Source File 12 GB User Mix 64 GB Lar
PAGE 265
15.19 5XX Tape Device Rates Note: Measurements for the high speed devices were completed on a 570 4 way system with 2844 IOPs and 2780 IOAs and 180 15K RPM RAID5 DASD units. The smaller tape device tests were completed on a 520 2 way with 75 DASD units. The Virtual tape and *SAVF runs were completed on a 570 ML16 with 256GB of memory and 924 DASD units.
PAGE 266
Table 15.19.2 - iV5R4M0 Measurements on a 5XX 1-way system 8 RAID5 protected DASD Units 8 GB memory Measurements in (GB/HR) all 8 DASD in the system ASP. 6258 4MM tape Drive SLR60 from table 15.18.
PAGE 267
15.20 5XX Tape Device Rates with 571E & 571F Storage IOAs and 4327 (U320) Disk Units Save/restore rates of 3580 Ultrium 3 (2Gb and 4Gb Fiber Channel) tape devices and of virtual tape devices were measured on a 570 8-way system with 571E and 571F storage adapters and 714 type 4327 70GB (U320) disk units. Customer performance will be dependent on overall system resources and how well those resources match the maximum capabilities of the device. See other sections in this guide about memory, CPU and DASD.
PAGE 268
15.21 5XX DVD RAM and Optical Library Table 15.21.1 - iV5R3 Measurements on an 520 2-way system 53 RAID protected DASD Units 16 GB memory Measurements in (GB/HR) ASP 1 (System ASP 23 DASD) ASP 2 (30 DASD) Workload data Saved and Restored from User ASP 2. 6331 DTACPR *NO 6331 DTACPR *YES 6333 DTACPR *NO 6333 DTACPR *YES 6330 DTACPR *NO 6330 DTACPR *YES 399F Model 200 Optical Library UDO 399F Model 200 Optical Library 14x V5R3 V5R3 V5R3 V5R3 V5R3 V5R3 V5R3 V5R3 S 1.8 9.0 2.2 12.0 3.
PAGE 269
15.22 Software Compression The rates a customer will achieve will depend upon the system resources available. This test was run in a very favorable environment to try to achieve the maximum rates. Software compression rates were gathered using the QSRSAVO API. The CPU used in all compression schemes was near 100%. The compression algorithm cannot span CPUs so the fact that measurements were performed on a 24-way system doesn’t affect the software compression scenario. Table 15.22.
PAGE 270
15.23 9406-MMA DVD RAM Table 15.23.1 - iV5R4M5 Measurements on an 9406-MMA 4-way system 6 Mirrored DASD in the CEC and 24 RAID5 protected DASD Units attached 32 GB memory Measurements in (GB/HR) all 30 DASD in the system ASP. SAS 6331 DTACPR *NO 5X Media SAS 6331 DTACPR *YES 5X Media iV5R4M5 iV5R4M5 S 3.0 13.4 R 7.3 9.3 S 2.3 8.0 R 12.5 28.0 Workload S = Save R = Restore Release Measurements were done Source File 1GB User Mix 3GB S 2.2 8.0 R 14.0 45.0 1 Directory Many Objects S 2.
PAGE 271
15.24 9406-MMA 576B IOPLess IOA Table 15.24.1 - iV6R1M0 Measurements on an 9406-MMA 4-way system 200 RAID5 protected DASD Units in the system ASP, attached via 571F IOAs 40 GB memory Measurements in (GB/HR).
PAGE 272
15.25 What’s New and Tips on Performance
What’s New
iV6R1M0 March 2008
• BRMS-Based Save/Restore Software Encryption and DASD-Based ASP Encryption
• 576B IOPLess Storage IOA
iV5R4M5 July 2007
• 3580 Ultrium 4 - 4Gb Fiber Channel Tape Drive
• 6331 SAS DVD RAM for 9406-MMA system models
iV5R4 January 2007
• 571E and 571F storage IOAs (see DASD Performance chapter for more information)
August 2006
1. DUPTAP performance PTFs (iV5R4 - SI23903, MF39598, MF39600, and MF39601)
2.
PAGE 273
Chapter 16 IPL Performance Performance information for Initial Program Load (IPL) is included in this section. The primary focus of this section is to present observations from IPL tests on different System i models. The data for both normal and abnormal IPLs are broken down into phases, making it easier to see the detail. For information on previous models see a prior Performance Capabilities Reference.
PAGE 274
16.3 9406-MMA System Hardware Information 16.3.
PAGE 275
16.4 9406-MMA IPL Performance Measurements (Normal) The following tables provide a comparison summary of the measured performance data for a normal and abnormal IPL. Results provided do not represent any particular customer environment. Measurement units are in minutes and seconds. Table 16.4.
PAGE 276
16.6 NOTES on MSD MSD is Mainstore Dump. General IPL phase as it relates to the SRCs posted on the operation panel: Processor MSD includes the D2xx xxxx and C2xx xxxx right after the system is forced to terminate. SLIC MSD IPL with Copy follows with the next series of C6xx xxxx, see the next heading for more information on the SLIC MSD IPL with Copy. The copy occurs during the C6xx 4404 SRCs. Shutdown includes the Dxxx xxxx SRCs. Hardware re-ipl includes the next phase of D2xx xxxx and C2xx xxxx.
PAGE 277
16.7 5XX System Hardware Information
16.7.1 5XX Small system
Hardware Configuration
520 7457 2 way - 16 GB Mainstore
DASD / 23 35GB 15K rpm arms, RAID Protected
Software Configuration
100,000 spool files (100,000 completed jobs with 1 spool file per job)
500 jobs in job queues (inactive)
500 active jobs in system during Mainstore dump
1000 user profiles
1000 libraries
Database:
• 2 libraries with 500 physical files and 20 logical files
16.7.
PAGE 278
16.8 5XX IPL Performance Measurements (Normal) The following tables provide a comparison summary of the measured performance data for a normal and abnormal IPL. Results provided do not represent any particular customer environment. Measurement units are in minutes and seconds. Table 16.8.
PAGE 279
16.10 5XX IOP vs IOPLess effects on IPL Performance (Normal) Measurement units are in minutes and seconds.

Table 16.10.2 Normal IPL - Power-On (Cold Start)

            iV5R4 GA7 Firmware         iV5R4 GA7 Firmware
            16 Way IOP                 16 Way IOPLess
            570 7476 256 GB 924 DASD   570 7476 256 GB 924 DASD
Hardware    17:44                      18:06
SLIC        6:43                       7:20
OS/400      2:32                       2:52
Total       26:59                      28:18

16.
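As a quick cross-check of Table 16.10.2, the sketch below sums the three phase times for each configuration and reports the difference. The phase values are copied from the table; the helper names are hypothetical.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds: int) -> str:
    """Convert seconds back to 'minutes:seconds'."""
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


# Phase times from Table 16.10.2 (Normal IPL, Power-On, Cold Start).
PHASES = {
    "IOP":     {"Hardware": "17:44", "SLIC": "6:43", "OS/400": "2:32"},
    "IOPLess": {"Hardware": "18:06", "SLIC": "7:20", "OS/400": "2:52"},
}

totals = {name: sum(to_seconds(t) for t in phases.values())
          for name, phases in PHASES.items()}

for name, seconds in totals.items():
    print(name, "total:", to_mmss(seconds))
print("IOPLess minus IOP:", to_mmss(totals["IOPLess"] - totals["IOP"]))
```

The computed totals match the Total row of the table (26:59 and 28:18), and the IOPLess configuration comes out 1:19 longer in this measurement.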
PAGE 280
Chapter 17. Integrated BladeCenter and System x Performance This chapter provides a performance overview and recommendations for the Integrated xSeries Server, the Integrated xSeries Adapter and the iSCSI host bus adapter. In addition, the chapter presents some performance characteristics and impacts of these solutions on System i™. 17.
PAGE 281
Integrated xSeries Servers (IXS) An Integrated xSeries Server is an Intel processor-based server on a PCI-based interface card that plugs into a host system. This card provides the processor, memory, USB interfaces, and in some cases, a built-in gigabit Ethernet adapter. There are several hardware versions of the IXS:
• The 2.0 GHz Pentium® M IXS (hardware type #4812-001).
• The 2.0 GHz PCI IXS (hardware type #2892-002).
Older versions of the IXS Card are: the 1.
PAGE 282
• Write Cache Property: When the disk device write cache property is disabled, disk operations have similar performance characteristics to shared disks. You may examine or change the “Write Cache” property on Windows by selecting disk “properties” and then the “Hardware” tab. Then view “Properties” for a selected disk and view the “Disk Properties” or “Device Options” tab. All dynamically and statically linked storage spaces have “Write Cache” enabled by default.
PAGE 283
• With iSCSI, there are some Windows side disk configuration rules you must take into account to enable efficient disk operations. Windows disks should be configured as:
  - 1 disk partition per virtual drive.
  - File system formatted with cluster sizes of 4 kbyte or 4 kbyte multiples.
  - 2 gigabyte or larger storage spaces (for which Windows creates a default NTFS cluster size of 4kbytes).
  If necessary, you can, with care, configure multiple disk partitions on a single virtual drive.
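The three configuration rules above can be expressed as a simple checklist. This is an illustrative sketch, not an IBM or Windows tool: the function name and parameters are hypothetical, and it assumes 1 kbyte = 1024 bytes and 1 gigabyte = 1024^3 bytes.

```python
KIB = 1024
GIB = 1024 ** 3


def check_iscsi_windows_disk(partitions_per_drive: int,
                             cluster_bytes: int,
                             storage_space_bytes: int) -> list:
    """Return a list of rule violations (an empty list means all rules pass)."""
    issues = []
    if partitions_per_drive != 1:
        issues.append("use 1 disk partition per virtual drive")
    if cluster_bytes <= 0 or cluster_bytes % (4 * KIB) != 0:
        issues.append("cluster size should be 4 kbyte or a 4 kbyte multiple")
    if storage_space_bytes < 2 * GIB:
        issues.append("storage space should be 2 gigabyte or larger")
    return issues


# A 10 GB storage space, one partition, 4 kbyte clusters: no issues expected.
print(check_iscsi_windows_disk(1, 4 * KIB, 10 * GIB))
# A 1 GB space with 512-byte clusters violates two of the rules.
print(check_iscsi_windows_disk(1, 512, 1 * GIB))
```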
PAGE 284
2. Vary on any Network Server Description (NWSD) with a Network server connection type of *ISCSI. During the iSCSI network server vary on processing the QFPHIS subsystem is automatically started if necessary. The subsystem will activate the private memory pool. iSCSI network server descriptions that are varied on will then utilize the first private memory pool configured with at least the minimum (4MB) size for virtual disk I/O operations.
PAGE 285
IXS and IXA I/O communications (disk, tape, optical and virtual Ethernet) occur through the individual IXS and IXA IOP resource. This IOP imposes a finite capacity. The IOP processor utilization may be examined via the iSeries Collection Services utilities. The performance results presented in the rest of this chapter are based on measurements and projections using standard IBM benchmarks in a controlled environment.
PAGE 286
                        For Each Target HBA    For Each NWSD
Machine Pool:           21 MBytes              1 MByte
Base Pool:              1 MByte                0.5 MByte
QFPHIS Private Pool:    0.5 MByte              1 MByte
Total:                  22.5 MBytes            2.5 MBytes

Warning: To ensure expected performance and continuing machine operation, it is critical to allocate sufficient memory to support all of the devices that are varied on. Inadequate memory pools can cause unexpected machine operation. 17.
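The per-device figures in the table above can be combined into a quick memory estimate for a given configuration. This sketch assumes the requirements simply add up across target HBAs and NWSDs, which the excerpt implies but does not state outright; the function and dictionary names are hypothetical.

```python
# Memory in MBytes, copied from the table above.
PER_TARGET_HBA = {"machine": 21.0, "base": 1.0, "qfphis_private": 0.5}
PER_NWSD = {"machine": 1.0, "base": 0.5, "qfphis_private": 1.0}


def iscsi_memory_mbytes(target_hbas: int, nwsds: int) -> dict:
    """Estimated memory per pool, plus a total, assuming requirements are additive."""
    pools = {
        pool: PER_TARGET_HBA[pool] * target_hbas + PER_NWSD[pool] * nwsds
        for pool in PER_TARGET_HBA
    }
    pools["total"] = sum(pools.values())
    return pools


# Example: 2 target HBAs serving 4 varied-on NWSDs.
print(iscsi_memory_mbytes(target_hbas=2, nwsds=4))
```

For one HBA and one NWSD the total is 25 MBytes, the sum of the two Total entries in the table.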
PAGE 287
[Chart: CPW per 1k Disk Operations. Y axis: CPW / 1k Ops/sec (0 to 600). X axis: operation type (AppServ, FileServ, and RW, Read, and Write at 512, 1k, 2k, 4k, 8k, 16k, 24k, 32k, and 64k). Series: iSCSI; IXS/IXA w Caching Disabled or Shared; IXS/IXA w Caching Enabled.] The chart shows the relative cost when performing 5 different types of operations.
PAGE 288
• A storage space which is linked as shared, or a disk with caching disabled, requires more CPU to process write operations (approx. 45%).
• Sequential operations cost approximately 10% less than the random I/O results shown above.
• Even though a Windows disk driver may have write cache enabled, some Windows applications may request to bypass the cache for some operations (extended writes), and these operations would incur the higher CPW cost.
PAGE 289
The blue square line shows an iSCSI connection with a single target iSCSI HBA - single initiator iSCSI HBA connection, configured to run with standard frames. The pink circle line is a single target iSCSI HBA connected to multiple servers and initiators, also running with standard frames. With the initiators and switches configured to use 9k jumbo frames, a 15% to 20% increase in upper capacity is demonstrated. 17.
PAGE 290
than an IXS or IXA attached VE connection. “Stream” means that the data is pushed in one direction, with only the TCP acknowledge packets running in the other direction.
PAGE 291
The chart above shows the CPW efficiency of operations (larger is better). Note the CPW per Mbits/sec scale on the left, as it is different for each chart. For an IXS or IXA, the port-based VE has the least CPW for smaller packets due to consolidation of transfers available in Licensed Internal Code. The VLAN-based transfers have the greatest cost (however, the total would be split during inter-LPAR communications). For iSCSI, the cost of using standard frames is 1.5 to 4.5 times higher than jumbo frames. 17.
PAGE 292
The legend label “Mixed Files” indicates a save of many files of mixed sizes - equivalent to the save of the Windows system file disk. “Large files” indicates a save of many large files - in this case many 100MB files. [Chart: FLBU SAV / RST Rates. Y axis: GB per Hr (0 to 90). X axis: SAV to disk, SAV to Tape, RST from disk, RST from Tape. Series: iSCSI Large, iSCSI Mixed, IXA Large, IXA Mixed.] 17.
PAGE 293
Choose V5R4. In the “Contents” panel choose “iSeries Information Center”. Expand “Integrated operating environments” and then “Windows environment on iSeries” for Windows environment information or “Linux” and then “Linux on an integrated xSeries solution” for Linux information on an IXS or attached xSeries server. Microsoft Hardware Compatibility Test URL: See http://www.microsoft.com/whdc/hcl/search.mspx search on IBM for product types Storage/SCSI Controller and System/Server Uniprocessor.
PAGE 294
Chapter 18. Logical Partitioning (LPAR) 18.1 Introduction Logical partitioning (LPAR) is a mode of machine operation where multiple copies of operating systems run on a single physical machine. A logical partition is a collection of machine resources that are capable of running an operating system. The resources include processors (and associated caches), main storage, and I/O devices. Partitions operate independently and are logically isolated from other partitions.
PAGE 295
• Allocate fractional CPUs wisely. If your sizing indicates two partitions need 0.7 and 0.4 CPUs, see if there will be enough remaining capacity in one of the partitions with 0.6 and 0.4 or else 0.7 and 0.3 CPUs allocated. By adding fractional CPUs up to a "whole" processor, fewer physical processors will be used. Design implies that some performance will be gained.
• Avoid shared processors on large partitions if possible.
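The fractional CPU example can be checked with a few lines of arithmetic. This is a simplified sketch that assumes the partitions draw from the same pool of physical processors and that the only concern is the sum of their allocations; the function name is hypothetical. Decimal values are used so that sums such as 0.7 + 0.3 come out as exactly 1.

```python
import math
from decimal import Decimal


def physical_processors_used(allocations) -> int:
    """Whole physical processors needed to back a set of fractional allocations."""
    total = sum(Decimal(a) for a in allocations)
    return math.ceil(total)


for allocations in (["0.7", "0.4"], ["0.6", "0.4"], ["0.7", "0.3"]):
    print(allocations, "->", physical_processors_used(allocations))
```

The original sizing of 0.7 and 0.4 spills into a second processor, while either of the adjusted pairs fits on one.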
PAGE 296
The reasons for the LPAR overhead can be attributed to contention for the shared memory bus on a partitioned system, to the aggregate bandwidth of the standalone systems being greater than the bandwidth of the partitioned system, and to a lower number of system resources configured for a system partition than on a standalone system. For example on a standalone 2-way system the main memory available may be X, and on a partitioned system the amount of main storage available for the 2-way partition is X-2.
PAGE 297
Also note that part of the performance increase of a larger system may have come about because of a reduction in contention within the CPW workload itself. That is, the measurement of the standalone 12-way system required a larger number of users to drive the system’s CPU to 70 percent than what is required on a 4-way system. The larger number of users may have increased the CPW workload’s internal contention.
PAGE 298
[Chart: LPAR Throughput Increase: Total Increase in CPW Capacity of an LPAR System. Y axis: Total CPW of all Partitions (4600 to 5400). X axis: LPAR Configuration (12-way, 8-way+4-way, 2 x 6-way, 3 x 4-way). Data labels: 7%, 9%, 13%.]
Figure 18.2. 12 way LPAR Throughput Example
To illustrate the impact that varying the workload in the partitions has on an LPAR system, the CPW workload was run at an extremely high utilization in the stand-alone 12-way.
PAGE 299
18.4 LPAR Measurements The following chart shows measurements taken on a partitioned 12-way system with the system’s CPU utilized at 70 percent capacity. The system was at the V4R4M0 release level. Note that the standalone 12-way CPW value of 4700 in our measurement is higher than the published V4R3M0 CPW value of 4550. This is because there was a contention point that existed in the CPW workload when the workload was run on large systems.
PAGE 300
The following chart shows projected LPAR capacities for several LPAR configurations. The projections are based on 1-way and 2-way measurements taken when the system’s CPU was utilized at 70 percent capacity. The LPAR overhead was also factored into the projections. The system was at the V4R4M0 release level.

Table 18.4 Projected LPAR Capacities

LPAR Configuration      Projected LPAR CPW    Projected CPW Increase
(Number Processors)                           Over a Standalone 12-way
12 1-ways               5920                  26 %
6 2-ways                5700                  21 %

18.
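The percentage column in Table 18.4 can be reproduced from the projected CPW values. This sketch assumes the baseline is the measured standalone 12-way CPW of 4700 quoted in section 18.4, rather than the published V4R3M0 value of 4550; that choice is an inference, although it does reproduce the table's percentages.

```python
STANDALONE_12_WAY_CPW = 4700  # measured value quoted in section 18.4

# Projected CPW values from Table 18.4.
PROJECTED_CPW = {"12 1-ways": 5920, "6 2-ways": 5700}


def percent_increase(cpw: int, baseline: int = STANDALONE_12_WAY_CPW) -> int:
    """Increase over the baseline, rounded to the nearest whole percent."""
    return round((cpw - baseline) / baseline * 100)


for config, cpw in PROJECTED_CPW.items():
    print(f"{config}: {cpw} CPW, {percent_increase(cpw)} % over standalone 12-way")
```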
PAGE 301
Chapter 19. Miscellaneous Performance Information 19.1 Public Benchmarks (TPC-C, SAP, NotesBench, SPECjbb2000, VolanoMark) iSeries systems have been represented in several public performance benchmarks. The purpose of these benchmarks is to give an indication of relative strength in a general field of computing. Benchmark results can give confidence in a system's capabilities, but should not be viewed as a sole criterion for the purchase or upgrading of a system.
PAGE 302
The most commonly run of these is the SAP-SD (Sales and Distribution) benchmark. It can be run in a 2-tier environment, where the application and database reside on the same system, or on a 3-tier environment, where there are many application servers feeding into a database server. Care must be taken to ensure that the same level of software is being run when comparing results of SAP benchmarks. Like most software suppliers, SAP strives to enhance their product with useful functions in each release.
PAGE 303
This web site is primarily focused on results for systems that the Volano company measures themselves. These results tend to be for much smaller, Intel-based systems that are not comparable with iSeries servers. The web site also references articles written by other groups regarding their measurements of the benchmark, including AS/400 and iSeries articles. iSeries servers have demonstrated significant strengths in this benchmark, particularly in scaling to large systems. 19.
PAGE 304
of relatively lower delay cost.
• Waiting Time: The waiting time is used to determine the delay cost of a job at a particular time. The waiting time of a job which affects the cost is the time the job has been waiting on the TDQ for execution.
• Delay Cost Curves: The end-user interface for setting job priorities has not changed. However, internally the priority of a job is mapped to a set of delay cost curves (see "Priority Mapping to Delay Cost Curves" below).
PAGE 305
• Priority 47-51
• Priority 52-89
• Priority 90-99
Jobs in the same group will have the same resource (CPU seconds and Disk I/O requests) usage limits. Internally, each group will be associated with one set of delay cost curves. This would give some preferential treatment to jobs of higher user priorities at low system utilization.
PAGE 306
less CPU utilization resulting in slightly lower transaction rates and slightly longer response times. However, the batch job gets more CPU utilization and consequently shorter run time.
• It is recommended that you run with Dynamic Priority Scheduling for optimum distribution of resources and overall system performance. For additional information, refer to the Work Management Guide. 19.
PAGE 307
of printers in the configuration. 70% of the remaining memory is allocated to the interactive pool; 30% to the base pool. A QPFRADJ value of 1 ensures that memory is allocated on the system in a way that the system will perform adequately at IPL time. It does not allow for reaction to changes in workload over time. In general, this value is avoided unless a routine will be run shortly after an IPL that will make adjustments to the memory pools based on the workload.
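The 70/30 division described above can be sketched as follows. This is an illustration only: what the system reserves before the split is not visible in this excerpt, so the function takes the remaining memory as an input, and the type and field names are hypothetical rather than actual system values.

```c
#include <assert.h>

/* Pool sizes in KB. Names are illustrative, not IBM i system values. */
struct pool_split {
    long long interactive_kb;
    long long base_kb;
};

/* Divide the remaining memory 70% to the interactive pool, 30% to base. */
struct pool_split split_remaining_memory(long long remaining_kb)
{
    assert(remaining_kb >= 0);
    struct pool_split s;
    s.interactive_kb = remaining_kb * 70 / 100;
    s.base_kb = remaining_kb - s.interactive_kb; /* the rest, about 30% */
    return s;
}
```

Giving the base pool "whatever is left" rather than computing 30% separately keeps the two pools summing exactly to the input when integer division truncates.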
PAGE 308
files of differing characteristics are being accessed. The pool attribute can be changed from *FIXED to *CALC and back at any time, so making a change and evaluating its effect over a period of time is a fairly safe experiment. More information about Expert Cache can be found in the Work Management guide. In some situations, you may find that you can achieve better memory utilization by defining the caching characteristics yourself, rather than relying on the system algorithms.
PAGE 309
To determine a reasonable level of page faulting in user pools, determine how much the paging is affecting the interactive response time or batch throughput. These calculations will show the percentage of time spent doing page faults. The following steps can be used: (all data can be gathered w/STRPFRMON and printed w/PRTSYSRPT). The following assumes interactive jobs are running in their own pool, and batch jobs are running in their own pool. Interactive: 1.
PAGE 310
NOTE: It is very difficult to predict the improvement of adding storage to a pool, even if the potential gain calculated above is high. There may be instances where adding storage may not improve anything because of the application design. For these circumstances, changes to the application design may be necessary. Also, these calculations are of limited value for pools that have expert cache turned on. Expert cache can reduce I/Os given more main storage, but those I/Os may or may not be page faults. 19.
PAGE 311
[Figure 19.1. AS/400 NetFinity Software Inventory Performance. Configuration shown: AS/400 510-2142, Token Ring, TCP/IP, V4R1. Y axis: Total Collection Time (min), 0 to 240. X axis: Number of PC Clients, 0 to 600. Annotation: About 100 clients were collected in 42 minutes.]
PAGE 312
Conclusions/Recommendations for NetFinity 1. The time to collect hardware or software information for a number of clients is fairly linear. 2. The size of the AS/400 CPU is not a limitation. Data collection is performed at a batch priority. CPU utilization can spike quite high (ex. 80%) when data is arriving, but in general is quite low (ex. 10%). 3. The LAN type (4 or 16Mb Token Ring or Ethernet) is not a limitation.
PAGE 313
Chapter 20. General Performance Tips and Techniques This section's intent is to cover a variety of useful topics that "don't fit" in the document as a whole, but provide useful things that customers might do or deal with special problems customers might run into on iSeries. It may also contain some general guidelines. 20.1 Adjusting Your Performance Tuning for Threads History Historically, the iSeries and AS/400 programmers have not had to worry very much about threads.
PAGE 314
Problem It is too easy to use the overall pool's value of MAXACT as a surrogate for controlling the number of Jobs. That is, you can forget the distinction between jobs and threads and use MAXACT to control the activity in a storage pool. But, you are not controlling jobs; you are controlling threads. It is also too easy to have your existing MAXACT set too low if your existing QBATCH subsystem suddenly sees lots of new Java threads from new Java applications.
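To make the job versus thread distinction concrete, the sketch below totals the threads across a set of jobs and compares the total with a MAXACT value. It is a deliberately pessimistic simplification that assumes every thread wants to run at once; the function names and the idea of a per-job thread array are assumptions for illustration, not a system interface.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of threads across jobs: the quantity MAXACT actually limits. */
int total_threads(const int *threads_per_job, size_t job_count)
{
    int total = 0;
    for (size_t i = 0; i < job_count; i++) {
        assert(threads_per_job[i] >= 1); /* every job has at least one thread */
        total += threads_per_job[i];
    }
    return total;
}

/* Nonzero when the pool could have more runnable threads than MAXACT allows. */
int exceeds_maxact(const int *threads_per_job, size_t job_count, int maxact)
{
    return total_threads(threads_per_job, job_count) > maxact;
}
```

With two single-threaded batch jobs and one Java job running 40 threads, the pool holds 3 jobs but 42 threads, so a MAXACT of 10 that looked generous when counting jobs is exceeded.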
PAGE 315
20.2 General Performance Guidelines -- Effects of Compilation In general, the higher the optimization, the less easy the code will be to debug. It may also be the case that the program will do things that are initially confusing. In-lining For instance, suppose that ILE Module A calls ILE Module B. ILE Module B is a C program that does allocation (malloc/free in C terms). However, in the right circumstances, compiler optimization will "inline" Module B.
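A minimal sketch of the in-lining situation, in C. Both "modules" are placed in one source file here for brevity, which is an assumption; the function names are hypothetical. The point is only that once the helper is expanded in place, the allocation appears to happen inside the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stands in for "Module B": a small helper that owns the allocation. */
static inline char *b_duplicate(const char *text)
{
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, text, len);
    return copy;
}

/* Stands in for "Module A": calls B, then frees what B allocated.
 * At higher optimization the compiler may expand b_duplicate() here, so a
 * debugger or trace can show malloc being reached directly from A. */
size_t a_measure(const char *text)
{
    assert(text != NULL);
    char *copy = b_duplicate(text);
    if (copy == NULL)
        return 0;
    size_t n = strlen(copy);
    free(copy);
    return n;
}
```

The program's result is the same with or without in-lining; what changes is which routine a debugger or storage trace attributes the malloc and free to.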
PAGE 316
20.3 How to Design for Minimum Main Storage Use (especially with Java, C, C++) The iSeries family has added popular languages whose usage continues to increase -- Java, C, C++. These languages frequently use a different kind of storage -- heap storage. Many iSeries programmers, with a background in RPG or COBOL are unaware of the influence this may have on storage consumption. Why? Simply because these languages, by their nature, do not make much if any use of the heap.
PAGE 317
Where a and b are constants. “a” is determined by adding up things like the static storage taken up by the application program. “b” is the size of the data base record plus the size of anything else, such as a Java object, that is created one entity per data base record. In some applications, “N” will refer to some freestanding fact, like the maximum number of concurrent web serving operations or the number of outstanding new orders being processed.
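The formula these constants belong to is not visible in this excerpt. Assuming it has the linear form a + b × N that the description of "a", "b" and "N" suggests, an estimate could be sketched like this (the function name is illustrative):

```c
#include <assert.h>

/* Estimated main storage in bytes for n concurrent entities:
 * a fixed part "a" plus a per-entity part "b" for each of the n entities.
 * The linear form is an assumption based on the surrounding description. */
long long estimate_storage_bytes(long long a_fixed, long long b_per_entity,
                                 long long n)
{
    assert(a_fixed >= 0 && b_per_entity >= 0 && n >= 0);
    return a_fixed + b_per_entity * n;
}
```

For example, 1,000,000 bytes of static storage plus 500 bytes per data base record across 2,000 records comes to 2,000,000 bytes, which shows how quickly the per-entity term can match the fixed term.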
PAGE 318
Order(1) Order(j) Order(t) Order(N) ILE and OS/400 Programs Subsystem Descriptions Just In Time compiled programs (Java *JIT) Total Job Storage Java threads Direct Execution Java Programs Static storage from RPG and COBOL. Static final in Java. SQL Result Set (nonrecord) System values Java Virtual Machine and most WebSphere storage Program stack storage Data Base Records and IFS file records Java (and C/C++) objects Operating System copies (e.g.
PAGE 319
How practical this change would be, if it represented a large, existing data base, would be a separate question. If this is at the initial design, however, this is an easy change to make. Boundary considerations. In Java, we are done because Java will order the three entities such that the least amount of space is wasted. In C and C++, it might be possible to lay out the storage entities such that the compiler will not introduce padding between elements.
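The boundary consideration for C can be illustrated with two layouts of the same three members. The member names and types below are hypothetical, since the three entities discussed are not visible in this excerpt, and actual padding depends on the compiler and platform.

```c
#include <assert.h>
#include <stddef.h>

/* Small members interleaved with a large one: the compiler typically
 * inserts padding so that "amount" lands on its natural boundary. */
struct order_interleaved {
    char status;
    double amount;
    char region;
};

/* Same members, largest first: the two chars sit together at the end. */
struct order_sorted {
    double amount;
    char status;
    char region;
};

/* Bytes in a struct that are not occupied by its members. */
size_t padding_bytes(size_t struct_size, size_t member_bytes)
{
    assert(struct_size >= member_bytes);
    return struct_size - member_bytes;
}
```

Comparing padding_bytes(sizeof(struct order_interleaved), sizeof(double) + 2) with the same call for struct order_sorted shows how much space the reordering recovers on a given compiler. Multiplied by the number of records held in storage, a few bytes per record can be significant.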
PAGE 320
One thing easily misunderstood is variable length characters. At first, one would think every character field should be variable length, especially if one codes in Java, where variable length data is the norm. However, when one considers the internals of data base, a field ought to be ten to twenty bytes long before variable length is even considered. The reason is, there is a cost of about ten bytes per record for the first variable length field.
PAGE 321
20.4 Hardware Multi-threading (HMT) Hardware multi-threading is a facility present in several iSeries processors. The eServer i5 models instead have the Simultaneous Multi-threading (SMT) facility, which is discussed in the SMT white paper at the following website: http://www-1.ibm.com/servers/eserver/iseries/perfmgmt/pdf/SMT.pdf. HMT is mentioned here primarily to compare and contrast it with SMT.
PAGE 322
HMT and SMT Compared and Contrasted
Some key similarities and differences are:
HMT Feature
• HMT can be turned on and off only by a whole system IPL.
• All partitions have the same value for HMT.
• HMT executes only one instruction stream at a time.
• CPU utilization measurements are not greatly affected by HMT.
• System performance counters and CPU utilization values continue to be reported on a physical CPU basis.
PAGE 323
20.5 POWER6 520 Memory Considerations Because of the design of the Power6 520 system, there are some key factors with the memory subsystem that one should keep in mind when sizing this system. The Power6 520, unlike the Power6 570, has no L3 cache, which does have an effect on memory sensitive workloads, like Java applications for instance. Having no L3 cache makes memory speed, or the bandwidth rating in megabytes per second, even more critical for memory sensitive workloads.
PAGE 324
activation time. This means that a partition that requires 4 GB of memory could be assigned 2 GB from the quad with 4 GB DIMMs and the other 2 GB from the quad with 8 GB DIMMs. This too can cause an application to have different performance characteristics on partitions configured with exactly the same amount of resources. When system planning for the Power6 520, there are a number of memory related factors that should be considered, each of which can affect performance of memory sensitive workloads.
PAGE 325
floating-point data may be copied using the floating-point loads and stores, resulting in an alignment interrupt. As an example, consider the following structures, one specifying "packed" and the other allowed to be aligned per the compiler:
struct FPAlignmentStruct Packed {
    long FloatingPointOp1;
    char ACharacter;
    long FloatingPointOp2; // Byte aligned; Can result in alignment interrupt.
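The difference between a packed and a compiler-aligned layout can be checked with offsetof. The sketch below is not the manual's example: it uses `#pragma pack`, a common compiler extension in GCC, Clang and MSVC, as a stand-in for the "packed" qualifier in the text, and the struct and member names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Packed layout: no padding, so the second double starts at an odd offset. */
#pragma pack(push, 1)
struct fp_packed {
    double op1;
    char   a_character;
    double op2; /* byte aligned */
};
#pragma pack(pop)

/* Default layout: the compiler may pad after a_character to align op2. */
struct fp_aligned {
    double op1;
    char   a_character;
    double op2;
};

/* Nonzero when an offset is a multiple of the given alignment. */
int is_aligned(size_t offset, size_t alignment)
{
    assert(alignment > 0);
    return offset % alignment == 0;
}
```

In the packed struct, op2 begins immediately after the single char, at offset sizeof(double) + 1, which is the kind of byte-aligned floating-point field the text warns about. The aligned struct trades a few bytes of padding for naturally aligned access.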
PAGE 326
Chapter 21. High Availability Performance The primary focus of this chapter is to present data that compares the effects of high availability scenarios using different hardware configurations. The data for the high availability test are broken down into two different categories which include Switchable IASP’s, and Geographic Mirroring. High Availability Switchable Resources Considerations Switchable IASPs are the physical resource that can be switched between systems in a cluster.
PAGE 327
· Inactive switchover - The switching time is measured from the point at which the CHGCRGPRI command is issued from the primary system, which has no work, until the IASP is available on the new primary system.
· Partition - An active partition is created by starting the database workload on the IASP. Once the workload is stabilized, an option 22 (force MSD) is issued on the panel. Switching time is measured from the time the MSD is forced on the primary side until the new primary node varies on the IASP.
PAGE 328
Switchover Measurements NOTE: The information that follows is based on performance measurements and analysis done in the Server Group Division laboratory. Actual performance may vary significantly from these tests.
PAGE 329
Active State: In geographic mirroring, pertaining to the configuration state of a mirror copy that indicates geographic mirroring is being performed, if the IASP is online. Workload Description Synchronization: This workload is performed by starting the synchronization process on the source side from an unsynchronized geographic mirrored IASP. The workload time is measured from the time geographic mirroring is activated on the source side until the target side has completed synchronization.
PAGE 330
Workload Configuration The wide variety of hardware configurations and software environments available make it difficult to characterize a ‘typical’ high availability environment and predict the results. The following section provides a simple description of the high availability test.
PAGE 331
Geographic Mirroring Measurements NOTE: The information that follows is based on performance measurements and analysis done in the IBM Server Group Division laboratory. Actual performance may vary significantly from this test. Synchronization on an idle system: The following data shows the time required to synchronize 1 terabyte of data. This test case could vary greatly depending on the speed and latency of communication between the two systems.
PAGE 332
Geographic Mirroring Tips
• For a quicker switchover time, keep the user-ID (UID) and group-ID (GID) of user profiles that own objects on the IASP the same between nodes of the cluster group. Having different UID’s lengthens vary on times.
• Geographic mirroring is optimized for large files. A large number of small files will produce a slower synchronization rate.
PAGE 333
Chapter 22. IBM Systems Workload Estimator 22.1 Overview The IBM Systems Workload Estimator (a.k.a., the Estimator or WLE), located at: http://www.ibm.com/systems/support/tools/estimator, is a web-based sizing tool for System i, System p, and System x. You can use this tool to size a new system, to size an upgrade to an existing system, or to size a consolidation of several systems.
PAGE 334
typical disclaimers that go with any performance estimate ("your experience might vary...") are especially true. We provide these sizing estimates as general guidelines only. 22.2 Merging PM for System i data into the Estimator The Measured Data workload of the Estimator is designed to accept data from various data sources. The most common ones are the PM for System i™ and PM for System p™. These two tools are available for the IBM System i™ and IBM System p™ respectively.
PAGE 335
account features like detailed journaling, resource locking, single-threaded applications, time-limited batch job windows, or poorly tuned environments. The Estimator is a capacity sizing tool. Even though it does not represent actual transaction response times, it does adhere to the policy of giving recommendations that abide by generally accepted utilization thresholds.
PAGE 336
Appendix A. CPW and CIW Descriptions "Due to road conditions and driving habits, your results may vary." "Every workload is different." These are two hallmark statements of measuring performance in two very different industries. They are both absolutely correct. For iSeries and AS/400 systems, IBM has provided a measure called CPW to represent the relative computing power of these systems in a commercial environment.
PAGE 337
CPW Application Description The CPW application simulates the database server of an online transaction processing (OLTP) environment. Requests for transactions are received from an outside source and are processed by application service jobs on the database server. It is based, in part, on the business model from benchmarks owned and managed by the Transaction Processing Performance Council.
PAGE 338
A.2 Compute Intensive Workload - CIW Unlike CPW values, CIW values are not derived from specific measurements of a single workload. They are modeled projections which are based upon the characteristics of internal workloads such as Domino workloads and application server environments such as can be found with SAP or JDEdwards applications.
PAGE 339
category that often fits into the CIW-like classification is overnight batch. Even though batch jobs often process a great deal of database work, there are relatively few jobs which means there is little switching of jobs from processor to processor. As a result, overnight batch data processing jobs sometimes act more like compute-intensive jobs.
PAGE 340
Appendix B. System i Sizing and Performance Data Collection Tools The following section presents some of the alternative tools available for sizing and capacity planning. (Note: There are products from vendors not included here that perform similar functions.) All of the tools discussed here support the current range of System i products, and include the capability to model logical partitions, partial processors (micropartitions) and server workload consolidation.
PAGE 341
B.1 Performance Data Collection Services Collection Services is an operating system function, designed to run continuously, that collects system and job level performance data at regular intervals, which can be set from 15 seconds to 1 hour. It runs a number of collection routines called probes, which collect data from many system resources including jobs, disk units, IOPs, buses, pools, and communication lines.
PAGE 342
predefined profile containing commonly used categories. For example, if you do not have a need to monitor the performance of SNADS transaction data on a regular basis, you can choose to turn that category off so that SNADS transaction data is not collected. Since Collection Services is intended to be run continuously and trace mode is not, trace mode was not integrated into the start options of Collection Services.
PAGE 343
http://www.ibm.com/servers/eserver/iseries/perfmgmt/batch.html Unzip this file, transfer to your System i platform as a save file and restore library QBCHMDL. Add this library to your library list and start the tool by using the STRBCHMDL command. Tips, disclaimers, and general help are available in the QBCHMDL/README file. It is recommended that you work closely with your IBM Technical Support Representative when using this tool. IBM i 6.
PAGE 344
Appendix C. CPW and MCU Relative Performance Values for System i This chapter details the relative system performance values:
• Commercial Processing Workload (CPW). For a detailed description, refer to Appendix A, “CPW Benchmark Description”. CPW values are relative system performance metrics and reflect the relative system capacity for the CPW workload. CPW values can be used with caution in a capacity planning analysis (e.g., to scale CPU-constrained capacities, CPU time per transaction).
PAGE 345
C.1 V6R1 Additions (October 2008) C.1.1 CPW values for the IBM Power Systems - IBM i operating system
Table C.1.1. CPW values for Power System Models
                                                                  Processor CPW
Model            Processor  Chip Speed  L2/L3 cache (1)  2 cores  4 cores  8 cores  12 cores  16 cores
                 Feature    GHz         per chip
570 (9117-MMA)   7387       4.4         2x4MB / 32MB     9850     19400    36200    51500     70000
570 (9117-MMA)   7388       5.0         2x4MB / 32MB     11000    21600    40300    56800     77600
*Note: 1.
PAGE 346
2. Memory speed differences account for some slight variations in performance difference between models. 3. CPW values for Power System models introduced in October 2008 were based on IBM i 6.1 plus enhancements in post-release PTFs. C.1.4 CPW values for IBM Power Systems - IBM i operating system Table C.1.4. CPW values for Power System Models Model Processor Feature Chip Speed GHz L2/L3 cache (1) per chip CPU (2) Range Processor CPW 520 (8203-E4A) 520 (8203-E4A) 5633 5634 4.2 4.
PAGE 347
Table C.3.1. CPW values for Power System Models
Model            Processor  Chip Speed  L2/L3 cache (1)  CPU (2)  Processor CPW
                 Feature    MHz         per chip         Range
520 (9407-M15)   5633       4200        2x4MB / 0MB      1        4300
520 (9408-M25)   5634       4200        2x4MB / 0MB      1-2      4300-8300
550 (9409-M50)   4966       4200        2x4MB / 32MB     1-4      4800-18000
*Note: 1. These models have a dedicated L2 cache per processor core, and share the L3 cache between two processor cores. 2. The range of the number of processor cores per system. C.3.
PAGE 348
Table C.4.1. IBM BladeCenter models
Blade Model       Server   Edition  Processor  Chip Speed  L2/L3 cache (1)  CPUs            Processor CPW
                  Feature  Feature  Feature    MHz         per chip
JS22 (7998-61X)   n/a      n/a      52BE       4000        2x4MB / 0 MB     3 of 4 (2)      11040
JS22 (7998-61X)   n/a      n/a      52BE       4000        2x4MB / 0 MB     3.7 of 4 (3)    13800
*Note: 1. These models have a dedicated L2 cache per processor core, and no L3 cache 2. CPW value is for a 3-core dedicated partition and a 1-core VIOS 3. CPW value is for a 3.
PAGE 349
Table C.6.1.1. System i models 9406-595 9406-595 9406-595 Edition Accelerator Chip Speed L2/L3 cache Feature MHz per CPU (1) Feature2 5870 NA 2300 1.9/36MB 5895(4) NA 2300 1.9/36MB 5875(4) NA 2300 1.
PAGE 350
Table C.6.1.1. System i models Model 9406-520 9406-520 Value 9406-520 9406-520 Express 9405-520 9405-520 9405-520 9405-520 9405-520 9405-520 9405-520 9405-520 Edition Accelerator Chip Speed L2/L3 cache Feature MHz per CPU (1) Feature2 (5) 7373 NA 1900 1.9/36MB 7734 NA 1900 1.9/36MB CPU Range Processor CPW 5250 OLTP CPW MCU 1(3) 1(3) 1200 1200 1200 1200 2600 2600 7352 7350 7357 7355 1900 1900 1.9/36MB 1.
PAGE 351
Table C.7.1.1.
PAGE 352
8. The 64-way is measured as two 32-way partitions since i5/OS does not support a 64-way partition. 9. IBM stopped publishing CIW ratings for iSeries after V5R2. It is recommended that the IBM Systems Workload Estimator be used for sizing guidance, available at: http://www.ibm.com/eserver/iseries/support/estimator C.
PAGE 353
C.8.2 Model 810 and 825 iSeries for Domino (February 2003) Table C.8.2.1. iSeries for Domino 8xx Servers Chip Speed L2 cache Model MHz per CPU 825-2473 (7416) 825-2473 (7416) 1100 1100 1.41 MB 1.41 MB CPU 5250 OLTP Processor CPW Range CPW* 6 4 6600 na 0 0 Processor CIW* MCU 2890 na 17400 11600 4 MB 2 2700 0 950 7900 750 4 MB 1 1470 0 530 4200 750 2 MB 1 1020 0 380 3100 540 *Note: 1.
PAGE 354
Table C.9.2.1 Standard Models 8xx Servers Chip Speed L2 cache Model MHz per CPU CPU Interactive Processor CPW Range CPW Processor CIW MCU 890-2488 890-2488 890-2488 890-2488 890-2488 890-2488 890-2488 890-2488 890-2488 890-2488 (1576) (1577) (1578) (1579) (1581) (1583) (1585) (1587) (1588) (1591) 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 1.41 MB* 1.41 MB* 1.41 MB* 1.41 MB* 1.41 MB* 1.41 MB* 1.41 MB* 1.41 MB* 1.41 MB* 1.
PAGE 355
C.10.1 Model 8xx Servers Table C.9.1.
PAGE 356
Table C.9.1.
PAGE 357
C.10.4 Capacity Upgrade on-demand Models New in V4R5 (December 2000), Capacity Upgrade on Demand (CUoD) capability offered for the iSeries Model 840 enables users to start small, then increase processing capacity without disrupting any of their current operations. To accomplish this, six processor features are available for the Model 840. These new processor features offer a Startup number of active processors (8-way, 12-way or 18-way), with additional On-Demand processor capacity built in (Standby).
PAGE 358
Table C.10.4.1.
PAGE 359
C.11 V4R5 Additions For the V4R5 hardware additions, the tables show each new server model characteristics and its maximum interactive CPW capacity. For previously existing hardware, the tables show for each server model the maximum interactive CPW and its corresponding CPU % and the point (the knee of the curve) where the interactive utilization begins to increasingly impact client/server performance.
PAGE 360
Model 830-2402 (1534) 830-2402 (1535) 830-2402 (1536) Chip Speed MHz 540 540 540 L2 cache per CPU 4 MB 4 MB 4 MB CPUs Processor CPW Interactive CPW 4 4 4 4200 4200 4200 560 1050 2000 830-2403 830-2403 830-2403 830-2403 830-2403 830-2403 830-2403 (1531) (1532) (1533) (1534) (1535) (1536) (1537) 540 540 540 540 540 540 540 4 MB 4 MB 4 MB 4 MB 4 MB 4 MB 4 MB 8 8 8 8 8 8 8 7350 7350 7350 7350 7350 7350 7350 70 120 240 560 1050 2000 4550 840-2418 840-2418 840-2418 840-2418 840-2418 840-2418 840-2
PAGE 361
C.11.4 SB Models
Table C.11.4.1 SB Models
Model      Chip Speed MHz  L2 cache per CPU  CPUs  Processor CPW*  Interactive CPW
SB2-2315   540             4 MB              8     7350            70
SB3-2316   500             8 MB              12    10000           120
SB3-2318   500             8 MB              24    16500           120
* Note: The "Processor CPW" values listed for the SB models are identical to the 830-2403-1531 (8-way), the 840-2418-1540 (12-way) and the 840-2420-1540 (24-way). However, due to the limited disk and memory of the SB models, it would not be possible to measure these values using the CPW workload.
PAGE 362
Model 720-2064 (1504) 720-2064 (1505) Chip Speed MHz 255 255 L2 cache per CPU 4 MB 4 MB 1600 1600 Interactive CPW (Knee) 560 1050 Interactive CPW (Max) 653.3 1225 CPUs Processor CPW 4 4 730-2065 730-2065 730-2065 730-2065 (Base) (1507) (1508) (1509) 262 262 262 262 4 MB 4 MB 4 MB 4 MB 1 1 1 1 560 560 560 560 70 120 240 560 81.7 140 280 653.
PAGE 363
Note: the CPU not used by the interactive workloads at their Max CPW is used by the system CFINTnn jobs. For example, for the 2386 model the interactive workloads use 17.8% of the CPU at their maximum and the CFINTnn jobs use the remaining 82.2%. The processor workloads use 0% CPU when the interactive workloads are using their maximum value. AS/400e Dedicated Server for Domino Table C.12.2.
PAGE 364
C.13 AS/400e Model Sxx Servers For AS/400e servers the knee of the curve is about 1/3 the maximum interactive CPW value. Table C.13.1 AS/400e Servers Model S10 S20 S30 S40 Feature # 2118 2119 2161 2163 2165 2166 2257 2258 2259 2260 2207 2208 2256 2261 CPUs 1 1 1 1 2 4 1 2 4 8 8 12 8 12 Max C/S CPW 45.4 73.1 113.8 210 464.3 759 319 583.3 998.6 1794 3660 4550 1794 2340 Max Inter CPW 16.2 24.4 31 35.8 49.7 56.9 51.5 64 64 64 120 120 64 64 1/3 Max Interact CPW 5.4 8.1 10.3 11.9 16.7 19.0 17.2 21.3 21.
PAGE 365
Table C.15.1 AS/400 Advanced Servers: V4R1 and V4R2 Constrain / Max Model Feature # CPUs Unconstr C/S CPW 2269 c 1 20.2 2269 u 1 27 150 2270 c 1 20.2 2270 u 1 35 2109 n/a 1 27 40S 2110 n/a 1 35 2111 n/a 1 63.0 2112 n/a 1 91.0 50S 2120 n/a 1 81.6 2121 n/a 1 111.5 2122 n/a 1 138.0 2154 n/a 1 188.2 2155 n/a 2 319.0 53S 2156 n/a 4 598.0 2157 n/a 4 650.0 Max Inter CPW 13.8 13.8 20.2 20.6 9.4 14.5 21.6 32.2 22.5 32.2 32.2 32.2 32.2 32.2 32.2 1/3 Max Interact CPW 4.6 4.6 6.7 6.9 3.1 3.9 7.2 10.8 8.1 10.7 12.
PAGE 366
Table C.16.1 AS/400e Custom Application Server Model SB1 SAP SD ds/hr Model CPUs Release @ 65% CPU Utilization 3.1H 109,770.49 2312 8 4.0B 65,862.29 3.1H 158,715.76 2313 12 4.0B 95,229.46 FI ds/hr @ 65% CPU Utilization 274,426.23 164,655.74 396,789.40 238,073.64 C.17 AS/400 Models 4xx, 5xx and 6xx Systems Table C.17.1 AS/400 RISC Systems Model 400 500 510 530 Feature Code CPUs 2130 2131 2132 2133 2140 2141 2142 2143 2144 2150 2151 2152 2153 2162 1 1 1 1 1 1 1 1 1 1 1 2 4 4 Table C.17.
PAGE 367
C.18 AS/400 CISC Model Capacities Table C.18.1 AS/400 CISC Model: 9401 Model Feature CPUs P02 n/a 1 2114 1 P03 2115 1 2117 1 Memory (MB) Maximum 16 24 40 56 Table C.18.2 AS/400 CISC Model: 9402 Systems Model CPUs Memory (MB) Maximum C04 1 12 C06 1 16 D02 1 16 D04 1 16 E02 1 24 D06 1 20 E04 1 24 F02 1 24 F04 1 24 E06 1 40 F06 1 40 Table C.18.3 AS/400 CISC Model: 9402 Servers Feature Code CPUs Memory (MB) Maximum S01 1 56 100 1 56 Disk (GB) Maximum 1.3 1.3 1.2 1.6 2.0 1.6 4.0 2.1 4.1 7.9 8.
PAGE 368
Table C.18.6 AS/400 CISC Model: 9406 Systems Model CPUs Memory (MB) Maximum B30 1 36 B35 1 40 B40 1 40 B45 1 40 D35 1 72 B50 1 48 E35 1 72 D45 1 80 D50 1 128 E45 1 80 F35 1 80 B60 1 96 F45 1 80 E50 1 128 B70 1 192 D60 1 192 F50 1 192 E60 1 192 D70 1 256 E70 1 256 F60 1 384 D80 2 384 F70 1 512 E80 2 512 E90 3 1024 F80 2 768 E95 4 1152 F90 3 1024 F95 4 1280 F97 4 1536 Table C.18.