Common Misconfigured HP-UX Resources

By: Mark Ray, Global Solutions Engineering; Steven Albert, Global Systems Engineering; Jan Weaver, Global Systems Engineering

Contents:
• Overview
• The HFS Inode Cache
• The HP-UX Buffer Cache
• The JFS Inode Cache
• The JFS Metadata Buffer Cache
• Semaphore Tables
Overview

Physical memory is a finite resource, and it is shared by many processes all attempting to use it. Not only do processes need memory in order to run; the HP-UX operating system (or kernel) also needs space for its critical resources and tables. Some of these resources are static (they do not change in size) and some are dynamic. Many of them can be configured to a certain size or limited by a certain value.
The HFS Inode Cache

With the introduction of the Journaled File System (JFS), many systems now use the High Performance File System (HFS) only for the boot file system (/stand). Since the JFS inode cache is managed separately from the HFS inode cache, you may need to adjust the size of your HFS inode cache.
The size of the cache is determined by the ninode tunable. Note that this tunable sizes only the HFS inode cache; it does not affect the JFS inode cache. The tunable also sizes the HFS inode hash table, whose size is the largest power of 2 less than or equal to ninode. For example, if the ninode tunable is configured for 1500 inodes, the hash table will have 1024 hash entries (since 1024 is the largest power of 2 below 1500).
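This "previous power of 2" sizing can be sketched in a few lines of Python (an illustration of the arithmetic, not HP-UX kernel code):

```python
def prev_power_of_two(n):
    """Return the largest power of 2 that is <= n."""
    p = 1
    while p * 2 <= n:
        p *= 2
    return p

# ninode = 1500 gives a 1024-entry HFS inode hash table
print(prev_power_of_two(1500))  # 1024
```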
The dependency on the ninode tunable is reduced with the introduction of the following two tunables:
• ncsize — Introduced with PHKL_18335 on HP-UX 10.x. Determines the size of the Directory Name Lookup Cache (DNLC) independent of ninode.
• vx_ncsize — Introduced in HP-UX 11.0. Used with ncsize to determine the overall size of the DNLC.
The HP-UX Buffer Cache

The HP-UX buffer cache configuration can be confusing, and the buffer cache is frequently over- or under-configured. Understanding how the HP-UX buffer cache is maintained and used can help you determine the proper configuration for your application environment.
The dbc_min_pct tunable cannot be set below 2, and dbc_max_pct cannot be set above 90. On HP-UX 11i v2, the dbc_min_pct and dbc_max_pct tunables are dynamic and can be modified without a system reboot. When the system is initially booted, it allocates dbc_min_pct (the default is 5 percent) of memory for buffer pages (each page is 4,096 bytes). The system also allocates one buffer header for every two buffer pages.
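The boot-time allocation described above can be estimated with a short sketch (a simplified model; the helper name and the 8-GB memory figure are illustrative assumptions):

```python
def boot_buffer_cache(mem_bytes, dbc_min_pct=5, page_size=4096):
    """Model the initial buffer cache allocation: dbc_min_pct percent
    of memory in 4,096-byte buffer pages, plus one buffer header for
    every two buffer pages."""
    cache_bytes = mem_bytes * dbc_min_pct // 100
    buffer_pages = cache_bytes // page_size
    buffer_headers = buffer_pages // 2
    return buffer_pages, buffer_headers

pages, headers = boot_buffer_cache(8 * 1024**3)  # e.g., 8 GB of RAM
print(pages, headers)
```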
The following figure shows the HP-UX buffer cache. Note that buffers remain in the buffer cache even after a file is closed; if the file is reopened a short time later, its buffers may still be available in the cache. For example, if your buffer cache is 500 MB in size and you run a grep command on a 100-MB file, each data block must be read into the buffer cache as the file is scanned. If you run the same command again shortly afterward, the data blocks can be found in the cache and do not need to be reread from disk.
Buffer Cache Hash Table

Blocks in the buffer cache are hashed so that they can be accessed quickly. The number of hash entries is computed at boot time as one quarter of the number of free memory pages, rounded up to the nearest power of two. A page is 4,096 bytes, so a system with 12 GB of memory has 3,145,728 pages; one quarter of that is 786,432, which rounds up to 1,048,576. Such a system therefore has approximately one million hash table entries, regardless of the buffer cache configuration.
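The 12-GB example works out as follows (a sketch of the arithmetic; the function name is illustrative):

```python
def buffer_hash_entries(free_mem_bytes, page_size=4096):
    """One quarter of the free memory pages, rounded up to the
    nearest power of two -- a model of the boot-time computation."""
    quarter = (free_mem_bytes // page_size) // 4
    p = 1
    while p < quarter:
        p *= 2
    return p

# ~12 GB of free memory -> about one million hash entries
print(buffer_hash_entries(12 * 1024**3))  # 1048576
```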
come into play when multiple HFS file systems of different block sizes are used, or multiple JFS file systems are mounted with different max_buf_data_size settings, or NFS file systems are mounted with a read/write size other than 8 KB. In such cases, increase the bcvmap_size_factor parameter to at least 16. Refer to the vxtunefs(1M) manpage for more information on the max_buf_data_size parameter.
• Read ahead
If file system access is generally sequential, the buffer cache provides enhanced performance through read-ahead. When the file system detects sequential access to a file, it begins issuing asynchronous reads on subsequent blocks so that the data is already available in the buffer cache when the application requests it. For HFS file systems, sequential read-ahead is configured via the hfs_ra_per_disk system tunable.
• Flushing the buffer cache: the syncer program
The syncer program is the process that flushes delayed-write buffers to the physical disk. Naturally, the larger the buffer cache, the more work the syncer program must do. The HP-UX 11.0 syncer is single threaded: it wakes up periodically and sequentially scans the hash table for blocks that need to be written to the physical device.
may require four I/O requests. However, if the buffer cache is bypassed, a single 256-KB direct I/O could potentially be performed.
• Data accessed once
Managing the buffer cache requires additional code and processing. For data that is accessed only once, keeping it in the cache provides no benefit. In fact, caching data that is accessed only once may force the system to evict buffer pages that are accessed more frequently.
Some databases use the async driver (/dev/async), which performs asynchronous I/O but bypasses the buffer cache and reads directly into shared memory segments. Be careful not to mix buffered I/O and direct I/O, as doing so increases the overhead of keeping the direct and buffered data in sync.

Buffer Cache Guidelines

Providing general guidelines for tuning the buffer cache is very difficult. Much depends on the application mix running on the system, but some generalizations can be made.
The JFS Inode Cache

HP-UX has long maintained a static cache in physical memory for storing High Performance File System (HFS) file information (inodes). The VERITAS File System (HP OnlineJFS/JFS) manages its own cache of file system inodes, referred to here as the JFS inode cache. The JFS inode cache is managed much differently than the HFS inode cache. Understanding how the JFS inode cache is managed is key to understanding how to best tune it for your unique environment.
The JFS Inode Cache is a Dynamic Cache

Unlike the HFS inode cache, the JFS inode cache is a dynamic cache: one that grows and shrinks based on need. As files are opened, the number of inodes in the JFS inode cache grows. As files are closed, their inodes are moved to a free list and can be reused at a later time. However, if an inode is inactive for a certain period of time, it is freed and its space is returned to the kernel memory allocator.
If you are using JFS 3.5 and above, you can use the vxfsstat command to display the maximum number of inodes in the inode cache as follows:

# vxfsstat / | grep inodes
   3087 inodes current   255019 inodes alloced   128002 peak   251932 freed
# vxfsstat -v / | grep maxino
vxi_icache_maxino    128000   128000 maximum
vxi_icache_peakino   128002

Note that in the previous example the inode cache can handle a maximum of 128,000 inodes.
[Figure: JFS inode free lists, each holding a chain of inodes]

JFS uses a number of factors, such as the number of inodes and the maximum number of processors supported, to determine the number of free lists on the system. The number of free lists is not tunable and can vary from one JFS version to the next. The number of free lists can be identified with the following adb command:

# echo "vx_nfreelists/D" | adb -k /stand/vmunix /dev/mem
vx_nfreelists:  11
128,000 inodes would be in the cache; the first 1,000 inodes would have been reused when reading in the last 1,000. If you use the find command again, it has to recache all 129,000 inodes. However, if the find command traversed only 127,000 inodes, all of them would still be in the cache for the second find command, which would then run much faster. As an example, consider an HP-UX 11i v1 system using JFS 3.5.
# vxfsstat -v / | grep ifree
vxi_icache_recycleage   1035   vxi_ifree_timelag   1800

Consider the JFS 3.5 case again. The system begins to free inodes once they have been inactive for 30 minutes (1800 seconds) or more.
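The cache-reuse behavior described above can be illustrated with a scaled-down sketch: a least-recently-used cache of 128 entries standing in for a 128,000-inode cache (the LRUCache class is an illustrative model, not JFS code):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache model: on overflow, evict the least
    recently used entry, as the JFS inode cache reuse does."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key):
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)      # mark most recently used
        else:
            self.misses += 1
            if len(self.data) >= self.capacity:
                self.data.popitem(last=False)  # evict LRU entry
            self.data[key] = True

# Traverse 129 "inodes" twice: every access on the second pass
# misses, because each inode was evicted just before reuse.
over = LRUCache(128)
for _ in range(2):
    for inode in range(129):
        over.access(inode)
print(over.hits)   # 0

# Traverse only 127 "inodes" twice: the second pass hits entirely.
under = LRUCache(128)
for _ in range(2):
    for inode in range(127):
        under.access(inode)
print(under.hits)  # 127
```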
You can use the following table to estimate the memory cost of each JFS inode in the inode cache (measured in bytes). Each item reflects the size as allocated by the kernel memory allocator:

[Table: per-inode memory cost, in bytes, for JFS 3.3 11.11 (32-bit), JFS 3.3 11.11 (64-bit), JFS 3.5 11.11 (64-bit), JFS 3.5 11.23, JFS 4.1 11.23, JFS 5.1 11.23, JFS 4.1 11. — the per-configuration values did not survive extraction]
There are multiple object sizes available, but not all sizes are represented. For example, on HP-UX 11.0 there are pages that use the 1024-byte object and the 2048-byte object, but nothing in between. If JFS requests 1040 bytes of memory, an entire 2048-byte object is taken from a page that has been divided into two 2048-byte objects. As inodes are allocated, the kernel allocates memory pages and divides them into objects, which are then used by the JFS subsystem for inodes and associated data structures.
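A simplified power-of-two model of this rounding behavior (the real allocator offers only selected sizes, and the 32-byte minimum here is an assumption for illustration):

```python
def alloc_object_size(request):
    """Round a request up to the next power-of-2 object size.
    Simplified model of the kernel memory allocator's object sizes."""
    size = 32  # assumed smallest object size (illustrative)
    while size < request:
        size *= 2
    return size

# A 1,040-byte request consumes an entire 2,048-byte object
print(alloc_object_size(1040))  # 2048
```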
Inode Allocation with JFS 4.1 and later

Beginning with JFS 4.1, the inode allocation algorithms changed to allocate inodes more efficiently. Rather than allocating one 4-KB page from which JFS inodes are carved, a larger "chunk" of memory is used. The inode allocation chunk size is 16 KB, and JFS can currently carve 11 inodes out of each 16-KB chunk.

[Figure: a 16-KB inode chunk on the JFS inode free list, holding inodes 1 through 11 plus a small unused remainder]

So on JFS 4.
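The chunk arithmetic works out as follows (a sketch; it shows only the space available per inode if the chunk is divided evenly, not the actual inode structure size):

```python
CHUNK_SIZE = 16 * 1024   # 16 KB inode allocation chunk
INODES_PER_CHUNK = 11

# Space per inode if the chunk were divided evenly, and the
# remainder left at the end of the chunk
print(CHUNK_SIZE // INODES_PER_CHUNK)  # 1489 bytes per inode slot
print(CHUNK_SIZE % INODES_PER_CHUNK)   # 5 bytes left over
```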
You can tune the maximum size of the JFS inode cache using the vx_ninode tunable. With JFS 4.1 on HP-UX 11i v2, vx_ninode can be tuned dynamically using kctune. At a minimum, you must have one JFS inode cache entry for each file that is open at any given time on your system. If you are concerned about the amount of memory that JFS can potentially consume, try tuning vx_ninode down so that the cache takes only about 1-2 percent of overall memory.
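A back-of-the-envelope way to pick such a value (the 2-KB per-inode cost and the 16-GB memory size are assumptions; the real per-inode cost varies by JFS version and OS release):

```python
def vx_ninode_for_budget(mem_bytes, pct, inode_cost=2048):
    """Estimate a vx_ninode value so the JFS inode cache consumes
    about `pct` percent of memory. inode_cost (bytes per cached
    inode) is an assumed figure, not a fixed JFS constant."""
    budget = mem_bytes * pct // 100
    return budget // inode_cost

# 1 percent of 16 GB at an assumed 2 KB per cached inode
print(vx_ninode_for_budget(16 * 1024**3, 1))
```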
systems so the vxfsd daemon does not use too much CPU. The following example uses kctune to change vxfs_ifree_timelag without a reboot:

# kctune vxfs_ifree_timelag=-1
Tunable                      Value  Expression  Changes
vxfs_ifree_timelag (before)      0  Default
                   (now)        -1  -1          Immed

Summary

Deciding whether or not to tune the JFS inode cache depends on how you plan to use your system. Memory is a finite resource, and a system manager needs to decide how much of the system's memory to spend on a specific resource.
The JFS Metadata Buffer Cache

In releases of the VERITAS File System (HP OnlineJFS/JFS) prior to JFS 3.5, the metadata of a file system was cached in the standard HP-UX buffer cache along with all of the user file data. Beginning with JFS 3.5, introduced on HP-UX 11i v1, JFS metadata moved to a special buffer cache known as the JFS metadata buffer cache (or metadata cache). This cache is managed separately from the HP-UX buffer cache.
The Metadata Cache: Dynamic or Static?

The metadata cache is a dynamic buffer cache, which means it can expand and shrink over time. It normally expands during periods of heavy metadata activity, especially with operations that traverse a large number of inodes, such as a find or backup command. Simply reading a large file may fill up the HP-UX buffer cache, but not the metadata cache.
the cache, even if all the metadata is brought into the cache. Use the vxfsstat command to see how much metadata is in the cache and what the maximum size is for buffer pages. Note that memory usage from the table accounts for both buffer pages and buffer headers. It does not include other overhead for managing the metadata cache, such as the hash headers and free list headers.
However, the memory cost must be considered. If the system is already running close to the low-memory threshold, the increased memory usage can consume memory that could otherwise be used by other applications, potentially degrading their performance. Note that a file system with a large number of small files can have much more metadata than a larger file system with a smaller number of large files. There is no way to predict how much metadata will be brought into the cache.
Semaphore Tables

Many third-party applications, databases in particular, make extensive use of the semaphores (commonly referred to as System V IPC) available in HP-UX 11i v1 and higher. The installation guides of third-party products often recommend changing the tunables associated with semaphores. The purpose of this section is to describe the memory impact of changing these tunables.
Tunables

The tunables related to semaphores are described in the following table:

Tunable   Description
sema      Enable or disable System V IPC semaphores at boot time
semaem    Maximum cumulative value changes per System V IPC semop() call
semmni    Number of System V IPC system-wide semaphore identifiers
semmns    Number of System V IPC system-wide semaphores
semmnu    Maximum number of processes that can have undo operations pending
semmsl    Maximum number of System V IPC semaphores per identifier
The semmni tunable sizes the semaphore ID table that maintains the system-wide semaphore identifiers. Each entry in the table is 96 bytes, so each increment to semmni increases the boot-time kernel size by 96 bytes. The semmsl tunable defines the maximum number of semaphores per semaphore identifier; it does not size a specific static table. The semmns tunable sizes the table of system-wide semaphores; each semaphore is 8 bytes.
Use the following formula to calculate the size of the system semaphore undo table: ((24+(8*semume)) * semmnu) Keep this formula in mind when changing these tunables since the effects on the boot time kernel size are multiplicative. How many undo structures does one need? There is no single answer to that, but it is important to note that undo structures are only allocated if the application specifies the SEM_UNDO flag.
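Pulling the per-table sizes together (a sketch using the per-entry sizes given in the text; the function name and example tunable values are illustrative):

```python
def semaphore_memory(semmni, semmns, semmnu, semume):
    """Boot-time kernel memory, in bytes, consumed by the System V
    semaphore tables, using the per-entry sizes from the text."""
    id_table = semmni * 96                   # 96 bytes per identifier
    sem_table = semmns * 8                   # 8 bytes per semaphore
    undo_table = (24 + 8 * semume) * semmnu  # undo table formula
    return id_table + sem_table + undo_table

# With semmnu=256 and semume=100, the undo table alone is:
print((24 + 8 * 100) * 256)  # 210944 bytes
```

Note how the undo table grows multiplicatively: doubling both semmnu and semume roughly quadruples its size.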
Table                  Sized by   Entry size               Defaults
Semaphore undo table   semmnu     24 bytes + (8*semume)    semmnu=256, semume=100

Changes in HP-UX 11i v2

Beginning with HP-UX 11i v2, the dynamic changes described in the previous section are the default behavior; therefore, no patch is necessary. The sema tunable has been obsoleted. The sizes of the data structures remain the same, so all of the formulas for calculating the kernel boot size remain the same.
For more information

See the HP technical documentation Web site: http://docs.hp.com

© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.