NFS Performance Tuning for HP-UX 11.0 and 11i Systems
Dave Olker, Systems Networking Solutions Lab – Revision 2.0, July 22, 2002
Notes: The Network File System (NFS) has been the industry standard protocol for remote file access on the UNIX operating system platform for many years. It has become a critical component of every flavor of UNIX, as well as many non-UNIX based operating systems. NFS is also a central component of HP's NAS offerings.
agenda (part one)
• Environmental Considerations
• Daemons and Kernel Threads
• Automount & AutoFS
• CacheFS
• NFS PV2 vs. PV3
• NFS/UDP vs. NFS/TCP
Notes: An important step in tuning HP-UX systems for NFS performance is to evaluate the entire environment in which the client and server systems reside. The underlying network, local filesystems, and operating system patch levels can heavily influence throughput results.
agenda (part two)
• NFS Mount Options
• Buffer Cache
• Kernel Parameter Tuning
• Summarize differences between HP-UX 11.0 & 11i NFS implementations
• Summarize Recommendations
Notes: There are many NFS-specific mount options available. Some of these options can have a positive impact on performance, while others can have a dramatically negative effect. It is important to know which options to use and which to avoid.
environmental considerations
• Network
• Local Filesystems
• OS Patching
• Hostname Resolution
Notes: NFS is essentially a network-based application that runs on top of an operating system, such as HP-UX. Like most applications, NFS competes for resources (such as disk, network, memory, and kernel tables) with the other processes on the system.
network considerations
• Analyze Network Layout
• Measure Network Throughput Capabilities
• Network Troubleshooting
Notes: Since NFS is an acronym for "Network File System", it should come as no surprise that NFS performance is heavily dependent upon the latency and bandwidth of the underlying network.
Analyze Network Layout (network)
• Familiarize yourself with the physical layout (i.e. how many network hops separate the client and server?)
Ø OpenView Network Node Manager
Ø traceroute
Ø ping -o
• MTU sizes of the various network hops
Ø netstat -in
Notes: An important early step in troubleshooting any NFS performance issue is to learn as much as possible about the physical layout of the underlying network topology. How many network hops (i.e. routers, switches, gateways, etc.) separate the client and server?
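A minimal sketch of such a survey, assuming a hypothetical client hostname "nfsclient" (verify flag spellings against the manpages on your release):

    # Count the network hops between this host and the client
    traceroute nfsclient
    # Record the route taken by each packet (HP-UX ping -o option)
    ping -o nfsclient 64 5
    # Display the MTU configured on each local interface
    netstat -in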
Measure Network Throughput (network)
• Generally speaking, the higher your network throughput, the better your NFS performance will be
• Eliminate the NFS layer from consideration (if a throughput problem exists between an NFS client and server, the problem should affect any IP protocol)
Ø ttcp (http://ftp.arl.mil/ftp/pub/ttcp)
Ø netperf (http://www.netperf.org)
ttcp (network)
The above ttcp output shows this NFS server can send 80MB of TCP/IP data to the client's discard port (9) in 1.35 seconds (approximately 59MB/sec).
Notes: ttcp is a simple, lightweight program that measures the throughput of any network connection without relying on the filesystem layer.
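A hedged sketch of reproducing a ttcp run like the one above; "nfsclient" is a placeholder, and the buffer count and length are chosen to move roughly 80MB as in the example:

    # On the client: act as the receiver/sink
    ttcp -r -s
    # On the server: transmit 2560 buffers of 32KB to the client
    ttcp -t -s -n 2560 -l 32768 nfsclient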
netperf (network)
The above netperf output shows this NFS server was able to send TCP/IP data to the client at a sustained rate of ~59.5MB/sec during the 10-second test.
Notes: Netperf is a benchmark utility that can measure the performance of many different types of networks. Like ttcp, netperf measures throughput without relying on any filesystem resources.
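The equivalent netperf run is sketched below; the target host is a placeholder and TCP_STREAM is netperf's bulk-throughput test:

    # 10-second TCP bulk-transfer test from this host to the client
    netperf -H nfsclient -l 10 -t TCP_STREAM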
Network Troubleshooting Tools (network)
• Determine if a suspected network throughput problem affects all IP traffic or only NFS
• Eliminate the NFS layer from consideration by using tools that report on the health of the transport and link layers
Ø netstat(1)
Ø lanadmin(1M)
Notes: The goal for this phase of the investigation is to determine if a network throughput problem is affecting all IP traffic or only NFS.
Network Troubleshooting Checklist (network)
• Network throughput problems are usually caused by packets being dropped somewhere on the network
Ø Defective hardware – network interfaces, cables, switch ports, etc.
Ø Mismatching configuration settings – make sure interface card settings match the network switch – speed and duplex (i.e. half vs. full)
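A few starting points with those tools; the lanadmin arguments vary by HP-UX release and driver, so treat this as a sketch:

    # Per-protocol statistics; look for checksum errors and dropped fragments
    netstat -s
    # List interfaces and their PPA numbers
    lanscan
    # Link-level MIB statistics for PPA 0 (collisions, FCS errors, etc.)
    lanadmin -g mibstats 0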
local filesystem considerations
• Analyze Filesystem Layout
• Measure Filesystem Throughput Capabilities
• Filesystem Tuning Recommendations
Notes: The performance of the local filesystems (both on the NFS client and server) can have just as big an impact on overall NFS performance as the underlying network. Again, this shouldn't be a surprise since NFS is an acronym for "Network File System".
Analyze Filesystem Layout (local filesystems)
• The layout of the directory hierarchy and the directory contents on the NFS server can affect performance
• Directory reading and traversal speeds can be influenced by the contents of the directories being searched
Ø Number of files in the directory
Ø Number of symbolic links in the directory
Ø Symbolic links pointing to automounted directories
Notes: When retrieving the contents of an NFS-mounted directory, the number of files and symbolic links in that directory directly influences how long the operation takes.
Large Directory Dilemma (local filesystems)
Customer Reported Problem
• V-Class system running HP-UX 11.
Measure Filesystem Throughput (local filesystems)
• Generally speaking, the higher your local filesystem throughput, the better your NFS performance will be
• Eliminate the NFS layer from consideration (if a filesystem throughput problem exists it should affect any I/O traffic)
• Use tests that don't require NFS resources to run
Ø iozone (http://www.iozone.org)
iozone (local filesystems)
[iozone 3-D surface plot: VxFS normal read throughput (KBytes/second) as a function of file size (KBytes) and record size (KBytes)]
Notes: One of the better filesystem performance benchmark utilities available is iozone.
dd(1) (local filesystems)
Difference between 11.0 and 11i: "/dev/zero" is delivered with HP-UX 11i but not with HP-UX 11.0. It can be created with the command "mknod /dev/zero c 3 4".

    server(/big) -> timex dd if=/big/40gig of=/dev/zero bs=32k
    1250000+0 records in
    1250000+0 records out
    real 5:36.58
    user 4.90
    sys  5:28.01

Server Local File System Read Performance = 121.7MB/sec
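A corresponding write test is sketched below, assuming /big has room for a scratch file large enough to defeat the buffer cache (the file name and size are placeholders):

    # Sequential write: 40GB of zeroes in 32KB records
    timex dd if=/dev/zero of=/big/ddtest bs=32k count=1250000
    # Sequential read of the same file back
    timex dd if=/big/ddtest of=/dev/null bs=32k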
Filesystem Recommendations (part 1) (local filesystems)
• Use VxFS filesystems whenever possible
• Use block size of 8KB if possible, otherwise use 1KB
• When using VxFS 3.3, tune with vxtunefs(1M), otherwise use VxFS 3.1
• Specify 16MB logfile size via "mkfs –o logsize", i.e. 2048 (8KB block size) or 16384 (1KB block size)
Notes: VxFS filesystems are recommended over HFS filesystems on both NFS clients and servers.
Filesystem Recommendations (part 2) (local filesystems)
• Mount filesystems with "–o delaylog" if possible
• Monitor fragmentation via the fsadm –D –E command
• De-fragment filesystems via the fsadm –d –e command
• Monitor filesystem utilization via the bdf(1M) command
Ø don't let critical filesystems get below 10% free space
• Enable immediate disk reporting via scsictl(1M)
Notes: The "–o delaylog" mount option should be specified if your environment can tolerate a small window of potential data loss after a system crash in exchange for better performance.
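Translated into commands, these recommendations look roughly like the following; the volume and mount point names are placeholders:

    # 8KB block size with a 16MB intent log (2048 x 8KB blocks)
    mkfs -F vxfs -o bsize=8192,logsize=2048 /dev/vg01/rlvol1
    # Mount with delayed logging
    mount -F vxfs -o delaylog /dev/vg01/lvol1 /export
    # Report directory (-D) and extent (-E) fragmentation
    fsadm -F vxfs -D -E /export
    # De-fragment directories (-d) and extents (-e)
    fsadm -F vxfs -d -e /export
    # Enable immediate disk reporting on the underlying disk
    scsictl -m ir=1 /dev/rdsk/c1t2d0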
OS patching considerations
• Performance Enhancing Defect Fixes
• Performance Enhancing Functionality Added
• Dependent Patches
• Dependent Subsystems
Notes: NFS continues to evolve over time. The ONC+ offering that shipped on HP-UX 11i is a far superior product in terms of functionality, stability, and performance compared to the ONC+ product that shipped with the HP-UX 11.0 release.
Performance Enhancing Defect Fixes (patching)
• Most NFS mega-patches contain some defect fixes that directly impact performance, for example:
Ø JAGad14221 – Client performance is degraded, as shown by "nfsstat -c": the client makes unnecessary GETATTR calls for each read or write on files opened with synchronous I/O flags set, and synchronous I/O mode remains in effect for subsequent opens on an NFS file opened once with synchronous I/O flags set.
Performance Enhancing Functionality Added in Patches (patching)
• Many NFS components have been added to HP-UX 11.0 since the original release, including:
Ø AutoFS
Ø NFS over TCP/IP
Ø NFS PV3 read and write buffer sizes increased from a maximum of 8KB to 32KB
Notes: In addition to defect fixes, on occasion the NFS lab includes new functionality in their mega-patches.
Dependent Patches for 11.0 NFS (as of 4/12/2001) (patching)
• PHKL_18543 – Kernel Bubble
• PHKL_20016 – Hardware Fix
• PHCO_22269 – SAM
• PHNE_22397 – ARPA
• PHKL_22432 – VxFS 3.
• PHNE_22566 – STREAMS
• PHKL_22589 – LOFS
• PHKL_22840 – syscalls
• PHCO_23770 – libc
• PHKL_23002 – pthread
• PHCO_23092 – libc header
• PHCO_23117 – bdf(1M)
• PHKL_23226 – syscalls
• PHKL_23628 – shmem
• PHCO_23651 – fsck_vxfs
• PHCO_19666 – libpthread
Dependent Subsystems (patching)
• Underlying Filesystems – VxFS, HFS, CDFS, LOFS
• Commands – mount, umount, bdf, df, ls, etc.
• Libraries – libc, libpthread, etc.
• Buffer Cache
• Kernel Bubble
• LAN Common
• Network Link Specific – Ethernet, Gigabit, Token Ring, ATM, etc.
• Network Transport
• SAM
Notes: It is very important to keep the patch levels of all the underlying subsystems (i.e. network, buffer cache, transport, etc.) as current as possible.
hostname resolution
• What is it and why should you care about it?
• Which hostname resolution mechanisms are used in your environment?
• Are the hostname resolution servers responding quickly to requests?
• Do the servers return accurate data?
Notes: At some point, nearly every component of NFS needs to resolve an IP address to a hostname, or vice versa.
What is hostname resolution and why do I care about it? (hostname resolution)
• Hostname resolution is the mapping of a computer's hostname to its network address (typically an IP address) and vice versa.
• At some point, most every component of NFS needs to resolve a hostname or IP address
Ø NFS mount(1M) command
Ø rpc.mountd
Ø rpc.lockd & rpc.statd
What hostname resolution mechanism(s) do you use? (hostname resolution)
• Verify which hostname resolution service(s) are used in your environment by checking the "hosts" entry in the /etc/nsswitch.conf file
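For example, a hosts entry that consults DNS first and falls back to the local hosts file might look like this (status/action keywords per nsswitch.conf(4)):

    hosts: dns [NOTFOUND=continue UNAVAIL=continue] files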
Are the hostname resolution servers responding quickly to requests? (hostname resolution)
• Any latency involved in retrieving hostname or IP address information can negatively impact NFS performance
• Verify that lookup requests are resolved quickly
Ø nslookup(1) – Supports DNS, NIS, /etc/hosts; doesn't support NIS+
Ø nsquery(1) – Supports DNS, NIS, NIS+, /etc/hosts
Notes: HP-UX supports many different directory service back-ends for hostname resolution, including DNS, NIS, NIS+, and the local /etc/hosts file.
Do the hostname resolution servers respond with accurate information? (hostname resolution)
• Even when the repository servers are responding quickly, if they don't contain the requested information, then NFS behavior and performance can be severely impacted
• Do your repository servers contain information about every NFS client and server system in your environment?
• Is the hostname-to-address information up-to-date?
• Verify with nslookup(1) and nsquery(1)
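A quick accuracy check is to resolve a name forward and backward and compare the answers; the hostname and address below are placeholders:

    # Lookup through whatever services nsswitch.conf specifies
    nsquery hosts nfsclient
    # Reverse lookup of the address returned above
    nsquery hosts 192.1.2.3
    # Cross-check against the directory service directly
    nslookup nfsclient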
user-space daemons and kernel threads
• biod
• nfsd
• rpc.mountd
• rpc.lockd & rpc.statd
Notes: NFS is a true "client-server" application, yet most of the tuning and troubleshooting information currently available (i.e. white papers, web pages, etc.) pertains only to NFS servers.
biod daemons
• What are they?
• How do they work in the READ and WRITE cases?
• Why not just launch hundreds of biods?
• Why would a client perform better with no biods running?
• How many biod daemons should your client run?
• Troubleshooting
Notes: This section describes how the biod daemons work in both the read and write scenarios. It explains why running a large number of biods does not necessarily result in better performance.
What are biod daemons? (biod)
Difference between 11.0 and 11i: Default number of biods on 11.0 = 4; default on 11i = 16.
How do they work in the READ case? (biod)
GOAL: Keep the client's buffer cache populated with data and avoid having the client block waiting for data from the server
1. Process checks buffer cache. If data is present it's read from cache.
2. If data is not in the cache the process makes the initial NFS READ call and sleeps waiting for the data to arrive from the NFS server.
3. When the data arrives it is placed in the client's buffer cache. The sleeping process is awoken and reads from the cache.
4. If sequential access is detected, the biod daemons issue read-ahead requests on the process's behalf to keep the cache populated.
How do they work in the WRITE case? (biod)
GOAL: Keep the client's buffer cache drained so that when a writing process flushes a file it only needs to block a short amount of time while any remaining data is posted to the NFS server
1. Process writes data to buffer cache. If biods are not running, the writing process sends the data to the server and waits for a reply.
2. If biods are running, the writing process does not block; the biods post the cached data to the server in the background.
Why not just launch hundreds of biods and be done with it? (biod)
Difference between 11.0 and 11i: Filesystem semaphore contention drastically reduced in 11i.
• 11.0 client's NFS read() and write() paths use the global filesystem semaphore to protect key kernel data structures and I/O operations
• Acquiring the filesystem semaphore effectively locks out all other filesystem related operations on the system – not just other NFS requests but requests for all filesystems (VxFS, HFS, CDFS, etc.)
Why would an NFS client perform better with NO biods running? (biod)
GOAL: To achieve optimal throughput, maximize the number of simultaneous requests "in flight" to the NFS servers
• If biods are running then the number of biods roughly defines the maximum number of simultaneous outstanding requests possible.
• If NO biods are used, the number of processes simultaneously reading from or writing to the NFS-mounted filesystems roughly defines the maximum number of outstanding requests possible.
Why else would you consider disabling biods on your NFS client? (biod)
• Avoid blocking access to all servers when one is down
Ø All biods can block on a dead server and hang access to any working servers
• Applications performing mainly non-sequential reads
Ø Read-ahead data will likely be ignored – wasted overhead
• Applications performing primarily synchronous I/O
Ø Biods are disabled for synchronous I/O requests
• Relax 25% buffer cache limitation for asynchronous I/O
Ø This buffer cache limitation is only enforced when biods are running
How many biods should your NFS client run? (biod)
Recommended INITIAL Value: NUM_NFSIOD=16
• Starting too few biods can result in poor read/write performance
• Starting too many can lead to semaphore contention on 11.0
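On HP-UX the biod count is set in the NFS client configuration file; a sketch using the standard rc layout:

    # /etc/rc.config.d/nfsconf
    NFS_CLIENT=1
    NUM_NFSIOD=16

    # Restart the client daemons to pick up the change
    /sbin/init.d/nfs.client stop
    /sbin/init.d/nfs.client start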
Troubleshooting biods (part 1) (biod)
• NFS Client Panics
Ø Analyze the crash dump with Q4 to determine root cause
• NFS Client Application Hangs
Ø Look for biod traffic in a network trace
Ø Examine the running biod daemons on the live system with Q4
Ø Monitor "nfsstat –c" output for biod traffic
Ø When all else fails, TOC the system and analyze the dump with Q4
Notes: NFS Client Panics – Analyzing HP-UX system dumps is a complex and involved process.
Troubleshooting biods (part 2) (biod)
• Poor NFS Application Performance
Ø Monitor nfsstat –c output for potential performance problems
Ø Use nfsstat –m output to monitor smooth round trip times
Ø Look for delays or retransmissions in a network trace
Ø Use the tusc utility to look for delays at the system call level (ftp://ftp.cup.hp.com/dist/networking/misc/tusc.)
nfsd daemons and threads
• What are they?
• How do they work in the READ and WRITE cases?
• Why are more nfsds launched than configured in NUM_NFSD?
• How many threads service TCP requests?
• How many nfsds should you run?
• Troubleshooting
Notes: This section discusses the various daemons and threads that handle server-side NFS requests. It describes how these daemons and threads work in both the read and write scenarios.
What are the various "nfsd" daemons and kernel threads? (nfsd)
Difference between 11.0 and 11i: Default UDP nfsds on 11.0 = 4; default on 11i = 16.
• nfsd – Services NFS/UDP requests and manages NFS/TCP connections
• nfsktcpd – Services NFS/TCP requests
• nfskd – Currently serves no useful purpose
Notes: The nfsd daemons are primarily used to service NFS/UDP requests.
How do they work in the READ case? (nfsd)
GOAL: Retrieve the data requested by the NFS clients from the server's buffer cache or physical disks as quickly as possible
1. The nfsd checks the server's buffer cache for the requested data. If data is present it's read from cache and returned to the client.
2. If data is not in the cache the nfsd schedules a READ call to the underlying filesystem (VxFS, HFS, CDFS, etc.) and sleeps waiting for the data.
3. When the data arrives from disk it is placed in the server's buffer cache; the nfsd is awoken and returns the data to the client.
How do they work in the WRITE case? (nfsd)
GOAL: Post the data to the server's buffer cache as quickly as possible and allow the client to continue. Flush the data to physical disk in the background as quickly as possible.
1. Determine if the client is writing in synchronous or asynchronous mode
2. Synchronous – schedule the WRITE to the underlying filesystem, sleep until the call completes, then wake up and send a reply to the client
3. Asynchronous – post the data to the server's buffer cache and reply to the client immediately; the data is flushed to physical disk in the background
Why are more nfsds launched than configured in NUM_NFSD? (nfsd)
• Number of NFS/UDP daemons must be equally divisible by the number of CPUs due to processor affinity
• If NFS/TCP is enabled – one additional nfsd is launched
Example – 8 CPU system, NFS/TCP enabled, 100 nfsds requested – you will actually get 105 nfsds:
100 (requested nfsds) / 8 (CPUs) = 12.5, which is rounded up to 13 per CPU; 13 × 8 = 104 NFS/UDP nfsds, plus 1 for NFS/TCP = 105.
What happens at nfsd start time? (nfsd)
1. Stream Head Buffer Size Calculated
2. Per-CPU nfsd Pools Created
3. rpcbind(1M) Bindings Established
4. Number of nfsds Increased if Needed
5. Stream Head Replicated per-CPU
6. nfsds Bind to per-CPU Pools
Notes: When the nfsds are started, several important things happen: 1. Stream Head Buffer Size Calculated – The size of the stream head buffer is calculated based on the number of nfsds launched.
What happens when an NFS request arrives on the server? (nfsd)
• The kernel determines which per-CPU stream head to queue the request on
• A single nfsd is awoken to handle the request
• CPUs can "task steal" – if an nfsd is ready to execute but the CPU it is bound to is busy, a different CPU may "steal" the waiting nfsd and execute it
• nfsds can "task steal" – an nfsd may steal requests from other CPUs' queues
Notes: When an NFS request arrives on the server, the kernel queues it on a per-CPU stream head and wakes a single nfsd to service it.
Which nfsd is used by NFS/TCP? (nfsd)
The nfsd process used by NFS/TCP has a parent process ID of 1 (init) and has no child processes. In this example – TID 1829 (PID 1809).
Notes: Looking at the above screen output, you see the entire list of daemons and kernel threads used for servicing NFS/UDP and NFS/TCP requests. The nfsds responsible for servicing UDP requests are using TID numbers 1833, 1834, 1836, and 1840.
How many nfsktcpd kernel threads service NFS/TCP requests? (nfsd)
• The NUM_NFSD variable has no effect on NFS/TCP
• By default, the NFS server launches a maximum of 10 kernel threads for each NFS/TCP connection it receives
• The threads launched for a specific TCP connection will only service the requests that arrive on that connection
• By default, HP NFS/TCP clients only open a single TCP connection to each NFS server, regardless of the number of filesystems mounted from the server
Can you change the NFS/TCP "single connection" default behavior? (nfsd)
WARNING WARNING WARNING
Ø The following procedure is NOT SUPPORTED BY HP
Ø This procedure should be used with caution, as it will modify the client's behavior when mounting filesystems from any NFS/TCP server, not just HP servers.
Can you change the default maximum of 10 threads/connection? (nfsd)
WARNING WARNING WARNING
Ø The following procedure is NOT SUPPORTED BY HP
Ø This procedure should be used with caution, as it will modify the server's behavior when servicing NFS/TCP mounts from any NFS/TCP client, not just HP clients.
How many UDP nfsds should your NFS server run? (nfsd)
Recommended INITIAL Value: NUM_NFSD=64
• NUM_NFSD only affects the number of NFS/UDP daemons, so tuning NUM_NFSD depends on how much NFS traffic arrives via UDP
• Starting too few nfsds can result in poor read/write performance, and in rare cases nfsd deadlock situations (with loopback NFS mounts)
• Starting too many can result in directory metadata contention
• Better to start too many than too few
• Your mileage may vary so it is important to measure performance in your own environment
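The server-side equivalent of the biod configuration, again per the standard rc layout:

    # /etc/rc.config.d/nfsconf
    NFS_SERVER=1
    NUM_NFSD=64

    # Restart the server daemons to pick up the change
    /sbin/init.d/nfs.server stop
    /sbin/init.d/nfs.server start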
Troubleshooting nfsds (part 1) (nfsd)
• NFS Server Panics
Ø Analyze the crash dump with Q4 to determine root cause
• NFS Application Hangs
Ø Use rpcinfo(1M) command to "ping" nfsd daemons/threads
Ø Look for nfsd/nfsktcpd traffic in a network trace
Ø Examine the nfsd daemons/threads on the live system with Q4
Ø Monitor "nfsstat –s" output for nfsd/nfsktcpd traffic
Ø When all else fails, TOC the system and analyze the dump with Q4
Notes: NFS Server Panics – Analyzing HP-UX system dumps is a complex and involved process.
Troubleshooting nfsds (part 2) (nfsd)
• Poor NFS Application Performance
Ø Monitor nfsstat -s output for potential performance problems
Ø Look for delays or retransmissions in a network trace
Ø Use netstat –p udp to look for UDP socket overflows potentially occurring on port 2049 – a network trace would also need to be consulted to verify whether "ICMP source quench" packets are being sent from the server for port 2049
Ø Use kernel profiling tools, such as kgmon, to understand where the server's kernel is spending its time
rpc.mountd
• What is it?
• What factors influence rpc.mountd performance?
• Troubleshooting
Notes: This section describes the rpc.mountd daemon (commonly referred to as simply "mountd"), which services requests to mount NFS filesystems. Included is a discussion of the ways you can tune your environment for optimal rpc.mountd performance, as well as some recommendations for troubleshooting NFS filesystem mounting problems.
What is rpc.mountd? (rpc.mountd)
What factors influence rpc.mountd performance? (rpc.mountd)
• The choice of repository used to provide hostname resolution data (i.e. DNS, NIS, NIS+, or /etc/hosts files)
Troubleshooting rpc.mountd (rpc.mountd)
• Use rpcinfo(1M) command to "ping" rpc.mountd
• Collect a debug-level rpc.mountd logfile via the SIGUSR2 toggle mechanism
• Verify that hostname resolution servers (i.e. DNS, NIS, etc.) are responding and return accurate data
• Collect a network trace of the problem
• Determine if the undocumented rpc.
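The rpcinfo "ping" mentioned above can be done as follows; 100005 is the mount protocol's RPC program number, and "nfsserver" is a placeholder:

    # Is mountd registered with rpcbind on the server?
    rpcinfo -p nfsserver | grep mountd
    # Ping the mountd daemon itself over UDP
    rpcinfo -u nfsserver 100005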
rpc.lockd & rpc.statd
• What are they?
• How are lock requests handled?
• How are locks recovered after a system reboot?
• Avoiding NFS lock hangs
• Ensuring optimal lock performance
• Troubleshooting
Notes: This section describes how the rpc.lockd and rpc.statd daemons make up the Network Lock Manager (NLM) protocol. It discusses how lock requests are processed, and how locks are recovered after server failures.
What are rpc.lockd and rpc.statd? (rpc.lockd & rpc.statd)
• Implement the Network Lock Manager (NLM) Protocol, providing NFS file locking semantics
• rpc.lockd handles lock requests
• rpc.statd monitors the systems holding locks and handles crash-recovery notification
How are NFS file lock requests handled by lockd and statd? (rpc.lockd & rpc.statd)
[Diagram: a 16-step numbered flow of a lock request between CLIENT and SERVER, passing through the application, kernel, rpc.lockd, rpc.statd, and rpcbind on each side]
How are NFS file locks recovered after a client or server reboot? (rpc.lockd & rpc.statd)
[Diagram: the same client/server components as the previous figure, with step 17 added]
Notes: 17. Notification process takes place between the client's and server's rpc.statd daemons.
Avoiding NFS File Lock Hangs (rpc.lockd & rpc.statd)
• Make sure hostname resolution data is accurate (i.e. make sure the NFS server can correctly resolve the IP address of the client – even if the client is in a remote DNS domain)
• Remove corrupted files from /var/statmon/sm.bak
• Never remove files from the /var/statmon/sm directory on only a client or server system
• Use the rpc.
Ensuring Optimal NFS File Locking Performance (rpc.lockd & rpc.statd)
• Verify that hostname resolution servers (i.e. DNS, NIS, etc.) are responding and return accurate data
• Remove obsolete files from /var/statmon/sm.bak to avoid forcing rpc.statd to continuously try contacting systems which no longer exist in the environment
Notes: Even when NFS file locking is functionally working, hostname resolution still plays a key role in rpc.lockd and rpc.statd performance.
Troubleshooting rpc.lockd & rpc.statd (rpc.lockd & rpc.statd)
• Use rpcinfo(1M) command to "ping" lockd & statd
• Collect debug-level rpc.lockd and rpc.statd logfiles
automount & autofs
• What are they?
• How are they different from each other?
• Performance Considerations
• Should you use Automount or AutoFS in your environment?
• Troubleshooting
Notes: Automount and AutoFS are generally not considered daemons used for performance reasons.
What are Automount and AutoFS? (automount & autofs)
Difference between 11.0 and 11i: AutoFS did not ship with HP-UX 11.0 – it required Extension Pack 9808 (August 1998). AutoFS does ship with 11i.
• Automatically mount filesystems when the directory path is referenced
• Automatically unmount filesystems that are not in use
• Maps can be distributed via a directory server (i.e. NIS or NIS+)
How are Automount and AutoFS different from each other? (part 1) (automount & autofs)
• Automount: supports NFS PV2/UDP only; AutoFS: supports NFS PV3, TCP, CDFS, CacheFS
• Automount: single threaded; AutoFS: multi-threaded (in certain places)
• Automount: pseudo NFS server; AutoFS: legitimate filesystem
• Automount: uses symbolic links to redirect pathname lookup requests to real NFS mountpoints; AutoFS: mounts NFS filesystems in-place
Notes: The original automounter is only capable of managing NFS protocol version 2 mounts over UDP.
How are Automount and AutoFS different from each other? (part 2) (automount & autofs)
• Automount: adding mount points to master or direct maps requires automount be killed and restarted to take effect; AutoFS: map changes take effect immediately whenever the /usr/sbin/automount command is issued
• Automount: doesn't keep track of which filesystems are in use – issues unnecessary unmount requests; AutoFS: keeps a reference timer (for direct maps only) – avoids attempting to unmount busy filesystems
• Automount: kill –9
How are Automount and AutoFS different from each other? (part 3) (automount & autofs)
• ServiceGuard Issue
Ø NFS server is part of an HA/NFS (i.e. MC/ServiceGuard) cluster
Automounter Performance Considerations (part 1) (automount & autofs)
• Default unmount timer and its effect on client caching
Ø Any buffer cache or page cache memory pages associated with the filesystem are invalidated during an unmount attempt – even if the unmount fails because the filesystem is busy (i.e. files are open or a process is using it as its working directory)
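As a sketch, the idle-unmount interval is raised from the automounter command line; the classic automounter uses -tl while AutoFS uses -t, so check the manpage for your release:

    # Keep idle filesystems mounted for 10 minutes instead of the default 5
    automount -tl 600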
Automounter Performance Considerations (part 2) (automount & autofs)
• NFS mount options used in master or subordinate maps
Ø Mount options specified in a master map affect all entries of a subordinate map unless specifically overridden by the map entry
Ø Options such as "noac" or "timeo" can have a dramatic impact on application performance
Recommendation
Ø Search all automount maps (local, NIS, NIS+) looking for NFS mount options and verify the application's need for using them
Ø Avoid the use of "noac"
Automounter Performance Considerations (part 3) (automount & autofs)
• Replicated NFS Servers in Maps
Ø Ensure the specified servers exist, respond quickly to mount requests, and contain the filesystem referenced in the map
• Environment Variables in Maps ("-D" option)
Ø Ensure the pathnames resolved by variables exist on the server
• Hierarchical Maps
Ø Entire hierarchy must be mounted/unmounted together
Ø Adds overhead both to the client's automounter and the server's rpc.mountd
Which Automounter makes sense for your environment? (automount & autofs)
• Automount: NFS PV2 only; AutoFS: NFS PV2 or PV3
• Automount: UDP transport only; AutoFS: UDP or TCP transports
• Automount: cannot manage CacheFS; AutoFS: can manage CacheFS mounts
• Automount: safe for use with HA/NFS (i.e.
Troubleshooting Automount & AutoFS (automount & autofs)
• Collect a debug Automount or AutoFS logfile
• Verify that hostname resolution servers (i.e. DNS, NIS, etc.) are responding and return accurate data
cachefs
• What is it?
• How does it work?
• What are its limitations?
• Caching Application Binaries
• CacheFS Performance Considerations
• Should you use CacheFS?
• Measuring Effectiveness
Notes: This section begins by describing how CacheFS works and the potential benefits it can provide. This is followed by a list of the many limitations of CacheFS, and why it may not provide the promised benefits in every environment.
What is CacheFS? (part 1) (cachefs)
Difference between 11.0 and 11i: CacheFS is not available on 11.0; it first ships with 11i.
What is CacheFS? (part 2) (cachefs)
• Designed to be used for stable, read-only data
• Since the cache resides in a local filesystem, the data can survive an unmount or a reboot
• A single cache directory can be used to cache data from multiple NFS mount points
• An LRU (least recently used) algorithm is used to remove data from the cache when the configured disk space or inode thresholds are reached
Notes: CacheFS can supply a performance boost to clients that repeatedly read stable data over NFS.
How does CacheFS work? (cachefs)
• The cfsadmin(1M) command creates a cache on the client in a local filesystem (referred to as the front filesystem)
• An NFS filesystem (referred to as the back filesystem) is mounted referencing the cache directory
• During an NFS read the "front" filesystem is checked. If the data is resident the request is resolved locally.
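The basic setup is sketched below; the cache directory, server, export, and mount point names are all placeholders:

    # Create the cache in a local (front) filesystem
    cfsadmin -c /cachedir
    # Mount the NFS (back) filesystem through the cache
    mount -F cachefs -o backfstype=nfs,cachedir=/cachedir \
        nfsserver:/export/apps /opt/apps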
CacheFS Limitations (part 1) (cachefs)
• Only READ data is cached
Ø Writing to a cached file invalidates the cached copy
• Only NFS filesystems may be cached
Ø Cannot cache other filesystem types such as CDFS
• "Loose" synchronization with the "back" filesystem
Ø Changes made to the NFS server take time to propagate
• Dependent upon local filesystem performance
Ø If NFS client disks are slow then performance will suffer
Notes: CacheFS does not cache written data – writing to a cached file invalidates the cached copy.
CacheFS Limitations (part 2) (cachefs)
• Only certain files survive a CacheFS unmount or reboot
• Any file marked "un-cacheable" will be removed
Ø Writing to a cached file marks the file "un-cacheable"
Ø When a cache reaches its configured disk space or inode usage thresholds, the LRU algorithm will select files to remove.
Ø Every cached file is represented by a 32-slot allocation map data structure, where each slot represents a non-contiguous chunk of the file.
Application Binary Caching Dilemma (cachefs)
• Most CacheFS customers want to use CacheFS to distribute NFS applications to their clients
• In order to remain cached, a cached file must be loaded in 32 or fewer non-contiguous chunks
• The UNIX loader usually loads application binaries in more than 32 non-contiguous chunks because of demand paging, which means that CacheFS is usually ineffective at fronting application binaries
Notes: Since CacheFS is most commonly deployed to front application binaries, this 32-chunk limitation matters a great deal in practice.
Application Binary Caching Solutions (part 1) (cachefs)
• cat(1) Solution
Ø cat(1) the application binary across the CacheFS filesystem and write it to /dev/null or /dev/zero
Ø Ex: cat /opt/netscape/netscape > /dev/null, where "/opt/netscape" is the CacheFS-mounted filesystem
Ø Forces CacheFS to read the entire binary file in a single contiguous chunk
Ø Once the cache is populated, the binary will remain cached following unmounts and reboots, and all requests for this file will be satisfied from the cache
Application Binary Caching Solutions (part 2) (cachefs)
• HP-specific Solution – the rpages Mount Option
Ø Instructs the kernel loader to load entire application binaries contiguously
Ø Automatic – no further configuration or user intervention required
Ø Only affects binaries – normal data files are not read in their entirety; only binaries that are executed are fully populated
Ø Causes potentially slower initial load time, but substantially faster subsequent load times
CacheFS Performance Considerations (cachefs)
• Create separate caches for each NFS filesystem
Ø Pools of cachefsd threads are created on a per-cache basis
• Use dedicated front filesystems for each cache
Ø Avoids having the LRU algorithm remove cached files because of non-CacheFS filesystem usage
• Use the rpages mount option when appropriate
Ø Dramatic performance increase for NFS-based applications
• The maxcnodes kernel parameter
Ø Determines the size of the CacheFS-specific inode table
Should you use CacheFS? (cachefs)
• Is your data "stable" and read-only?
• Do you have spare local disk resources on your clients?
• If you use CacheFS to front NFS applications, do your binaries remain in the cache following an unmount? If not, are you willing to force them to remain cached?
WARNING – Patch CacheFS Prior to Using
Ø Just prior to 11i releasing, several critical and serious CacheFS defects were discovered. All known CacheFS defects have since been fixed.
Measuring CacheFS Effectiveness (part 1) (cachefs)
• Use the cachefsstat(1M) command
Ø Monitor cache hit rate over time
• Compare wall-clock time with and without CacheFS
Ø The timex(1) command reports on wall-clock times
• Use nfsstat(1M) to monitor NFS READ calls
• Be sure to unmount the CacheFS filesystem between application runs to nullify any effects from buffer cache and page cache
Notes: There are several methodologies for determining whether CacheFS is actually helping.
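Putting those steps together, one measurement pass might look like this (the mount point and file name are hypothetical):

    # Hit-rate counters for the cached filesystem
    cachefsstat /opt/apps
    # Wall-clock timing of a representative read workload
    timex cat /opt/apps/bigfile > /dev/null
    # NFS client call counts before and after the run
    nfsstat -c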
Measuring CacheFS Effectiveness (part 2) (cachefs)
• Examine the contents of the cache via "ls -l" before and after unmounting the CacheFS filesystem
Notes: Since the CacheFS cache resides in a local filesystem on the NFS client, the actual contents of the cache can be viewed like any other directory – via the ls(1) command.
nfs protocol version 2 vs. nfs protocol version 3
• What are the differences between NFS PV2 and PV3?
• Will a PV3 client/server always outperform a PV2 client/server?
• Should you use NFS PV2 or NFS PV3 in your environment?
Notes: NFS, like most protocols, continues to evolve over time. The original version of NFS (Protocol Version 1) existed only within Sun Microsystems and was never released to the public.
How is NFS PV3 different from PV2? (nfs pv2 vs. nfs pv3)
• Maximum File Size – PV2: 2GB; PV3: 11.0 – 1TB, 11i – 2TB
How is 11i PV3 different from 11.0? (nfs pv2 vs. nfs pv3)
Difference between 11.0 and 11i:
Ø The largest file size supported by an 11.0 NFS PV3 client is 1TB. 11i PV3 clients can access files as large as 2TB.
Ø When HP-UX 11.0 released, the largest read and write buffer size available for an NFS PV3 mountpoint was 8KB. Support for 32KB read and write requests was added in March 2000. 11i ships with support for 32KB read and write requests.
Ø Even though HP-UX 11.0 now supports 32KB read and write requests, the default buffer size on 11.0 remains 8KB.
Will a PV3 implementation always outperform PV2? (nfs pv2 vs. nfs pv3)
Can you disable READDIRPLUS on an HP-UX 11.0 or 11i NFS Server? (nfs pv2 vs. nfs pv3)
WARNING WARNING WARNING
Ø The following procedure is NOT SUPPORTED BY HP
Ø This procedure should be used with caution, as it will disable the READDIRPLUS operation on the server globally, thus impacting any PV3 client – not just HP clients.
Which protocol should you use? (nfs pv2 vs. nfs pv3)
• In most environments, PV3 provides superior performance
• PV2's edge in asynchronous write performance is usually offset by the larger packet sizes afforded by PV3
• If the network is dropping large PV3 requests, the request size can be reduced via the rsize and wsize mount options. Alternatively, NFS/TCP can be used to reduce the amount of data sent during a retransmission.
nfs/udp vs. nfs/tcp
• Protocol-Induced Overhead
• Retransmissions and Timeouts
• Network Switch Buffering Considerations
• Should you use NFS/UDP or NFS/TCP in your environment?
Notes: One of the design goals of the original version of NFS, and every version since, has been to minimize the amount of network latency involved in accessing remote files.
How is 11.0 NFS/TCP support different from 11i? (nfs/udp vs. nfs/tcp)
Difference between 11.0 and 11i:
Ø When HP-UX 11.0 released, the only network transport available to NFS was UDP. NFS/TCP support was added in March 2000.
Ø Even when the March 2000 patches are installed on HP-UX 11.0 systems, UDP remains the default protocol used by NFS. NFS/TCP support must be manually enabled via the new setoncenv(1M) command. Once NFS/TCP support has been enabled, TCP becomes the default protocol used for NFS.
Protocol-Induced Overhead (nfs/udp vs. nfs/tcp)
• UDP
Ø Lightweight, Connectionless, Unreliable
• TCP
Ø Connection Oriented, Reliable Delivery of Data
Ø Connection Management (establishment & teardown)
Ø Sequence and Acknowledgement Generation
Ø Congestion Control, Window Scaling
Ø Timeout and Retransmission Management
Notes: UDP is commonly described as a connectionless or unreliable transport protocol.
Retransmissions and Timeouts (nfs/udp vs. nfs/tcp)
• Managing timeouts and retransmissions – UDP: NFS manages; TCP: transport manages 1st, NFS manages 2nd
• How much DATA is sent in a retransmission – UDP: RSIZE/WSIZE (as much as 32KB); TCP: MTU size (typically 1500 bytes)
• Default timeouts – UDP: min = calculated, max = 20 seconds; TCP: min = calculated, max = 60 seconds
• "timeo" mount option – UDP: effectively ignored (HP behaves the same as Sun); TCP: overrides the Van Jacobson algorithm (avoid if possible)
Why would an NFS client need to retransmit a request? (nfs/udp vs. nfs/tcp)
• The client is unable to send the request (i.e. resource exhausted)
• The request is dropped on the network before arriving on the server
• The server is down or a network partition has occurred
• The request arrives on the server but the server's socket is full
• The server receives the request but cannot process it in time
• The server is unable to reply to the client (i.e. resource exhausted)
Network Switch Buffering Issues (nfs/udp vs. nfs/tcp)
Customer Reported Problem
• High numbers of NFS/UDP retransmissions and timeouts
• UDP packets were being dropped by the network switch
• The same switch was NOT discarding TCP packets
Results of Investigation
The network hardware vendor confirmed that they dedicate 75% of the buffer memory in their switch to TCP/IP traffic and only 25% to UDP traffic. This gives NFS/TCP an advantage, albeit hardware-induced.
Which protocol should you use? (nfs/udp vs. nfs/tcp)
• Local Area Network with a SMALL number of retransmissions and timeouts – UDP
• Local Area Network with a HIGH number of retransmissions and timeouts – TCP
• High latency links or Wide Area Networks – TCP
• Local Area Network with network switch UDP buffers overflowing – TCP
Notes: Traditionally the decision to use UDP or TCP was based solely on geography (i.e. LAN=UDP, WAN=TCP).
nfs mount options
• Which NFS mount options directly affect performance?
• Which options have different default values on HP-UX 11.0 and 11i?
• How can you verify which mount options are in effect on a per-mountpoint basis?
Notes: There are many NFS-specific mount options available. Some of these options can have a positive impact on performance, while others can have a dramatically negative effect.
Which NFS mount options directly affect performance? (nfs mount options)
• vers= – Version of the NFS protocol to use
• rsize= – Size of the READ requests (Recommend 32768)
• wsize= – Size of the WRITE requests (Recommend 32768)
• proto= – Network transport protocol to use
• timeo= – Duration of time to wait for an NFS request to complete before retransmitting (see the Retransmissions and Timeouts slide for more information)
• noac – Disable client-side caching of file and directory attributes (use only when required by an application)
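Combining the recommended values from the list above into one invocation (the server and paths are placeholders):

    mount -F nfs -o vers=3,proto=tcp,rsize=32768,wsize=32768 \
        nfsserver:/export/data /data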
Which NFS mount options have different default values at 11i? (nfs mount options)
• rsize – 11.0 default: 8192; 11i default: 32768
• wsize – 11.0 default: 8192; 11i default: 32768
• proto – 11.0 default: UDP; 11i default: TCP
Notes: The default values of many NFS mount options have changed in 11i. It is therefore important to understand which options have changed to know how a default NFS mount (i.e. a mount where no options are specified) will behave on both 11.0 and 11i clients.
How can you verify which NFS mount options are being used? (nfs mount options)
Notes: The easiest and most accurate way to determine which NFS mount options are in effect on a per-mountpoint basis is to use the "nfsstat -m" command. Looking at the above screenshot, we can determine several things about the way this client has mounted its filesystems.
buffer cache considerations
• What is buffer cache and why do you want to use it?
• Why not just use lots of memory for buffer cache?
• Static Allocation vs. Dynamic Allocation
What is buffer cache memory? (buffer cache)
• Portion of physical memory dedicated to storing file data
• NFS read performance is increased when requested data is present in the cache, avoiding a physical disk read
• NFS write performance is increased by allowing the writing process to post data to cache instead of to the server's disk
• HP-UX uses a split memory cache system, employing both a buffer cache (used for storing data) and a page cache (used for storing executables, libraries, mmap files)
Why not just configure lots of memory for buffer cache? (buffer cache)
• A large cache does not guarantee a high cache hit-rate
• Memory wasted that could be better used by the system
• 11.0 client performance suffers using a large cache
Difference between 11.0 and 11i: The buffer cache management routines have been enhanced in 11i to track the buffer cache pages on a per-file basis.
Static vs. Dynamic Buffer Cache Allocation (buffer cache)
Should you use static or dynamic allocation in your environment? (buffer cache)
• You have determined the optimal cache size and have sufficient memory – Static
• You plan on adding more memory to the system and don't want buffer cache affected – Static
• You use variable memory page sizes and experience memory fragmentation – Static
• Memory pressure or small memory system – Dynamic
• None of the above – Dynamic
Notes: Under most circumstances, the dynamic allocation method is recommended.
Server's interaction with syncer(1M) (buffer cache)
• syncer(1M) is responsible for keeping the on-disk filesystem information synchronized with the contents of the buffer cache
• It divides buffer cache into 5 "regions" and awakens every 6 seconds (by default) to scan one of the memory regions (i.e. the entire cache is scanned once every 30 seconds)
How much memory should you configure for buffer cache? (buffer cache)
• Sizing too small on clients and servers can result in suboptimal performance
• Sizing too large on 11.0 clients can also hurt performance
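With a dynamic buffer cache the floor and ceiling are the dbc_* tunables; a kmtune sketch (the 25% value is illustrative rather than a blanket recommendation, and changes take effect only after a kernel rebuild and reboot):

    # Query the current settings
    kmtune -q dbc_min_pct
    kmtune -q dbc_max_pct
    # Cap the dynamic buffer cache at 25% of physical memory
    kmtune -s dbc_max_pct=25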
Measuring Buffer Cache Utilization (buffer cache)
Notes: After configuring your system with a reasonably sized buffer cache, the next step is to run your applications and evaluate the performance with these buffer cache settings. You can either time the performance of your application using the time(1) or timex(1) commands, or simply use a stopwatch.
kernel parameter tuning
• Which kernel parameters directly affect NFS performance?
• Inspecting kernel parameters
• Monitoring kernel parameter usage
Notes: Since NFS spends the majority of its time running in the kernel, it should come as no surprise that there are many kernel parameters that can positively or negatively impact NFS performance.
Kernel parameters that directly affect NFS performance (kernel parameter tuning)
• bufcache_hash_locks • bufpages • create_fastlinks • dbc_min_pct • dbc_max_pct • default_disk_ir • dnlc_hash_locks • fs_async • ftable_hash_locks • max_fcp_reqs • max_thread_proc • maxfiles • maxfiles_lim • maxswapchunks • nbuf • ncallout • ncsize • nfile • nflocks • ninode • nkthread • nproc • scsi_max_qdepth • vnode_hash_locks • vnode_cd_hash_locks • vx_fancyra_enable • vx_ninode
Kernel Parameter Recommendations (part 1) (kernel parameter tuning)
• bufcache_hash_locks – The size of the pool of locks used to control access to buffer cache data structures (Def 128; Recommend 4096)
• bufpages – Number of 4K memory pages in static buffer cache (Def 0; Recommend 0 (dynamic))
• create_fastlinks – Enable/disable storing link text for symlinks in disk inode – HFS only (Def 0; Recommend 1 (enable))
• dbc_min_pct – Min. % of memory used for dynamic buffer cache (Def 5; Recommend 5)
• dbc_max_pct – Max. % of memory used for dynamic buffer cache
Kernel Parameter Recommendations (part 2) (kernel parameter tuning)
• dnlc_hash_locks – Size of the pool of locks used to control access to DNLC structures, and the number of hash chains the DNLC entries are divided into (Def 64 (11.
Kernel Parameter Recommendations (part 3) (kernel parameter tuning)
• max_thread_proc – Max. number of kernel threads that can be associated with a process (Def 64; Recommend 256)
• maxfiles – Specifies the "soft" limit for the number of files that a given process can have open at any time (Def 60; Recommend 1024)
• maxfiles_lim – Specifies the "hard" limit for the number of files that a given process can have open at any time (Def 1024; Recommend 2048)
WARNING WARNING WARNING
Ø HP-UX 11.
Kernel Parameter Recommendations (part 4) (kernel parameter tuning)
• maxswapchunks – Maximum amount of swap space that can be configured (Def 256; Recommend 8192)
• nbuf – Defines the number of buffer headers to be allocated for the static-sized buffer cache (Def 0; Recommend 0 (dynamic))
• ncallout – Maximum number of timeouts that can be scheduled by the kernel (Def 292; Recommend 2064)
• ncsize – Directly sizes the DNLC and the NFS client's rnode cache (Def 476; Recommend 8192)
• nfile – Maximum number of open files allowed on the system at any time (Def 928; Recommend 8192)
Kernel Parameter Recommendations (part 5) (kernel parameter tuning)
• ninode – Directly sizes the HFS inode cache, indirectly sizes CacheFS maxcnodes, can indirectly size the DNLC and NFS rnode cache, and can size the VxFS inode cache (Def 476; Recommend 8192)
• nkthread – Maximum number of kernel threads that can be running on the system at any time (Def 499; Recommend 2048)
• nproc – Maximum number of processes that can be running on the system at any time (Def 276; Recommend 1024)
• scsi_max_qdepth – The maximum number of I/O requests that can be queued to a SCSI device at one time
Kernel Parameter Recommendations (part 6) (kernel parameter tuning)
• vnode_hash_locks – Sizes the pool of locks used to control access to vnode data structures (Def 128; Recommend 4096)
• vnode_cd_hash_locks – Sizes the pool of locks used to control access to the clean and dirty buffer chains associated with the vnode structures (Def 128; Recommend 4096)
• vx_fancyra_enable – Enable or disable intelligent read-ahead algorithm; VxFS 3.3 filesystems only (Def 0; Recommend 1 (enable))
• vx_ninode – Sizes the VxFS inode cache (Def 0; Recommend 8192)
Inspecting Kernel Parameters (kernel parameter tuning)
• sam(1M) – Kernel Configuration Screen
• kmtune(1M)
• sysdef(1M)
• /stand/system
• adb(1)
Notes: There are several tools available for determining the current size of the various kernel parameters on the system. Some of these tools can also describe how the sizes of these parameters are calculated in the kernel.
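For example, checking ncsize with two of the listed tools; the adb pipeline is the classic trick for reading a variable out of the running kernel, shown here as a sketch:

    # Configured value (and formula, if any)
    kmtune -q ncsize
    # Value in the running kernel image
    echo "ncsize/D" | adb /stand/vmunix /dev/kmem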
Measuring Kernel Parameter Usage (kernel parameter tuning)
Notes: The GlancePlus "System Tables Report" screen displays several critical kernel parameters, along with their current utilization rate. In the above screenshot we can see the number of proc table entries in use, the amount of the file table consumed, the current rate of buffer cache usage, etc.
summary of nfs differences between hp-ux 11.0 and 11i
• Default number of biod daemons (11.0 = 4, 11i = 16)
• Default number of nfsd daemons (11.0 = 4, 11i = 16)
• Support for AutoFS (11.0 – patch, 11i – included)
• Support for NFS/TCP (11.0 – patch, 11i – included)
• Filesystem semaphore contention drastically reduced in 11i
• Default "proto" NFS mount option (11.0 = UDP, 11i = TCP)
• Support for large NFS files (11.0 – 1TB, 11i – 2TB)
• Default "rsize" NFS mount option (11.0 = 8192, 11i = 32768)
summary of recommendations
• Sanity Check your NFS Environment
• Verify Network Performance
• Verify Local Filesystems Performance
• Keep Current on Patches
• Verify Hostname Resolution Speed and Accuracy of Data
• Number of daemons and threads
• Automounter command-line options
• Will CacheFS benefit you?
• When to use PV2 vs. PV3
• When to use UDP vs. TCP
For More Information
Optimizing NFS Performance: Tuning and Troubleshooting NFS on HP-UX Systems, by Dave Olker
• Publisher: Prentice Hall PTR
• ISBN 0130428167
• Arriving in bookstores September 2002
Notes: The Optimizing NFS Performance book contains everything in this presentation and a whole lot more.
Electronic Versions of this Presentation are Available at the following Locations
• Internal HP – SNSL Lab DMS
Ø http://snslweb.cup.hp.com/getfile.php?id=205
• External – hp technical documentation
Ø http://docs.hp.com/hpux/onlinedocs/netcom/NFS_perf_tuning_hpux110_11i.pdf
• External – developer & solutions partner portal
Ø http://hp.