StorNext 3.0 File System Tuning Guide
File System Tuning Guide, 6-01376-05, Ver. A, Rel. 3.0, March 2007, Made in USA.

Quantum Corporation provides this publication “as is” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability or fitness for a particular purpose. Quantum Corporation may revise this publication from time to time without notice.

COPYRIGHT STATEMENT
Copyright 2007 by Quantum Corporation. All rights reserved.
Contents

StorNext File System Tuning
    The Underlying Storage System
        RAID Cache Configuration
        RAID Write-Back Caching
        RAID Read-Ahead Caching
        RAID Level, Segment Size, and Stripe Size
StorNext File System Tuning

The StorNext File System (SNFS) provides extremely high performance for widely varying scenarios. Many factors determine the level of performance you will realize. In particular, the performance characteristics of the underlying storage system are the most critical factors. However, other components such as the Metadata Network and MDC systems also have a significant effect on performance.
The Underlying Storage System

RAID Cache Configuration

The single most important RAID tuning component is the cache configuration. This is particularly true for small I/O operations. Contemporary RAID systems such as the EMC CX series and the various Engenio systems provide excellent small I/O performance with properly tuned caching. So, for the best general-purpose performance characteristics, it is crucial to utilize the RAID system caching as fully as possible.
Write-back caching dramatically reduces the latency of small metadata writes, which in turn improves metadata operations throughput. This is easily observed in the hourly File System Manager (FSM) statistics reports in the cvlog file. For example, here is a message line from the cvlog file:

    PIO HiPriWr SUMMARY SnmsMetaDisk0 sysavg/350 sysmin/333 sysmax/367

This statistics message reports average, minimum, and maximum write latency (in microseconds) for the reporting period.
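These SUMMARY lines can be pulled directly from the log. For example, assuming a file system named snfs1 and the default log location used in the cvadmin example later in this document:

    grep 'PIO HiPriWr SUMMARY' /usr/cvfs/data/snfs1/log/cvlog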
it severely degrades typical scenarios. Therefore, it is unsuitable for most environments.

RAID Level, Segment Size, and Stripe Size

Configuration settings such as RAID level, segment size, and stripe size are very important and cannot be changed after the file system is put into production, so it is critical to determine appropriate settings during initial configuration. The best RAID level to use for high I/O throughput is usually RAID5.
File Size Mix and Application I/O Characteristics

It is always valuable to understand the file size mix of the target dataset as well as the application I/O characteristics. This includes the number of concurrent streams, proportion of read versus write streams, I/O size, sequential versus random, Network File System (NFS) or Common Internet File System (CIFS) access, and so on.
It is typically most important to optimize the RAID cache configuration settings described earlier in this document. It is usually best to configure the RAID stripe size no greater than 256K for optimal small file buffer cache performance. For more buffer cache configuration settings, see Mount Command Options on page 16.
The Metadata Network

As with any client/server protocol, SNFS performance is subject to the limitations of the underlying network. Therefore, it is recommended that you use a dedicated Metadata Network to avoid contention with other network traffic. Either 100BaseT or 1000BaseT is required, but for a dedicated Metadata Network there is usually no benefit from using 1000BaseT over 100BaseT. Neither TCP offload nor jumbo frames are required.
The Metadata Controller System

Some metadata operations such as file creation can be CPU intensive, and benefit from increased CPU power. The MDC platform is important in these scenarios because lower clock-speed CPUs such as Sparc and Mips degrade performance. Other operations, such as directory traversal, can benefit greatly from increased memory.
Example:

    [StripeGroup RegularFiles]
    Status UP
    Exclusive No    ##Non-Exclusive StripeGroup for all Files##
    Read Enabled
    Write Enabled
    StripeBreadth 256K
    MultiPathMethod Rotate
    Node CvfsDisk6 0
    Node CvfsDisk7 1

Affinities

Affinities are another stripe group feature that can be very beneficial. Affinities can direct file allocation to appropriate stripe groups according to performance requirements.
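For example, a stripe group can be tagged with an affinity (see the AudioFiles stripe group in the sample configuration file at the end of this document), and a directory can then be associated with that affinity so that files created under it are allocated from the matching stripe group. A minimal sketch, assuming the AudioFiles affinity and a hypothetical directory path:

    cvmkdir -k AudioFiles /stornext/snfs1/audio

Files subsequently created under /stornext/snfs1/audio are then steered to stripe groups carrying the AudioFiles affinity.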
StripeBreadth

This setting must match the RAID stripe size or be a multiple of the RAID stripe size. Matching the RAID stripe size is usually the optimal setting. However, depending on the RAID performance characteristics and application I/O size, it might be beneficial to use a multiple of the RAID stripe size.
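As a worked example, assume a RAID5 array with eight data drives and a 64K segment size (both values are hypothetical; substitute those reported by your RAID controller). The RAID stripe size is then 8 x 64K = 512K, so a matching stripe group setting would be:

    StripeBreadth 512K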
InodeCacheSize

This setting consumes about 800-1000 bytes of memory times the number specified. Increasing this value can reduce the latency of any metadata operation by performing a hot cache access to inode information instead of an I/O to retrieve inode info from disk, which is about 100 to 1000 times faster. It is especially important to increase this setting if metadata I/O latency is high (for example, more than 2 ms average latency).
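For example, a hypothetical setting of:

    InodeCacheSize 65536

caches 65,536 inodes and consumes roughly 52 to 66 MB of memory (65,536 x 800-1000 bytes), which is usually a modest price for avoiding metadata disk I/O.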
A large FsBlockSize severely consumes metadata space in cases where the file-to-directory ratio is less than 100 to 1. However, startup and failover time can be minimized by increasing FsBlockSize. This is very important for multiterabyte file systems, especially when the metadata servers have slow CPU clock speed (such as Sparc and Mips). A good rule of thumb is to use 16K unless other requirements such as directory ratio dictate otherwise.
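Following that rule of thumb, the global section of the configuration file would contain (a sketch; adjust per the considerations above):

    FsBlockSize 16K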
It is also possible to trigger an instant FSM statistics report by setting the Once Only debug flag using cvadmin. For example:

    cvadmin -F snfs1 -e 'debug 0x01000000' ; tail -100 /usr/cvfs/data/snfs1/log/cvlog

The following items are a few things to watch out for:
• A non-zero value for FSM wait SUMMARY journal waits indicates insufficient IOPS performance of the disks assigned to the metadata stripe group.
The cvcp utility is a higher performance alternative to commands such as cp and tar. The cvcp utility achieves high performance by using threads, large I/O buffers, preallocation, stripe alignment, DMA I/O transfer, and Bulk Create. Also, the cvcp utility uses the SNFS External API for preallocation and stripe alignment.
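In its basic form, cvcp takes a source and a destination, much like cp. A minimal sketch with hypothetical paths (see the cvcp man page for the options controlling threads and buffer sizes):

    cvcp /source/clips /stornext/snfs1/clips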
• Zr: Hole in file was zeroed

Both traces also report file offset, I/O size, latency in microseconds (mics), and inode number. Sample use cases:

• Verify that I/O properties are as expected. You can use the VFS trace to ensure that the displayed properties are consistent with expectations, such as being well formed; buffered versus DMA; shared/non-shared; or I/O size. If a small I/O is being performed via DMA, performance will be poor.
The latency-test command has the following syntax:

    latency-test index-number [seconds]
    latency-test all [seconds]

If an index-number is specified, the test is run between the currently selected FSM and the specified client. (Client index numbers are displayed by the cvadmin who command.) If all is specified, the test is run against each client in turn. The test is run for 2 seconds, unless a value for seconds is specified.
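For example, combining this syntax with the non-interactive cvadmin form shown earlier, the following runs a 5-second latency test against every client of the snfs1 file system:

    cvadmin -F snfs1 -e 'latency-test all 5'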
The buffer cache I/O size is adjusted using the cachebufsize setting. The default setting is usually optimal; however, sometimes performance can be improved by increasing this setting to match the RAID5 stripe size. Unfortunately, this is often not possible on Linux due to kernel memory fragmentation. In this case, performance may degrade severely because the full amount of buffer cache cannot be allocated.
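As an illustration, where the larger size can be allocated, a hypothetical Linux /etc/fstab entry that raises cachebufsize to match a 512K RAID5 stripe size might look like the following (the file system name and mount point are assumptions; see Mount Command Options for the supported option syntax):

    snfs1    /stornext/snfs1    cvfs    rw,cachebufsize=512k    0 0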
SNFS External API

The SNFS External API might be useful in some scenarios because it offers programmatic use of special SNFS performance capabilities such as affinities, preallocation, and quality of service. For more information, see the Quality of Service chapter of the StorNext API Guide.
The Distributed LAN (Disk Proxy) Networks

Within each Distributed LAN network, it is best practice to have all SNFS Distributed LAN clients and servers directly attached to the same network switch. A router between a Distributed LAN client and server could be easily overwhelmed by the data rates required. It is critical to ensure that speed/duplex settings are correct, as incorrect settings will severely impact performance. Most of the time, auto-detect is the correct setting.
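One quick way to verify the negotiated speed and duplex on a Linux client (assuming an interface named eth0) is:

    ethtool eth0

and then checking the Speed and Duplex lines of the output against the switch port configuration.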
Network Configuration and Topology

A common source of difficult-to-diagnose issues with SNFS is improper IP network configuration. Many incorrect IP configurations might appear to work when tested with particular applications or particular kinds of hosts, but fail when used with SNFS or when a different kind of host is added to the cluster.
Figure 1  Multi-NIC Hardware and IP Configuration Diagram

(The diagram shows an Other Host and an SNFS MDC on a Public Net switch (10.0.0.x); the MDC, an SNFS SAN client serving as Distributed LAN Server A, and Distributed LAN Server B also attached to a dedicated MetaData switch (172.16.47.x); and the two Distributed LAN Servers attached through DistLAN 1 and DistLAN 2 switches (192.168.1.x and 192.168.2.x) to the Distributed LAN clients.)
Distributed LAN Servers

Distributed LAN Servers must have sufficient memory. When a Distributed LAN Server does not have sufficient memory, its performance in servicing Distributed LAN I/O requests might suffer. In some cases (particularly on Windows), it might hang. Refer to the StorNext Release Notes for this release's memory requirements. Distributed LAN Servers must also have sufficient bus bandwidth.
Windows Memory Requirements

Symptoms of this condition include services mysteriously dying, repeated FSM reconnect attempts, and messages being sent to the application log and cvlog.txt about socket failures with the status code 10055, which is ENOBUFS. The solution is to adjust a few parameters on the Cache Parameters tab in the SNFS control panel (cvntclnt). These parameters control how much memory is consumed by the directory cache, the buffer cache, and the local file cache.
When the directory cache is purged, entries are evicted until the Min water mark is reached, for example 128. The Max water mark is for situations where memory is very tight. The normal purge algorithm takes access time into account when determining a candidate to evict from the cache; in tight memory situations (when there are more than 'max' entries in the cache), these constraints are relaxed so that memory can be released. A value of 1024 in a tight memory situation should work.
Sample FSM Configuration File

This sample configuration file is located in the SNFS install directory, in the examples subdirectory, and is named example.cfg.

# ****************************************************************************
# A global section for defining file system-wide parameters.
# ****************************************************************************
# MAX(StripeBreadth)
# MaxMBPerClientReserve 50          # in MBs, default 100 MB reserved per client
# OpHangLimitSecs 300               # default 180 secs
# DataMigrationThreadPoolSize 128   # Managed only, default 8

# ****************************************************************************
# A disktype section for defining disk hardware parameters.
# ****************************************************************************
[Disk CvfsDisk4]
Status UP
Type VideoDrive

[Disk CvfsDisk5]
Status UP
Type VideoDrive

[Disk CvfsDisk6]
Status UP
Type VideoDrive

[Disk CvfsDisk7]
Status UP
Type VideoDrive

[Disk CvfsDisk8]
Status UP
Type VideoDrive

[Disk CvfsDisk9]
Status UP
Type VideoDrive

[Disk CvfsDisk10]
Status UP
Type AudioDrive

[Disk CvfsDisk11]
Status UP
Type AudioDrive

[Disk CvfsDisk12]
Status UP
Type AudioDrive

[Disk CvfsDisk13]
Status UP
Type AudioDrive

[Disk CvfsDisk14]
Status UP
Type DataDrive

[Disk CvfsDisk15]
Status UP
Type DataDrive

[Disk CvfsDisk16]
Status UP
Type DataDrive
[Disk CvfsDisk17]
Status UP
Type DataDrive

# ****************************************************************************
# A stripe section for defining stripe groups.
# ****************************************************************************
[StripeGroup AudioFiles]
Status UP
Exclusive Yes    ##Exclusive StripeGroup for Audio File Only##
Affinity AudioFiles
Read Enabled
Write Enabled
StripeBreadth 1M
MultiPathMethod Rotate
Node CvfsDisk10 0
Node CvfsDisk11 1
Node CvfsDisk12 2
Node CvfsDisk13 3

[StripeGroup RegularFiles]
Status UP
Exclusive No    ##Non-Exclusive StripeGroup for all Files##
Read Enabled
Write Enabled
StripeBreadth 256K
MultiPathMethod Rotate
Node CvfsDisk14 0
Node CvfsDisk15 1
Node CvfsDisk16 2
Node CvfsDisk17 3