StorNext 3.1 File System Tuning Guide
StorNext 3.1 File System Tuning Guide, 6-01376-07, Ver. A, Rel. 3.1, October 2007. Made in USA.

Quantum Corporation provides this publication “as is” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability or fitness for a particular purpose. Quantum Corporation may revise this publication from time to time without notice.

COPYRIGHT STATEMENT
Copyright 2007 by Quantum Corporation. All rights reserved.
Contents

Chapter 1  StorNext File System Tuning
    The Underlying Storage System
        RAID Cache Configuration
        RAID Write-Back Caching
        RAID Read-Ahead Caching
        RAID Level, Segment Size, and Stripe Size
StorNext File System Tuning

The StorNext File System (SNFS) provides extremely high performance for widely varying scenarios. Many factors determine the level of performance you will realize. In particular, the performance characteristics of the underlying storage system are the most critical factors. However, other components such as the Metadata Network and MDC systems also have a significant effect on performance.
The Underlying Storage System

RAID Cache Configuration

The single most important RAID tuning component is the cache configuration. This is particularly true for small I/O operations. Contemporary RAID systems such as the EMC CX series and the various Engenio systems provide excellent small I/O performance with properly tuned caching. So, for the best general purpose performance characteristics, it is crucial to utilize the RAID system caching as fully as possible.
Metadata write latency can be observed in the File System Manager (FSM) statistics reports in the cvlog file. For example, here is a message line from the cvlog file:

PIO HiPriWr SUMMARY SnmsMetaDisk0 sysavg/350 sysmin/333 sysmax/367

This statistics message reports average, minimum, and maximum write latency (in microseconds) for the reporting period. If the observed average latency exceeds 500 microseconds, peak metadata operation throughput will be degraded.
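To track these latencies over time, the PIO summary lines can be pulled straight from the FSM log. A minimal sketch, assuming the log lives at the default location /usr/cvfs/data/<file_system_name>/log/cvlog (verify the path on your installation; snfs1 below is a hypothetical file system name):

grep "PIO HiPriWr SUMMARY" /usr/cvfs/data/snfs1/log/cvlog

Compare the sysavg values across reporting periods against the 500 microsecond guideline above.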
Although read-ahead caching can help in some unusual cases, it severely degrades typical scenarios. Therefore, it is unsuitable for most environments.

RAID Level, Segment Size, and Stripe Size

Configuration settings such as RAID level, segment size, and stripe size are very important and cannot be changed after the file system is put into production, so it is critical to determine appropriate settings during initial configuration. The best RAID level to use for high I/O throughput is usually RAID5.
File Size Mix and Application I/O Characteristics

It is always valuable to understand the file size mix of the target dataset as well as the application I/O characteristics. This includes the number of concurrent streams, proportion of read versus write streams, I/O size, sequential versus random, Network File System (NFS) or Common Internet File System (CIFS) access, and so on.
It is usually best to configure the RAID stripe size no greater than 256K for optimal small file buffer cache performance. For more buffer cache configuration settings, see Mount Command Options on page 17.

NFS / CIFS

It is best to isolate NFS and/or CIFS traffic from the metadata network to eliminate contention that would otherwise impact performance. For optimal performance it is necessary to use 1000BaseT instead of 100BaseT.
The Metadata Network

As with any client/server protocol, SNFS performance is subject to the limitations of the underlying network. Therefore, it is recommended that you use a dedicated Metadata Network to avoid contention with other network traffic. Either 100BaseT or 1000BaseT is required, but for a dedicated Metadata Network there is usually no benefit from using 1000BaseT over 100BaseT. Neither TCP offload nor jumbo frames are required.
The Metadata Controller System

Some metadata operations such as file creation can be CPU intensive, and benefit from increased CPU power. The MDC platform is important in these scenarios because lower clock-speed CPUs such as Sparc and Mips degrade performance. Other operations, such as directory traversal, can benefit greatly from increased memory.
Example:

[stripeGroup RegularFiles]
Status UP
Exclusive No    ##Non-Exclusive stripeGroup for all Files##
Read Enabled
Write Enabled
StripeBreadth 256K
MultiPathMethod Rotate
Node CvfsDisk6 0
Node CvfsDisk7 1

Affinities

Affinities are another stripe group feature that can be very beneficial. Affinities can direct file allocation to appropriate stripe groups according to performance requirements.
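For example, the sample configuration file at the end of this guide dedicates a stripe group to audio content by pairing Exclusive with an Affinity key (the excerpt below is taken from that sample; AudioFiles is simply the affinity label chosen there):

[StripeGroup AudioFiles]
Status UP
Exclusive Yes    ##Exclusive StripeGroup for Audio File Only##
Affinity AudioFiles
Read Enabled
Write Enabled
StripeBreadth 1M
MultiPathMethod Rotate
Node CvfsDisk10 0
Node CvfsDisk11 1

Directories (and the files created in them) are associated with an affinity on the client side, for example with the cvaffinity command or through the SNFS External API described later in this guide; allocations for those files are then directed to the matching stripe group. See the cvaffinity man page for the exact syntax on your release.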
StripeBreadth

This setting must match the RAID stripe size or be a multiple of the RAID stripe size. Matching the RAID stripe size is usually the optimal setting. However, depending on the RAID performance characteristics and application I/O size, it might be beneficial to use a multiple of the RAID stripe size.
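As an illustration (the RAID geometry here is an assumption, not taken from this guide): an 8+1 RAID5 LUN with a 64K segment size has a full stripe of 8 x 64K = 512K, so the matching stripe group setting would be:

StripeBreadth 512K

If testing shows the RAID performs better with larger transfers, a 2x multiple such as StripeBreadth 1M keeps I/O aligned on full-stripe boundaries.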
InodeCacheSize

This setting consumes about 800-1000 bytes of memory times the number specified. Increasing this value can reduce the latency of any metadata operation by performing a hot cache access to inode information instead of an I/O to get inode information from disk, which is about 100 to 1000 times faster. It is especially important to increase this setting if metadata I/O latency is high (for example, more than 2ms average latency).
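To gauge the memory impact before changing this value, multiply the entry count by the per-entry cost quoted above. For example (an illustrative calculation, not a recommendation): an InodeCacheSize of 128K entries is 131,072 x 800-1000 bytes, or roughly 105-131 MB of MDC memory, so confirm the MDC has that much headroom before raising the setting.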
Settings greater than 64K are not recommended because performance will be adversely impacted due to inefficient metadata I/O operations. Values less than 16K are not recommended in most scenarios because startup and failover time may be adversely impacted. Setting FsBlockSize to higher values is important for multiterabyte file systems for optimal startup and failover time.
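Both of the settings discussed above are file system-wide parameters in the global section of the FSM configuration file (see the sample configuration at the end of this guide). A sketch of the relevant lines for a multiterabyte file system, with values chosen purely for illustration and subject to the guidance above:

FsBlockSize 64K
InodeCacheSize 128K

Note that FsBlockSize is set when the file system is created and cannot be changed afterward without reinitializing the file system, so it must be chosen carefully up front.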
(See the man cvfs_config page.) The snfsdefrag man page explains the command options in greater detail.

FSM hourly statistics reporting is another very useful tool. It can show you the mix of metadata operations being invoked by client processes, as well as latency information for metadata operations and metadata and journal I/O. This information is easily accessed in the cvlog log files.
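As a sketch of how snfsdefrag is typically used (the option letters shown are assumptions to be checked against the snfsdefrag man page on your release, and /stornext/snfs1 is a hypothetical mount point):

snfsdefrag -e /stornext/snfs1/video/clip.mov    # report the file's extent layout without changing it
snfsdefrag -r /stornext/snfs1/video             # defragment files under a directory tree recursively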
SNFS supports the Windows Perfmon utility, which provides many useful statistics counters for the SNFS client component. To install, obtain a copy of cvfsperf.dll from the SCM team in Denver and copy it into the c:/winnt/system32 directory on the SNFS client system. Then run rmperfreg.exe and instperfreg.exe to set up the required registry settings. After these steps, the SNFS counters should be visible to the Windows Perfmon utility.
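A minimal sketch of those steps from a Windows command prompt, assuming the DLL has already been obtained and the two registration tools are in the current directory or on the PATH (paths are illustrative):

copy cvfsperf.dll c:\winnt\system32
rmperfreg.exe
instperfreg.exe

Then open Perfmon and confirm the SNFS counters appear in the list of available objects.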
The “PERF: VFS” trace shows throughput measured for the read or write system call and significant aspects of the I/O, including:
• Dma: DMA
• Buf: Buffered
• Eof: File extended
• Algn: Well-formed DMA I/O
• Shr: File is shared by another client
• Rt: File is real time
• Zr: Hole in file was zeroed

Both traces also report file offset, I/O size, latency (mics), and inode number.

Sample use cases:
• Verify that I/O properties are as expected.
• Identify read/modify/write conditions. If buffered VFS writes are causing Device reads, it might be beneficial to match the I/O request size to a multiple of the “cachebufsize” (default 64KB; see the mount_cvfs man page). Another way to avoid this is by truncating the file before writing.

The cvadmin command includes a latency-test utility for measuring the latency between an FSM and one or more SNFS clients.
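A sketch of invoking that utility from the MDC (the sub-command arguments shown are assumptions to check against the cvadmin man page, and snfs1 is a hypothetical file system name):

cvadmin
select snfs1
latency-test all

The test reports message round-trip latency between the FSM and each connected client, which helps distinguish network-induced latency from storage-induced latency.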
Mount Command Options

The following SNFS mount command settings are explained in greater detail in the mount_cvfs man page.

The default size of the buffer cache varies by platform and main memory size, and ranges between 32MB and 256MB. By default, each buffer is 64K, so the cache contains between 512 and 4096 buffers. In general, increasing the size of the buffer cache will not improve performance for streaming reads and writes.
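As an illustration of where these client-side settings are applied on Linux (the option name and value syntax should be verified against the mount_cvfs man page for your release; snfs1 and /stornext/snfs1 are hypothetical names):

mount -t cvfs -o cachebufsize=256k snfs1 /stornext/snfs1

A larger cachebufsize caches fewer, larger buffers for the same total cache size, which can suit applications that issue large, well-formed I/O.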
SNFS External API

The SNFS External API might be useful in some scenarios because it offers programmatic use of special SNFS performance capabilities such as affinities, preallocation, and quality of service. For more information, see the Quality of Service chapter of the StorNext API Guide.
If the switch is set to auto-detect but the host is set to 1000Mb/full, you will observe a high error rate and extremely poor performance. On Linux, the ethtool command can be very useful to investigate and adjust speed/duplex settings.

In some cases, TCP offload seems to cause problems with Distributed LAN by miscalculating checksums under heavy loads. This is indicated by bad segments reported in the output of netstat -s.
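A quick check sketch on a Linux Distributed LAN client or server (eth1 is a hypothetical interface name):

ethtool eth1                   # report negotiated speed and duplex
netstat -s | grep -i segment   # look for a climbing "bad segments received" count

If the negotiated settings are wrong, ethtool -s can force them (for example, ethtool -s eth1 speed 1000 duplex full autoneg off), but it is usually better to correct the configuration so that both ends of the link negotiate the same settings.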
Figure 1  Multi-NIC Hardware and IP Configuration Diagram
(The diagram shows three Distributed LAN Servers and multiple Distributed LAN Clients, each with two NICs: one on the 10.0.0.x subnet through Switch A and one on the 192.168.9.x subnet through Switch B. The Distributed LAN Servers also connect to the SAN.)
Distributed LAN Servers

Distributed LAN Servers must have sufficient memory. When a Distributed LAN Server does not have sufficient memory, its performance in servicing Distributed LAN I/O requests might suffer. In some cases (particularly on Windows), it might hang. Refer to the StorNext Release Notes for this release's memory requirements.

Distributed LAN Servers must also have sufficient bus bandwidth.
Distributed LAN Client Vs. Legacy Network Attached Storage

Performance

DLC outperforms NFS and CIFS for single-stream I/O and provides higher aggregate bandwidth. For inferior NFS client implementations, the difference can be more than a factor of two. DLC also makes extremely efficient use of multiple NICs (even for single streams), whereas legacy NAS protocols allow only a single NIC to be used.
Therefore, DLC provides increased stability that is comparable to the StorNext SAN Client.

Consistent Security Model

DLC clients have the same security model as StorNext SAN clients. When CIFS and NFS are used, some security models aren't supported. (For example, Windows ACLs are not accessible when running UNIX Samba servers.)

Windows Memory Requirements

Beginning in version 2.6.
The first is the Directory Cache Size. The default is 10 (MB). If you do not have large directories, or do not perform lots of directory scans, this number can be reduced to 1 or 2 MB. The impact will be slightly slower directory lookups in directories that are frequently accessed. Also, in the Mount Option panel, you should set the Paged DirCache option.

The next parameters control how many file structures are cached on the client.
Sample FSM Configuration File

This sample configuration file is located in the SNFS install directory under the examples subdirectory, named example.cfg.

# ****************************************************************************
# A global section for defining file system-wide parameters.
# ****************************************************************************
# MAX(StripeBreadth)
# MaxMBPerClientReserve 50          # in MBs, default 100 MB reserved per client
# OpHangLimitSecs 300               # default 180 secs
# DataMigrationThreadPoolSize 128   # Managed only, default 8

# ****************************************************************************
# A disktype section for defining disk hardware parameters.
# ****************************************************************************
[Disk CvfsDisk4]
Status UP
Type VideoDrive

[Disk CvfsDisk5]
Status UP
Type VideoDrive

[Disk CvfsDisk6]
Status UP
Type VideoDrive

[Disk CvfsDisk7]
Status UP
Type VideoDrive

[Disk CvfsDisk8]
Status UP
Type VideoDrive

[Disk CvfsDisk9]
Status UP
Type VideoDrive

[Disk CvfsDisk10]
Status UP
Type AudioDrive

[Disk CvfsDisk11]
Status UP
Type AudioDrive

[Disk CvfsDisk12]
Status UP
Type AudioDrive

[Disk CvfsDisk13]
Status UP
Type AudioDrive

[Disk CvfsDisk14]
Status UP
[Disk CvfsDisk17]
Status UP
Type DataDrive

# ****************************************************************************
# A stripe section for defining stripe groups.
# ****************************************************************************
[StripeGroup AudioFiles]
Status UP
Exclusive Yes    ##Exclusive StripeGroup for Audio File Only##
Affinity AudioFiles
Read Enabled
Write Enabled
StripeBreadth 1M
MultiPathMethod Rotate
Node CvfsDisk10 0
Node CvfsDisk11 1
Node CvfsDisk12 2
Node CvfsDisk13 3

[StripeGroup RegularFiles]
Status UP
Exclusive No    ##Non-Exclusive StripeGroup for all Files##
Read Enabled
Write Enabled
StripeBreadth 256K
MultiPathMethod Rotate
Node CvfsDisk14 0
Node CvfsDisk15 1