StorNext 3.5 File System Tuning Guide
StorNext 3.5.2 File System Tuning Guide, 6-01376-14, Ver. A, Rel. 3.5.2, February 2010, Made in USA. Quantum Corporation provides this publication “as is” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability or fitness for a particular purpose. Quantum Corporation may revise this publication from time to time without notice. COPYRIGHT STATEMENT © Copyright 2000 - 2010 Quantum Corporation. All rights reserved.
Contents

StorNext File System Tuning
  The Underlying Storage System
    RAID Cache Configuration
    RAID Write-Back Caching
    RAID Read-Ahead Caching
    RAID Level, Segment Size, and Stripe Size
    ...
StorNext File System Tuning

The StorNext File System (SNFS) provides extremely high performance for widely varying scenarios. Many factors determine the level of performance you will realize; in particular, the performance characteristics of the underlying storage system are the most critical. However, other components, such as the Metadata Network and MDC systems, also have a significant effect on performance.
The Underlying Storage System

RAID Cache Configuration

The single most important RAID tuning component is the cache configuration. This is particularly true for small I/O operations. Contemporary RAID systems such as the EMC CX series and the various Engenio systems provide excellent small I/O performance with properly tuned caching. So, for the best general-purpose performance characteristics, it is crucial to utilize the RAID system caching as fully as possible.
RAID Write-Back Caching

Metadata operations involve a very high rate of small writes to the metadata disk, so disk latency is the critical performance factor. Write-back caching can be an effective approach to minimizing I/O latency and optimizing metadata operations throughput. This is easily observed in the hourly File System Manager (FSM) statistics reports in the cvlog file.
RAID Read-Ahead Caching

While read-ahead caching improves sequential read performance, it does not help highly transactional workloads. Furthermore, some SNFS customers actually observe maximum large sequential read throughput by disabling caching. While disabling read-ahead is beneficial in these unusual cases, it severely degrades typical scenarios; therefore, it is unsuitable for most environments.
File Size Mix and Application I/O Characteristics

For example, varying the stripe size and running lmdd with a range of I/O sizes might be useful to determine an optimal stripe size multiple to configure the SNFS StripeBreadth.

Some storage vendors now provide RAID6 capability for improved reliability over RAID5. This may be particularly valuable for SATA disks, where bit error rates can lead to disk problems.
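The alignment idea behind the stripe-size experiment above can be sketched as follows. This is an illustration only: the segment size, data-disk count, and candidate StripeBreadth values are hypothetical examples, not recommendations.

```python
# Sketch: checking that an SNFS StripeBreadth candidate is a whole multiple
# of the underlying RAID stripe size (segment size x number of data disks).
# All numbers below are hypothetical.

def raid_stripe_size(segment_size_kb: int, data_disks: int) -> int:
    """Full RAID stripe size in KB: one segment per data disk."""
    return segment_size_kb * data_disks

def is_aligned_stripe_breadth(stripe_breadth_kb: int,
                              segment_size_kb: int,
                              data_disks: int) -> bool:
    """True if StripeBreadth is an exact multiple of the RAID stripe size."""
    return stripe_breadth_kb % raid_stripe_size(segment_size_kb, data_disks) == 0

# Example: 64 KB segments across 8 data disks -> 512 KB RAID stripe.
# A 4 MB (4096 KB) StripeBreadth is an exact multiple; 1000 KB is not.
print(is_aligned_stripe_breadth(4096, 64, 8))  # True
print(is_aligned_stripe_breadth(1000, 64, 8))  # False
```

Running lmdd across a range of I/O sizes, as the text suggests, is what actually reveals which aligned multiple performs best on a given RAID.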
Buffer Cache

Reads and writes that aren't well-formed utilize the SNFS buffer cache. This also includes NFS- or CIFS-based traffic, because the NFS and CIFS daemons defeat well-formed I/Os issued by the application. Several configuration parameters affect buffer cache performance.
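To illustrate why NFS/CIFS traffic falls back to the buffer cache, here is a rough sketch of a well-formedness test. The 16K block size and the exact alignment criterion are assumptions for illustration, not the precise SNFS rule.

```python
# Sketch: an I/O is treated here as "well-formed" when both its offset and
# its size are aligned to the file system block size. The block size and
# the criterion itself are illustrative assumptions.

FS_BLOCK_SIZE = 16 * 1024  # hypothetical FsBlockSize of 16K

def is_well_formed(offset: int, size: int,
                   block_size: int = FS_BLOCK_SIZE) -> bool:
    return offset % block_size == 0 and size % block_size == 0

print(is_well_formed(0, 64 * 1024))    # True  -> eligible for direct I/O
print(is_well_formed(512, 64 * 1024))  # False -> handled via buffer cache
```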
It can be useful to use a tool such as netperf to help verify network performance characteristics.

SNFS and Virus Checking

Virus-checking software can severely degrade the performance of any file system, including SNFS. If you have anti-virus software running on a Windows Server 2003 or Windows XP machine, Quantum recommends configuring the software so that it does NOT check SNFS.
The Metadata Controller System

It can be useful to use a tool like netperf to help verify the Metadata Network performance characteristics. For example, if netperf -t TCP_RR reports less than 15,000 transactions per second of capacity, a performance penalty may be incurred. You can also use the netstat tool to identify TCP retransmissions that are impacting performance. The cvadmin "latency-test" tool is also useful for measuring network latency.
However, it is critical that the MDC system have enough physical memory available to ensure that the FSM process doesn't get swapped out. Otherwise, severe performance degradation and system instability can result. The operating system on the metadata controller must always be run in U.S. English.

FSM Configuration File Settings

The following FSM configuration file settings are explained in greater detail in the cvfs_config man page.
MultiPathMethod Rotate
Node CvfsDisk6 0
Node CvfsDisk7 1

[StripeGroup MetaFiles]
Status UP
MetaData Yes
Journal No
Exclusive Yes
Read Enabled
Write Enabled
StripeBreadth 256K
MultiPathMethod Rotate
Node CvfsDisk0 0

[StripeGroup JournFiles]
Status UP
Journal Yes
MetaData No
Exclusive Yes
Read Enabled
Write Enabled
StripeBreadth 256K
MultiPathMethod Rotate
Node CvfsDisk1 0

Affinities

Affinities are another stripe group feature that can be very
For optimal performance, files that are accessed using large DMA-based I/O could be steered to wide-stripe stripe groups. Less performance-critical files could be steered to slow-disk stripe groups. Small files could be steered to narrow-stripe stripe groups.
Example:

[stripeGroup VideoFiles]
Status UP
Exclusive Yes      ## These two lines set an exclusive stripeGroup ##
Affinity VidFiles  ## for video files only ##
Read Enabled
Write Enabled
StripeBreadth 4M
MultiPathMethod Rotate
Node CvfsDisk2 0
Node CvfsDisk3 1

BufferCacheSize

This setting consumes up to 2X bytes of memory times the number specified.
ThreadPoolSize

This setting consumes up to 512 KB of memory times the number specified. Increasing this value can improve the concurrency of metadata operations. For example, if many client processes are executing concurrently, the thread pool can become exhausted by I/O wait time. Increasing the thread pool size permits hot cache operations to be processed that would otherwise be backed up behind the I/O-bound operations.
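The per-unit memory costs quoted in this guide can be combined into a rough upper bound on FSM memory consumption. This is a back-of-the-envelope sketch: it uses 512 KB per thread, up to 2x BufferCacheSize, and roughly 1000 bytes per inode cache entry, with the sample-file tunable values as an example.

```python
# Sketch: rough upper bound on FSM memory consumption from the tunables
# discussed in this guide (512 KB per thread, up to 2x BufferCacheSize,
# ~800-1000 bytes per inode cache entry). Illustrative only.

KB, MB = 1024, 1024 * 1024

def fsm_memory_upper_bound(thread_pool_size: int,
                           buffer_cache_bytes: int,
                           inode_cache_entries: int) -> int:
    threads = thread_pool_size * 512 * KB   # ThreadPoolSize cost
    buffers = 2 * buffer_cache_bytes        # BufferCacheSize, up to 2x
    inodes = inode_cache_entries * 1000     # InodeCacheSize, worst case
    return threads + buffers + inodes

# Example with sample-configuration values:
# ThreadPoolSize 64, BufferCacheSize 64M, InodeCacheSize 32K
total = fsm_memory_upper_bound(64, 64 * MB, 32 * 1024)
print(total // MB)  # 191 (MB, approximate upper bound)
```

Note that the sample configuration file's comment quotes 2 MB per thread; the value above follows the 512 KB figure in this section, so treat the result as order-of-magnitude guidance.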
Note: This is particularly true for metadata servers with slow CPU clock speeds, such as Sparc. However, values greater than 16K can severely consume metadata space in cases where the file-to-directory ratio is low (e.g., less than 100 to 1).

For metadata disk size, you must have a minimum of 25 GB, with more space allocated depending on the number of files per directory and the size of your file system.
JournalSize

The optimal settings for JournalSize are in the range between 16M and 64M, depending on the FsBlockSize. Avoid values greater than 64M due to potentially severe impacts on startup and failover times. Values at the higher end of the 16M-64M range may improve performance of metadata operations in some cases, although at the cost of slower startup and failover time. The following table shows recommended settings.
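The guidance above reduces to a simple range check, sketched here for illustration. The category labels are mine; the 16M-64M range and the warning about larger values come from the text.

```python
# Sketch: a range check reflecting the JournalSize guidance above
# (16M-64M recommended; values above 64M discouraged because they can
# slow startup and failover). Purely illustrative.

def check_journal_size(size_mb: int) -> str:
    if size_mb < 16:
        return "below recommended range"
    if size_mb <= 64:
        return "ok"
    return "too large: may slow startup and failover"

print(check_journal_size(32))   # ok
print(check_journal_size(128))  # too large: may slow startup and failover
```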
The following items are a few things to watch out for:

• A non-zero value for FSM wait SUMMARY journal waits indicates insufficient IOPS performance of the disks assigned to the metadata stripe group. This usually requires reducing the metadata I/O latency time by adjusting RAID cache settings or reducing bandwidth contention for the metadata LUN. Another possible solution is to add another metadata stripe group to the file system.
preallocation and stripe alignment. In the directory-to-directory copy mode (for example, cvcp source_dir destination_dir), cvcp conditionally uses the Bulk Create API to provide a dramatic small-file copy performance boost. However, it will not use Bulk Create in some scenarios, such as non-root invocation, managed file systems, quotas, or Windows security. Hopefully, these limitations will be removed in a future release.
Sample use cases:

• Verify that I/O properties are as expected. You can use the VFS trace to ensure that the displayed properties are consistent with expectations, such as being well-formed; buffered versus DMA; shared/non-shared; or I/O size. If a small I/O is being performed via DMA, performance will be poor. If DMA I/O is not well-formed, it requires an extra data copy and may even be broken into small chunks.
displayed by the cvadmin who command). If all is specified, the test is run against each client in turn. The test is run for 2 seconds, unless a value for seconds is specified. Here is a sample run:

snadmin (lsi) > latency-test
Test started on client 1 (bigsky-node2)... latency 55us
Test started on client 2 (k4)... latency 163us

There is no rule of thumb for "good" or "bad" latency values.
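When comparing latency-test runs across many clients, it can help to parse the output programmatically. This sketch assumes the "latency NNNus" format shown in the sample above; other units or output variations would need additional handling.

```python
# Sketch: extracting per-client latencies from cvadmin latency-test output
# in the format shown above. The regex is tied to that exact format.
import re

sample = """Test started on client 1 (bigsky-node2)... latency 55us
Test started on client 2 (k4)... latency 163us"""

def parse_latencies(text: str) -> dict:
    """Map each client name to its reported latency in microseconds."""
    pattern = re.compile(r"client \d+ \(([^)]+)\)\.\.\. latency (\d+)us")
    return {name: int(us) for name, us in pattern.findall(text)}

print(parse_latencies(sample))  # {'bigsky-node2': 55, 'k4': 163}
```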
The Distributed LAN (Disk Proxy) Networks

background for improved performance. The default setting is optimal in most scenarios.

The auto_dma_read_length and auto_dma_write_length settings determine the minimum transfer size at which direct DMA I/O is performed instead of using the buffer cache for well-formed I/O. These settings can be useful when performance degradation is observed for small DMA I/O sizes compared to the buffer cache.
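The cutover behavior these settings imply can be sketched as a simple threshold decision. The 1 MB threshold here is a hypothetical example, not a documented default.

```python
# Sketch: the transfer-size cutover implied by auto_dma_read_length /
# auto_dma_write_length. Transfers at or above the threshold go direct
# (DMA); smaller ones go through the buffer cache. The threshold value
# is a hypothetical example.

AUTO_DMA_WRITE_LENGTH = 1024 * 1024  # hypothetical 1 MB threshold

def io_path(size: int, threshold: int = AUTO_DMA_WRITE_LENGTH) -> str:
    return "dma" if size >= threshold else "buffer-cache"

print(io_path(4 * 1024 * 1024))  # dma
print(io_path(64 * 1024))        # buffer-cache
```

If small DMA transfers are slow on a given setup, raising the threshold shifts those transfers back to the buffer cache, which is the tuning lever the text describes.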
anticipated traffic between all Distributed LAN clients and servers connected to them. A network switch that is dropping packets will cause TCP retransmissions. This can be easily observed on both Linux and Windows platforms by using the netstat -s command while Distributed LAN is in progress. Reducing the TCP window size used by Distributed LAN might also help with an oversubscribed network switch.
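To spot a dropping switch, the retransmission counter can be pulled out of netstat -s output. The line format below matches typical Linux output; it may differ on other platforms, and the sample text is fabricated for illustration.

```python
# Sketch: extracting the TCP retransmission counter from `netstat -s`
# output. The "segments retransmitted" line matches common Linux output;
# the sample text here is illustrative.
import re

netstat_output = """Tcp:
    512 segments retransmitted
    1024931 segments sent out"""

def retransmitted_segments(text: str) -> int:
    match = re.search(r"(\d+) segments retransmitted", text)
    return int(match.group(1)) if match else 0

print(retransmitted_segments(netstat_output))  # 512
```

Sampling this counter before and after a Distributed LAN transfer shows whether retransmissions are growing while the transfer is in progress.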
It can be useful to use a tool like netperf to help verify the performance characteristics of each Distributed LAN network. (When using netperf on a system with multiple NICs, take care to specify the right IP addresses in order to ensure the network being tested is the one you will be running Distributed LAN over.)
Distributed LAN Servers

In the diagram there are two subnetworks: the blue subnetwork (10.0.0.x) and the red subnetwork (192.168.9.x). Servers such as S1 are connected to both the blue and red subnetworks, and can each provide up to 2 GByte/s of throughput to clients. (The three servers shown would thus provide an aggregate of 6 GByte/s.) Clients such as C1 are also connected to both the blue and red subnetworks, and can each get up to 2 GByte/s of throughput.
Distributed LAN Client vs. Legacy Network Attached Storage

StorNext provides support for legacy Network Attached Storage (NAS) protocols, including Network File System (NFS) and Common Internet File System (CIFS).
Load Balancing

DLC automatically makes use of all available Distributed LAN Servers in an active/active fashion, and evenly spreads I/O across them. If a server goes down or one is added, the load-balancing system automatically adjusts to support the new configuration.
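The behavior described above can be sketched as a round-robin rotation that is rebuilt whenever the server list changes. This is purely illustrative and not the actual DLC algorithm; the server names are hypothetical.

```python
# Sketch of active/active load balancing in the spirit described above:
# spread requests evenly across the servers currently up, and adapt when
# the server list changes. Not the actual DLC implementation.
import itertools

class LoadBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self._cycle = itertools.cycle(self.servers)

    def next_server(self):
        """Pick the next server in round-robin order."""
        return next(self._cycle)

    def update(self, servers):
        """A server went down or was added: rebuild the rotation."""
        self.servers = list(servers)
        self._cycle = itertools.cycle(self.servers)

lb = LoadBalancer(["S1", "S2", "S3"])
print([lb.next_server() for _ in range(4)])  # ['S1', 'S2', 'S3', 'S1']
lb.update(["S1", "S3"])  # S2 went down; I/O now alternates S1/S3
print([lb.next_server() for _ in range(2)])  # ['S1', 'S3']
```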
Windows Memory Requirements

Beginning in version 2.6.1, StorNext includes a number of performance enhancements that enable it to better react to changing customer load. However, these enhancements come with a price: memory requirements. When running on a 32-bit Windows system that is experiencing memory pressure, the tuning parameters might need adjusting to avoid running the system out of non-paged memory.
The more cvnodes that are cached on the client, the fewer trips the client has to make over the wire to contact the FSM. Each cvnode is approximately 1462 bytes in size and is allocated from the non-paged pool. The cvnode cache is periodically purged so that unused entries are freed. The decision to purge the cache is made based on the Low, High, and Max water mark values.
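A water-mark purge policy of the kind described can be sketched as follows. The threshold values and the purge-to-Low behavior are illustrative assumptions, not the actual driver logic; only the ~1462-byte cvnode size comes from the text.

```python
# Sketch: a Low/High/Max water-mark purge policy like the one described
# for the cvnode cache. Thresholds and purge targets are assumptions.

CVNODE_BYTES = 1462  # approximate per-cvnode size quoted above

def purge_decision(cached: int, low: int, high: int, maximum: int) -> int:
    """Return how many entries to free given the current cache population."""
    if cached >= maximum:
        return cached - low    # hard limit reached: purge down to Low
    if cached > high:
        return cached - high   # soft pressure: trim back to High
    return 0                   # below High: nothing to do

# With hypothetical marks Low=1024, High=8192, Max=16384:
print(purge_decision(12000, 1024, 8192, 16384))  # 3808 entries trimmed
print(purge_decision(500, 1024, 8192, 16384))    # 0
# Non-paged pool freed by the first purge, approximately:
print(3808 * CVNODE_BYTES)  # 5567296 bytes
```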
Sample FSM Configuration File

This sample configuration file is located in the SNFS install directory, under the examples subdirectory, and is named example.cfg.

# *************************************************************************
# A global section for defining file system-wide parameters.
#
# ThreadPoolSize 64                # default 16, 2 MB memory per thread
# InodeCacheSize 32K               # 800-1000 bytes each, default 32K
# BufferCacheSize 64M              # default 32MB
# StripeAlignSize 2M               # auto alignment, default MAX(StripeBreadth)
# OpHangLimitSecs 300              # default 180 secs
# DataMigrationThreadPoolSize 128  # Managed only, default 8

# *************************************************************************
# A disktype section for defining disk hardware parameters.
# *************************************************************************

[Disk CvfsDisk0]
Status UP
Type MetaDrive

[Disk CvfsDisk1]
Status UP
Type JournalDrive

[Disk CvfsDisk2]
Status UP
Type VideoDrive

[Disk CvfsDisk3]
Status UP
Type VideoDrive

[Disk CvfsDisk4]
Status UP
Type VideoDrive

[Disk CvfsDisk5]
Status UP
Type VideoDrive

[Disk CvfsDisk6]
Status UP
Type VideoDrive

[Disk CvfsDisk7]
Status UP
Type VideoDrive
[Disk CvfsDisk8]
Status UP
Type VideoDrive

[Disk CvfsDisk9]
Status UP
Type VideoDrive

[Disk CvfsDisk10]
Status UP
Type AudioDrive

[Disk CvfsDisk11]
Status UP
Type AudioDrive

[Disk CvfsDisk12]
Status UP
Type AudioDrive

[Disk CvfsDisk13]
Status UP
Type AudioDrive

[Disk CvfsDisk14]
Status UP
Type DataDrive

[Disk CvfsDisk15]
Status UP
Type DataDrive

[Disk CvfsDisk16]
Status UP
Type DataDrive

[Disk CvfsDisk17]
Status UP
Type DataDrive

# *************************************************************************
# A stripe section for defining stripe groups.
Status UP
Exclusive Yes      ## Exclusive StripeGroup for Video Files Only ##
Affinity VidFiles
Read Enabled
Write Enabled
StripeBreadth 4M
MultiPathMethod Rotate
Node CvfsDisk2 0
Node CvfsDisk3 1
Node CvfsDisk4 2
Node CvfsDisk5 3
Node CvfsDisk6 4
Node CvfsDisk7 5
Node CvfsDisk8 6
Node CvfsDisk9 7

[StripeGroup AudioFiles]
Status UP
Exclusive Yes      ## Exclusive StripeGroup for Audio Files Only ##
Affinity AudFiles
Read Enabled
Write Enabled
StripeBreadth 1M
MultiPathMethod Rotate
Write Enabled
StripeBreadth 256K
MultiPathMethod Rotate
Node CvfsDisk14 0
Node CvfsDisk15 1
Node CvfsDisk16 2
Node CvfsDisk17 3