•	NFS on Near-Line SAS drives hosted by the head node and exported over NFS via IPoIB
•	NFS on SSD drives hosted by the head node and exported over NFS via IPoIB
•	NVMe
The Isilon F800 evaluated in this experiment has the same configuration as described in Section 2.5. Before
each test run, the cache was cleared on all four Isilon F800 nodes.
The NFS file share hosted on Near-Line SAS drives local to the head node (NL SAS NFS) consists of 12 TB
NL SAS drives in a RAID 50 volume, formatted as an XFS file system and exported via NFS to the compute
nodes over IPoIB.
The third storage system that was evaluated is an NFS file share hosted on SATA SSDs local to the head node.
This configuration was included in the experiment to evaluate the performance of an SSD-based storage
solution. Four SATA SSD disks local to the head node were configured in a RAID 0 volume since the goal was
to test the maximum performance. The SSDs were 1.92 TB read-intensive drives; since most disk operations in
Deep Learning training are read-intensive, drives rated at 1 Drive Write Per Day (DWPD) were selected. The
RAID 0 volume was formatted as an XFS file system and exported via NFS to the compute nodes over IPoIB.
The goal of this configuration was to understand the benefit of SSD drives for the read operations in Deep
Learning training and to determine whether this could be a viable option as a storage solution or as scratch
space for temporary files. For the purpose of this experiment, an 8 TB SSD solution was deemed sufficient. For
environments that choose such an SSD solution for a production NFS system, RAID 6 or RAID 50 is
recommended to protect data against disk failures. The quantity and capacity of the SSDs should also be
increased to accommodate user and project storage needs.
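Since the goal of this configuration is specifically read performance, a quick sanity check of sequential read throughput from the mounted share can be useful. The Python sketch below is a hypothetical illustration rather than part of the benchmark methodology; the mount point is an assumed path, and the operating system page cache would need to be dropped between runs so that the result reflects the storage rather than memory.

import os
import time

# Assumed mount point for the SSD-backed NFS share on a compute node (hypothetical path).
MOUNT_POINT = "/mnt/ssd_nfs"
BLOCK_SIZE = 1 << 20  # read in 1 MiB chunks


def sequential_read_throughput(path, block_size=BLOCK_SIZE):
    """Read one file sequentially and return the observed throughput in MB/s."""
    total_bytes = 0
    start = time.time()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.time() - start
    return total_bytes / elapsed / 1e6


if __name__ == "__main__":
    # Spot-check a handful of files on the share; drop the page cache between runs.
    for name in sorted(os.listdir(MOUNT_POINT))[:5]:
        path = os.path.join(MOUNT_POINT, name)
        if os.path.isfile(path):
            print(f"{name}: {sequential_read_throughput(path):.1f} MB/s")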
The final storage subsystem evaluated used NVMe drives. The compute nodes have an option for local NVMe
devices in addition to SAS or SATA, and these devices can be used as local scratch space. In a typical Deep
Learning training cycle, the same dataset is used multiple times. If the dataset is large and cannot fit in the
compute node system memory, a larger capacity local NVMe drive can provide much faster disk-to-memory
I/O than SSD drives. The PowerEdge C4140 can support up to two NVMe devices in PCIe card form factor in
the rear PCIe slots of the server chassis. One 1.6 TB NVMe device was tested in this study since the other
PCIe slot was populated by the Mellanox InfiniBand EDR adapter. The random read performance of the single
NVMe device is 1,080,000 IOPS and the sequential read performance is 6,400 MB/s. This option is referred to
as NVMe in the results below.
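As a hypothetical illustration of using a local NVMe device as scratch space, the sketch below stages a dataset from an NFS share onto a local NVMe mount once, so that subsequent epochs read from the faster local device. Both paths are assumed examples and not paths used in this study.

import shutil
from pathlib import Path

# Assumed (hypothetical) locations: the dataset on the NFS share and a local NVMe scratch mount.
NFS_DATASET = Path("/mnt/nfs/imagenet_tfrecords")
NVME_SCRATCH = Path("/mnt/nvme/scratch/imagenet_tfrecords")


def stage_to_nvme(src: Path, dst: Path) -> Path:
    """Copy the dataset to local NVMe scratch once, then reuse it for every epoch."""
    if dst.exists():
        return dst  # already staged by an earlier run
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dst)
    return dst


if __name__ == "__main__":
    data_dir = stage_to_nvme(NFS_DATASET, NVME_SCRATCH)
    print(f"Training input directory: {data_dir}")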
Since the entire ILSVRC2012 benchmark dataset is only ~140 GB and can easily fit into system memory, the
size of the dataset was increased 10x by applying 10 different data augmentation techniques to each JPEG
image in the dataset. The augmented images were then converted to a TensorFlow TFRecords database.
The TFRecords file format is a binary format that combines multiple raw image files and their metadata into a
single file. It maintains the image compression offered by the JPEG format, so the total size of the dataset
remained the same after conversion. Training performance was measured with the TensorFlow framework
using both the TFRecords database and the raw augmented JPEG images, and the results are presented in
Figure 10. The program did not run to completion with Resnet50 and JPEG images for all storage options, so
there are no corresponding values in Figure 10 (b). This issue will be studied as part of future work.
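A minimal sketch of the kind of augmentation and TFRecords conversion described above is shown below. It is illustrative only and not the exact tooling used in this study; the directory paths, the specific augmentation operations, the placeholder label, and the single output file are assumptions, and it assumes TensorFlow 2.x.

import glob
import os

import tensorflow as tf

# Assumed (hypothetical) locations for the raw ILSVRC2012 JPEGs and the output TFRecords file.
JPEG_DIR = "/data/ilsvrc2012/train"
OUTPUT_TFRECORD = "/data/ilsvrc2012_augmented/train.tfrecord"
COPIES_PER_IMAGE = 10  # one augmented variant per copy, growing the dataset 10x


def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def augment(image, variant):
    """Apply a simple augmentation per variant index (illustrative, not the study's 10 techniques)."""
    image = tf.image.random_flip_left_right(image, seed=variant)
    image = tf.image.random_brightness(image, max_delta=0.2, seed=variant)
    return image


def write_tfrecords(jpeg_dir, output_path):
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with tf.io.TFRecordWriter(output_path) as writer:
        for path in glob.glob(os.path.join(jpeg_dir, "*.JPEG")):
            image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
            for variant in range(COPIES_PER_IMAGE):
                augmented = augment(image, variant)
                # Re-encode as JPEG so the TFRecord keeps JPEG compression on disk.
                encoded = tf.io.encode_jpeg(augmented).numpy()
                example = tf.train.Example(features=tf.train.Features(feature={
                    "image/encoded": _bytes_feature(encoded),
                    "image/class/label": _int64_feature(0),  # placeholder label
                }))
                writer.write(example.SerializeToString())


if __name__ == "__main__":
    write_tfrecords(JPEG_DIR, OUTPUT_TFRECORD)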
The performance studies shown in Figure 10 were conducted with one and two PowerEdge C4140 nodes with
V100 SXM2 GPUs interconnected with EDR InfiniBand, as listed in Table 5.
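For reference, a tf.data input pipeline that reads either on-disk format might look like the sketch below. The file patterns, batch size, and image size are hypothetical, it assumes TensorFlow 2.x, and it is not the exact benchmark code.

import tensorflow as tf

# Assumed (hypothetical) input locations on the storage system under test.
TFRECORD_PATTERN = "/mnt/storage/ilsvrc2012_augmented/*.tfrecord"
JPEG_PATTERN = "/mnt/storage/ilsvrc2012_augmented/jpeg/*.JPEG"
BATCH_SIZE = 256

FEATURE_SPEC = {
    "image/encoded": tf.io.FixedLenFeature([], tf.string),
    "image/class/label": tf.io.FixedLenFeature([], tf.int64),
}


def decode_record(serialized):
    """Parse one serialized Example and decode the embedded JPEG."""
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    image = tf.image.resize(tf.io.decode_jpeg(parsed["image/encoded"], channels=3), [224, 224])
    return image, parsed["image/class/label"]


def decode_jpeg_file(path):
    """Read and decode one raw JPEG file from disk."""
    image = tf.image.resize(tf.io.decode_jpeg(tf.io.read_file(path), channels=3), [224, 224])
    return image, tf.constant(0, tf.int64)  # placeholder label


def tfrecord_dataset():
    files = tf.data.Dataset.list_files(TFRECORD_PATTERN)
    return (tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
            .map(decode_record, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(BATCH_SIZE)
            .prefetch(tf.data.AUTOTUNE))


def jpeg_dataset():
    files = tf.data.Dataset.list_files(JPEG_PATTERN)
    return (files.map(decode_jpeg_file, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(BATCH_SIZE)
            .prefetch(tf.data.AUTOTUNE))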
The results show that for each of the three neural networks, there is no perceptible difference with respect to
the type of storage subsystem that was used. The conclusion is that the frameworks tested do not significantly