•	NFS on Near-Line SAS drives hosted by the head node and exported over NFS via IPoIB
•	NFS on SSD drives hosted by the head node and exported over NFS via IPoIB
•	NVMe
The Isilon F800 evaluated in this experiment has the same configuration as described in Section 2.5. Before
each test run, the cache was cleared on all four Isilon F800 nodes.
The NFS file share hosted on Near-Line SAS drives local to the head node (NL SAS NFS) consists of 12 TB
NL SAS drives in a RAID 50 volume, formatted as an XFS file system and exported via NFS to the compute
nodes over IPoIB.
The third storage system that was evaluated is an NFS file share hosted on SATA SSDs local to the head node.
This configuration was included in the experiment to evaluate the performance of an SSD-based storage
solution. Four SATA SSD disks local to the head node were configured in a RAID 0 volume since the goal was
to test the maximum performance. The SSDs were 1.92 TB read-intensive drives; since most disk operations in
Deep Learning training are read-intensive, drives rated at 1 Drive Write Per Day (DWPD) were selected. The
RAID 0 volume was formatted as an XFS file system and exported via NFS to the compute nodes over IPoIB.
The goal of this configuration was to understand the benefit of SSD drives for the read operations in Deep
Learning training and to determine whether this could be a viable option as a storage solution or as scratch
space for temporary files. For the purpose of this experiment, an 8 TB SSD solution was deemed sufficient. For
environments that choose such an SSD solution for a production NFS system, RAID 6 or RAID 50 is
recommended to protect data against disk failures. The quantity and capacity of the SSDs should also be
increased to accommodate user and project storage needs.
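Since the goal of this configuration is specifically read performance, a quick sanity check of sequential read throughput from the mounted share can be useful. The Python sketch below is a hypothetical illustration rather than part of the benchmark methodology; the mount point is an assumed path, and the operating system page cache would need to be dropped between runs so that the result reflects the storage rather than memory.

import os
import time

# Assumed mount point for the SSD-backed NFS share on a compute node (hypothetical path).
MOUNT_POINT = "/mnt/ssd_nfs"
BLOCK_SIZE = 1 << 20  # read in 1 MiB chunks


def sequential_read_throughput(path, block_size=BLOCK_SIZE):
    """Read one file sequentially and return the observed throughput in MB/s."""
    total_bytes = 0
    start = time.time()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.time() - start
    return total_bytes / elapsed / 1e6


if __name__ == "__main__":
    # Spot-check a handful of files on the share; drop the page cache between runs.
    for name in sorted(os.listdir(MOUNT_POINT))[:5]:
        path = os.path.join(MOUNT_POINT, name)
        if os.path.isfile(path):
            print(f"{name}: {sequential_read_throughput(path):.1f} MB/s")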
The final storage subsystem evaluated used NVMe drives. The compute nodes have an option for local NVMe
devices in addition to SAS or SATA, and these devices can be used as local scratch space. In a typical Deep
Learning training cycle, the same dataset is used multiple times. If the dataset is large and cannot fit in the
compute node system memory, a larger capacity local NVMe drive can provide much faster disk-to-memory
I/O than SSD drives. The PowerEdge C4140 can support up to two NVMe devices in PCIe card form factor in
the rear PCIe slots of the server chassis. One 1.6 TB NVMe device was tested in this study since the other
PCIe slot was populated by the Mellanox InfiniBand EDR adapter. The random read performance of the single
NVMe device is 1,080,000 IOPS and the sequential read performance is 6,400 MB/s. This option is referred to
as NVMe in the results below.
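As a hypothetical illustration of using a local NVMe device as scratch space, the sketch below stages a dataset from an NFS share onto a local NVMe mount once, so that subsequent epochs read from the faster local device. Both paths are assumed examples and not paths used in this study.

import shutil
from pathlib import Path

# Assumed (hypothetical) locations: the dataset on the NFS share and a local NVMe scratch mount.
NFS_DATASET = Path("/mnt/nfs/imagenet_tfrecords")
NVME_SCRATCH = Path("/mnt/nvme/scratch/imagenet_tfrecords")


def stage_to_nvme(src: Path, dst: Path) -> Path:
    """Copy the dataset to local NVMe scratch once, then reuse it for every epoch."""
    if dst.exists():
        return dst  # already staged by an earlier run
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dst)
    return dst


if __name__ == "__main__":
    data_dir = stage_to_nvme(NFS_DATASET, NVME_SCRATCH)
    print(f"Training input directory: {data_dir}")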
Since the entire ILSVRC2012 benchmark dataset is only ~140 GB and can easily fit into system memory, the
size of the dataset was increased 10x by applying 10 different data augmentation techniques to each JPEG
image in the dataset. The augmented images were then converted to a TensorFlow TFRecords database.
The TFRecords file format is a binary format that combines multiple raw image files and their metadata into a
single file. It maintains the image compression offered by the JPEG format, so the total size of the dataset
remained the same after conversion. Training performance was measured with the TensorFlow framework
using both the TFRecords database and the raw augmented JPEG images, and the results are presented in
Figure 10. The program did not run to completion with Resnet50 and JPEG images for all storage options, so
there are no corresponding values in Figure 10 (b). This issue will be studied as part of future work.
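A minimal sketch of the kind of augmentation and TFRecords conversion described above is shown below. It is illustrative only and not the exact tooling used in this study; the directory paths, the specific augmentation operations, the placeholder label, and the single output file are assumptions, and it assumes TensorFlow 2.x.

import glob
import os

import tensorflow as tf

# Assumed (hypothetical) locations for the raw ILSVRC2012 JPEGs and the output TFRecords file.
JPEG_DIR = "/data/ilsvrc2012/train"
OUTPUT_TFRECORD = "/data/ilsvrc2012_augmented/train.tfrecord"
COPIES_PER_IMAGE = 10  # one augmented variant per copy, growing the dataset 10x


def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def augment(image, variant):
    """Apply a simple augmentation per variant index (illustrative, not the study's 10 techniques)."""
    image = tf.image.random_flip_left_right(image, seed=variant)
    image = tf.image.random_brightness(image, max_delta=0.2, seed=variant)
    return image


def write_tfrecords(jpeg_dir, output_path):
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with tf.io.TFRecordWriter(output_path) as writer:
        for path in glob.glob(os.path.join(jpeg_dir, "*.JPEG")):
            image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
            for variant in range(COPIES_PER_IMAGE):
                augmented = augment(image, variant)
                # Re-encode as JPEG so the TFRecord keeps JPEG compression on disk.
                encoded = tf.io.encode_jpeg(augmented).numpy()
                example = tf.train.Example(features=tf.train.Features(feature={
                    "image/encoded": _bytes_feature(encoded),
                    "image/class/label": _int64_feature(0),  # placeholder label
                }))
                writer.write(example.SerializeToString())


if __name__ == "__main__":
    write_tfrecords(JPEG_DIR, OUTPUT_TFRECORD)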
The performance studies shown in Figure 10 were conducted with one and two PowerEdge C4140 nodes with
V100 SXM2 GPUs interconnected with EDR InfiniBand, as listed in Table 5.
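For reference, a tf.data input pipeline that reads either on-disk format might look like the sketch below. The file patterns, batch size, and image size are hypothetical, it assumes TensorFlow 2.x, and it is not the exact benchmark code.

import tensorflow as tf

# Assumed (hypothetical) input locations on the storage system under test.
TFRECORD_PATTERN = "/mnt/storage/ilsvrc2012_augmented/*.tfrecord"
JPEG_PATTERN = "/mnt/storage/ilsvrc2012_augmented/jpeg/*.JPEG"
BATCH_SIZE = 256

FEATURE_SPEC = {
    "image/encoded": tf.io.FixedLenFeature([], tf.string),
    "image/class/label": tf.io.FixedLenFeature([], tf.int64),
}


def decode_record(serialized):
    """Parse one serialized Example and decode the embedded JPEG."""
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    image = tf.image.resize(tf.io.decode_jpeg(parsed["image/encoded"], channels=3), [224, 224])
    return image, parsed["image/class/label"]


def decode_jpeg_file(path):
    """Read and decode one raw JPEG file from disk."""
    image = tf.image.resize(tf.io.decode_jpeg(tf.io.read_file(path), channels=3), [224, 224])
    return image, tf.constant(0, tf.int64)  # placeholder label


def tfrecord_dataset():
    files = tf.data.Dataset.list_files(TFRECORD_PATTERN)
    return (tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
            .map(decode_record, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(BATCH_SIZE)
            .prefetch(tf.data.AUTOTUNE))


def jpeg_dataset():
    files = tf.data.Dataset.list_files(JPEG_PATTERN)
    return (files.map(decode_jpeg_file, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(BATCH_SIZE)
            .prefetch(tf.data.AUTOTUNE))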
The results show that for each of the three neural networks, there is no perceptible difference with respect to
the type of storage subsystem that was used. The conclusion is that the frameworks tested do not significantly