Reference Guide

Dell EMC Ready Solutions for AI Deep Learning with NVIDIA | v1.0
Figure 12: The profiled disk throughput with InsightIQ when running Resnet50 with Isilon
To better understand the underlying storage system, Isilon storage I/O performance was profiled with Isilon
InsightIQ while training the models. Isilon InsightIQ was described in Section 2.5. Only the Isilon storage with
the TFRecords dataset was profiled, since all storage systems displayed similar performance and the lessons from
one profiling exercise should be broadly applicable to this use case. Figure 12 shows an example InsightIQ
snapshot of the disk throughput while running Resnet50. The disk throughput in the figure decreases over time
because more and more of the data becomes cached in Isilon memory. The full disk profiling data for AlexNet,
Resnet50, and VGG16 are shown in Table 7; the training performance numbers are also included in the table for
validation. Taking Resnet50 as an example, the training performance is 2,940 images/sec. The average size of
each image is 0.113 MB (the total training image size of 1,448,283,629,516 bytes divided by 12,811,670 image
files), so the expected disk throughput is 2,940 images/sec * 0.113 MB * 8 b/B = 2,658 Mb/s, which matches the
measured disk throughput of 2,680 Mb/s. The same conclusion also holds for the AlexNet and VGG16 neural networks.
Table 7: The disk metrics on Isilon F800 with a single compute node

                                     AlexNet   Resnet50   VGG16
Average Disk Operation Size (MB)        8.61       9.17    8.81
Disk Read IOPS (K/s)                     344        212     111
Disk Throughput (Mb/s)                  6370       2680    1440
Training Performance (images/sec)       7095       2940    1590
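The throughput sanity check described above can be repeated for every network in Table 7. The sketch below is an illustration, not part of the original measurement procedure; the per-image size is derived from the dataset statistics quoted earlier (1,448,283,629,516 bytes over 12,811,670 files), and the training rates and measured throughputs are the Table 7 values:

```python
# Verify that the measured disk throughput matches the rate implied by
# training speed x average image size (all figures from Table 7).

AVG_IMAGE_MB = 1_448_283_629_516 / 12_811_670 / 1e6  # ~0.113 MB per image

measured_mbps = {"AlexNet": 6370, "Resnet50": 2680, "VGG16": 1440}
images_per_sec = {"AlexNet": 7095, "Resnet50": 2940, "VGG16": 1590}

for net, rate in images_per_sec.items():
    expected_mbps = rate * AVG_IMAGE_MB * 8  # MB/s -> Mb/s (8 bits per byte)
    error = abs(expected_mbps - measured_mbps[net]) / measured_mbps[net]
    print(f"{net}: expected {expected_mbps:.0f} Mb/s, "
          f"measured {measured_mbps[net]} Mb/s ({error:.1%} off)")
```

For all three networks the expected throughput lands within about 1% of the InsightIQ measurement, confirming that disk reads are driven directly by the training rate.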
Figure 13 illustrates the CPU utilization, memory usage, and GPU utilization on one compute node, as well as the
network and disk throughput on the Isilon F800, when running Resnet50 FP16 training with the 10x ILSVRC2012
TFRecords dataset. The training speed holds steady at 2,940 images/sec throughout the whole training run.
The CPU utilization is around 33% across the 40 cores, which are used for I/O, TFRecords parsing, and JPEG
decoding. This verifies that the CPUs are not the bottleneck in Deep Learning training of Resnet50. For memory,
only around 19% of the 384 GB is used directly; the buffer usage keeps growing as data is cached in memory until
all of the memory is consumed. Even so, the whole dataset cannot fit into memory. This is because the 10x
ILSVRC2012 dataset was used intentionally: its total size of around 1.35 TB cannot be fully