Reference Guide

Dell EMC Ready Solutions for AI Deep Learning with NVIDIA | v1.0
Figure 12: The profiled disk throughput with InsightIQ when running Resnet50 with Isilon
To better understand the underlying storage system, Isilon storage I/O performance was profiled with Isilon
InsightIQ while training the models. Isilon InsightIQ was described in Section 2.5. Only the Isilon storage with
the TFRecords dataset was profiled, since all storage systems displayed similar performance and the lessons from
one profiling exercise should be broadly applicable to this use case. Figure 12 shows an example InsightIQ
snapshot of the disk throughput while running Resnet50. The disk throughput in the figure decreases over time
because more and more of the data becomes cached in Isilon memory. The full disk profiling data for AlexNet,
Resnet50, and VGG16 are shown in Table 7; the training performance numbers are also included in the table for
validation. Taking Resnet50 as an example, the training performance is 2,940 images/sec. The average size of
each image is 0.113 MB (the total training image size of 1,448,283,629,516 bytes divided by 12,811,670 image
files), so the expected disk throughput is 2,940 images/sec * 0.113 MB * 8 b/B = 2,658 Mb/s, which matches the
measured disk throughput of 2,680 Mb/s. The same conclusion also holds for the AlexNet and VGG16 neural networks.
Table 7: The disk metrics on Isilon F800 with a single compute node

                                     AlexNet   Resnet50   VGG16
Average Disk Operation Size (MB)        8.61       9.17    8.81
Disk Read IOPS (K/s)                     344        212     111
Disk Throughput (Mb/s)                  6370       2680    1440
Training Performance (images/sec)       7095       2940    1590
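The throughput sanity check described above can be repeated for every network in Table 7. The sketch below is an illustration, not part of the original measurement procedure; the per-image size is derived from the dataset statistics quoted earlier (1,448,283,629,516 bytes over 12,811,670 files), and the training rates and measured throughputs are the Table 7 values:

```python
# Verify that the measured disk throughput matches the rate implied by
# training speed x average image size (all figures from Table 7).

AVG_IMAGE_MB = 1_448_283_629_516 / 12_811_670 / 1e6  # ~0.113 MB per image

measured_mbps = {"AlexNet": 6370, "Resnet50": 2680, "VGG16": 1440}
images_per_sec = {"AlexNet": 7095, "Resnet50": 2940, "VGG16": 1590}

for net, rate in images_per_sec.items():
    expected_mbps = rate * AVG_IMAGE_MB * 8  # MB/s -> Mb/s (8 bits per byte)
    error = abs(expected_mbps - measured_mbps[net]) / measured_mbps[net]
    print(f"{net}: expected {expected_mbps:.0f} Mb/s, "
          f"measured {measured_mbps[net]} Mb/s ({error:.1%} off)")
```

For all three networks the expected throughput lands within about 1% of the InsightIQ measurement, confirming that disk reads are driven directly by the training rate.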
Figure 13 illustrates the CPU utilization, memory usage, and GPU utilization on one compute node, as well as the
network and disk throughput on the Isilon F800, when running Resnet50 FP16 training with the 10x ILSVRC2012
TFRecords dataset. The training speed holds steady at 2,940 images/sec throughout the whole training run.
The CPU utilization is around 33% across the 40 cores, which are used for I/O, TFRecords parsing, and JPEG
decoding. This verifies that the CPUs are not the bottleneck in Deep Learning training of Resnet50. For memory,
only around 19% of the 384 GB is used directly; the buffer usage keeps growing as data is cached in memory until
all of the memory is consumed. Even so, the whole dataset cannot fit into memory. This is because the 10x
ILSVRC2012 dataset was used intentionally: its total size of around 1.35 TB cannot be fully