With Isilon F800, the network throughput needed to maintain high GPU utilization is around 300 MB/s. This is the data transfer throughput of Resnet50 training, and it remains almost constant throughout the training run. The network bandwidth between the IB EDR switch and the FDR-40 GigE Gateway is 56 Gb/s (7 GB/s), as shown in Figure 2. Two more InfiniBand connections can be added (three connections in total) to match the aggregate bandwidth of four Isilon nodes, which is 4 x 40 Gb/s. This supports up to 70 compute nodes ((7 GB/s * 3) / (300 MB/s) ≈ 70), far more than the 36 ports of the IB EDR switch, which leaves ample room for scaling the cluster with more compute nodes.
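As a quick sanity check, the headroom calculation above can be reproduced with a few lines of Python. The 300 MB/s per-node figure and the 7 GB/s (56 Gb/s) link speed are the numbers quoted in the text; the variable names and structure are illustrative only.

```python
# Back-of-the-envelope check of the bandwidth headroom described above.
GATEWAY_LINK_GBPS = 56                      # IB EDR switch <-> gateway link, Gb/s
LINK_GB_PER_S = GATEWAY_LINK_GBPS / 8       # ~7 GB/s per connection
NUM_GATEWAY_LINKS = 3                       # one existing + two additional connections
PER_NODE_DEMAND_MB_S = 300                  # Resnet50 read throughput per compute node

aggregate_mb_s = LINK_GB_PER_S * NUM_GATEWAY_LINKS * 1000
supported_nodes = aggregate_mb_s / PER_NODE_DEMAND_MB_S
print(f"Supported compute nodes: {supported_nodes:.0f}")   # ~70
```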
When no data is cached on Isilon F800, the disk throughput matches the network throughput at the beginning of training. As more data is cached in Isilon memory, the disk throughput decreases. Because the dataset is too large to fit into system memory on either the compute nodes or the Isilon nodes, the system must still fetch data from the SSDs on Isilon. That is why a consistent disk read throughput is observed throughout Resnet50 training.
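A minimal sketch of this behavior is shown below, assuming a simple first-order model in which reads that miss the Isilon cache go to SSD. All dataset and cache sizes in the sketch are placeholder assumptions, not measurements from this guide.

```python
# First-order model: the fraction of the dataset that fits in cache can be
# served from memory, so steady-state disk throughput stays close to the
# network throughput whenever the dataset is much larger than the cache.
network_mb_s = 300     # per-node read throughput reported above
dataset_gb   = 1500    # assumed training-set size (placeholder)
cache_gb     = 400     # assumed usable Isilon memory for caching (placeholder)

cache_fraction = min(cache_gb / dataset_gb, 1.0)
disk_mb_s = network_mb_s * (1.0 - cache_fraction)
print(f"Cache hit fraction: {cache_fraction:.0%}, "
      f"expected steady-state disk read: {disk_mb_s:.0f} MB/s")
```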
Figure 14: Scaling results with Isilon F800 for (a) Resnet50 and (b) AlexNet. In FP32 mode, the batch size is 128 for Resnet50 and 512 for AlexNet; in FP16 mode, the batch size is 256 for Resnet50 and 1024 for AlexNet.
Figure 14 summarizes the scaling results on up to four compute nodes (16 GPUs) for Resnet50 and AlexNet with Isilon F800. To stress the I/O, the batch size for AlexNet is increased to 512 in FP32 mode and 1024 in FP16 mode.
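To make the I/O argument concrete, the sketch below estimates the storage read bandwidth a node needs from its training throughput; the images-per-second and average image-size values are assumed placeholders, not results taken from Figure 14.

```python
# Rough illustration of why AlexNet at large batch sizes stresses storage I/O
# more than Resnet50: the read bandwidth a node needs scales with the number
# of images it consumes per second.
def required_read_mb_s(images_per_sec: float, avg_image_kb: float) -> float:
    """Storage read bandwidth (MB/s) needed to keep the GPUs fed."""
    return images_per_sec * avg_image_kb / 1024

# Placeholder values: a faster model (or a larger batch size that raises
# images/sec) proportionally raises the demand on Isilon.
print(required_read_mb_s(images_per_sec=2500, avg_image_kb=110))   # ~270 MB/s
print(required_read_mb_s(images_per_sec=9000, avg_image_kb=110))   # ~970 MB/s
```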