With Isilon F800, the network throughput needed to maintain high GPU utilization is around 300 MB/s. This is the data transfer throughput of Resnet50 training, and it remains almost constant throughout the training run. The network bandwidth between the IB EDR switch and the FDR-40 GigE Gateway is 56 Gb/s (7 GB/s), as shown in Figure 2. Two more InfiniBand connections can be added (three connections in total) to match the aggregate bandwidth of four Isilon nodes, which is 4 x 40 Gb/s. This supports up to 70 compute nodes ((7 GB/s * 3) / (300 MB/s) ≈ 70), far more than the 36 ports of the IB EDR switch, which leaves ample room for scaling the cluster with more compute nodes.
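As a quick sanity check, the headroom calculation above can be reproduced with a few lines of Python. The 300 MB/s per-node figure and the 7 GB/s (56 Gb/s) link speed are the numbers quoted in the text; the variable names and structure are illustrative only.

```python
# Back-of-the-envelope check of the bandwidth headroom described above.
GATEWAY_LINK_GBPS = 56                      # IB EDR switch <-> gateway link, Gb/s
LINK_GB_PER_S = GATEWAY_LINK_GBPS / 8       # ~7 GB/s per connection
NUM_GATEWAY_LINKS = 3                       # one existing + two additional connections
PER_NODE_DEMAND_MB_S = 300                  # Resnet50 read throughput per compute node

aggregate_mb_s = LINK_GB_PER_S * NUM_GATEWAY_LINKS * 1000
supported_nodes = aggregate_mb_s / PER_NODE_DEMAND_MB_S
print(f"Supported compute nodes: {supported_nodes:.0f}")   # ~70
```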
When no data is cached on Isilon F800, the disk throughput matches the network throughput at the beginning of training. As more data is cached in Isilon memory, the disk throughput decreases. Because the dataset is too large to fit into system memory on either the compute nodes or the Isilon nodes, the system must still fetch data from the SSDs on Isilon. That is why a consistent disk read throughput is observed throughout Resnet50 training.
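A minimal sketch of this behavior is shown below, assuming a simple first-order model in which reads that miss the Isilon cache go to SSD. All dataset and cache sizes in the sketch are placeholder assumptions, not measurements from this guide.

```python
# First-order model: the fraction of the dataset that fits in cache can be
# served from memory, so steady-state disk throughput stays close to the
# network throughput whenever the dataset is much larger than the cache.
network_mb_s = 300     # per-node read throughput reported above
dataset_gb   = 1500    # assumed training-set size (placeholder)
cache_gb     = 400     # assumed usable Isilon memory for caching (placeholder)

cache_fraction = min(cache_gb / dataset_gb, 1.0)
disk_mb_s = network_mb_s * (1.0 - cache_fraction)
print(f"Cache hit fraction: {cache_fraction:.0%}, "
      f"expected steady-state disk read: {disk_mb_s:.0f} MB/s")
```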
Figure 14: Scaling results with Isilon F800 for (a) Resnet50 and (b) AlexNet. In FP32 mode, the batch size is 128 for Resnet50 and 512 for AlexNet; in FP16 mode, the batch size is 256 for Resnet50 and 1024 for AlexNet.
Figure 14 summarizes the scaling results on up to four compute nodes (16 GPUs) for Resnet50 and AlexNet with Isilon F800. To stress the I/O, the batch size for AlexNet is increased to 512 in FP32 mode and 1024 in FP16 mode.
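To make the I/O argument concrete, the sketch below estimates the storage read bandwidth a node needs from its training throughput; the images-per-second and average image-size values are assumed placeholders, not results taken from Figure 14.

```python
# Rough illustration of why AlexNet at large batch sizes stresses storage I/O
# more than Resnet50: the read bandwidth a node needs scales with the number
# of images it consumes per second.
def required_read_mb_s(images_per_sec: float, avg_image_kb: float) -> float:
    """Storage read bandwidth (MB/s) needed to keep the GPUs fed."""
    return images_per_sec * avg_image_kb / 1024

# Placeholder values: a faster model (or a larger batch size that raises
# images/sec) proportionally raises the demand on Isilon.
print(required_read_mb_s(images_per_sec=2500, avg_image_kb=110))   # ~270 MB/s
print(required_read_mb_s(images_per_sec=9000, avg_image_kb=110))   # ~970 MB/s
```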