Table 5: The hardware and software in the testbed

Hardware - Head node
Cluster head node: PowerEdge R740xd
CPU: 2 x Intel Xeon 6148 @ 2.4GHz
Memory: 384GB DDR4 @ 2667MT/s
Disks on head node: 12 x 12TB Near-line SAS drives in a RAID 50 volume; 120TB volume formatted as XFS, exported via NFS

Hardware - Compute node
Cluster compute node: PowerEdge C4140
Number of compute nodes: 8 nodes with V100-PCIe and 2 nodes with V100-SXM2
CPU: 2 x Intel Xeon 6148 @ 2.4GHz
Memory: 384GB DDR4 @ 2667MT/s
Disks: 2 x M.2 240GB in RAID 1
GPU: V100-SXM2, V100-PCIe

Software and Firmware
Operating System: Red Hat Enterprise Linux 7.4
Linux Kernel: 3.10.0-693.el7.x86_64
BIOS: 1.1.6
CUDA compiler and GPU driver: CUDA 9.1.85 (driver 390.46)
Python: 2.7.5

Deep Learning Datasets
Dataset for training: ILSVRC2012 training dataset, 1,281,167 images
Dataset for inference: ILSVRC2012 validation dataset, 50,000 images

Deep Learning Libraries and Frameworks
cuDNN: 7.0
NCCL: 2.1.15
Horovod: 0.12.1
TensorFlow: 1.8
MXNet: 0.11.1
Caffe2: 0.8.1+
TensorRT: 4.0.0.3
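
Before running benchmarks, the software stack on each compute node can be checked against Table 5. The short Python sketch below is only an illustration (it is not part of the reference configuration); it queries TensorFlow for its version and CUDA build and lists the GPUs it can see, with the values expected for this testbed noted in the comments.

# Illustrative check of the deep learning stack on a compute node.
# Assumes TensorFlow 1.8 is installed as listed in Table 5.
from __future__ import print_function  # keeps the script valid on Python 2.7
import tensorflow as tf
from tensorflow.python.client import device_lib

print("TensorFlow version:", tf.VERSION)                     # expected: 1.8.x
print("Built with CUDA support:", tf.test.is_built_with_cuda())
print("GPU available:", tf.test.is_gpu_available())          # True on a V100 node

# List the GPU devices TensorFlow can see (should show the V100 cards).
gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]
print("Visible GPUs:", gpus)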
3.1.1 FP16 vs FP32
The V100 GPU contains a new type of processing core called Tensor Cores, which support mixed precision
training. Although many High Performance Computing (HPC) applications require high precision computation
with FP32 (32-bit floating point) or FP64 (64-bit floating point), Deep Learning researchers have found that they
can achieve the same inference accuracy with FP16 (16-bit floating point) as with FP32. In this