Reference Guide

29 Dell EMC Ready Solutions for AI Deep Learning with NVIDIA | v1.0
Figure 15: Inference performance with INT8 vs FP32 for the ResNet-50 model
Figure 15 also illustrates the effect of batch size on performance. Without batch processing (batch size 1), inference throughput is much lower because each iteration does not give the GPU enough work to keep it busy. Throughput rises with batch size, although this advantage begins to flatten as the batch size grows, and the largest usable batch size is limited by GPU memory.
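The relationship between batch size and throughput can be sketched with a simple timing loop. This is a minimal illustration, not the benchmark harness used for Figure 15: a dense matrix multiply stands in for one forward pass, where a real run would invoke a TensorRT engine instead.

```python
import time
import numpy as np

def measure_throughput(batch_size, n_batches=50, dim=1024):
    """Rough throughput estimate in samples/second. A dense matmul
    stands in for one model forward pass (hypothetical workload)."""
    weights = np.random.rand(dim, dim).astype(np.float32)
    batch = np.random.rand(batch_size, dim).astype(np.float32)
    start = time.perf_counter()
    for _ in range(n_batches):
        _ = batch @ weights  # stand-in for one inference iteration
    elapsed = time.perf_counter() - start
    # total samples processed divided by wall-clock time
    return batch_size * n_batches / elapsed

for bs in (1, 8, 64):
    print(f"batch {bs}: {measure_throughput(bs):.0f} samples/s")
```

Even in this toy setting, per-sample overhead is amortized over the batch, so larger batches tend to report higher samples/second until the hardware saturates.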
The accuracy of INT8 was compared with that of FP32 to confirm that the inference performance gains with INT8 do not come at the cost of inaccurate results. To ensure that INT8 data encodes the same information as FP32 data, TensorRT applies a calibration method that converts FP32 to INT8 while minimizing the loss of information. More details of this calibration method can be found in the presentation 8-bit Inference with TensorRT from the GTC 2017 conference. The ILSVRC2012 validation dataset was used for both calibration and benchmarking. This dataset contains 50,000 images and was divided into batches of 25 images each. The first 50 batches were used for calibration and the remaining images were used for accuracy measurement. Several pre-trained neural network models were used in our experiments, including ResNet-50, ResNet-101, ResNet-152, VGG-16, VGG-19, GoogLeNet and AlexNet. Top-1 and top-5 image classification accuracy were recorded for both FP32 and INT8, and the difference between the two was calculated. Top-1 accuracy is the percentage of validation images for which the model's first prediction (the highest-confidence result) matches the ground truth. Top-5 accuracy is the percentage of validation images for which the ground truth falls within the model's five highest-confidence predictions. The results are shown in Table 8. The accuracy difference between FP32 and INT8 is not significant, ranging from 0.02% to 0.18% across all test cases. This means very little accuracy is lost with INT8, while achieving a 3x speedup over FP32.
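The top-1 and top-5 metrics described above can be computed directly from a model's output scores. The following is a minimal sketch using NumPy; the function name and toy inputs are illustrative, not part of the benchmark code used in this guide.

```python
import numpy as np

def topk_accuracy(logits, labels, k):
    """Fraction of samples whose ground-truth label appears among
    the k highest-confidence predictions.

    logits: (n_samples, n_classes) array of model scores
    labels: (n_samples,) array of ground-truth class indices
    """
    # indices of the k highest scores per row (order within top-k is irrelevant)
    topk = np.argsort(logits, axis=1)[:, -k:]
    # a sample counts as correct if its label appears anywhere in its top-k
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()

# Toy example with 3 classes: sample 0's best guess is correct,
# sample 1's label (class 2) is only its third-ranked prediction.
logits = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2]])
labels = np.array([1, 2])
print(topk_accuracy(logits, labels, k=1))  # 0.5
print(topk_accuracy(logits, labels, k=3))  # 1.0
```

With k=1 this reproduces the top-1 definition, and with k=5 the top-5 definition used in Table 8.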
Table 8: The accuracy comparison between FP32 and INT8
Neural Network Model   FP32 Top-1   FP32 Top-5   INT8 Top-1   INT8 Top-5   Diff Top-1   Diff Top-5
ResNet-50              72.90%       91.14%       72.84%       91.08%       0.07%        0.06%
ResNet-101             74.33%       91.95%       74.31%       91.88%       0.02%        0.07%