Reference Guide

29 Dell EMC Ready Solutions for AI Deep Learning with NVIDIA | v1.0
Figure 15: Inference performance with INT8 vs FP32 for the ResNet-50 model
Figure 15 also illustrates the effect of batch size on performance. Without batch processing (batch size 1), inference throughput is much lower because each iteration does not give the GPU enough work to keep it busy. Throughput rises with batch size, although this advantage begins to flatten as the batch size grows, and the largest usable batch size is limited by GPU memory.
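The relationship between batch size and throughput can be sketched with a simple timing loop. This is a minimal illustration, not the benchmark harness used for Figure 15: a dense matrix multiply stands in for one forward pass, where a real run would invoke a TensorRT engine instead.

```python
import time
import numpy as np

def measure_throughput(batch_size, n_batches=50, dim=1024):
    """Rough throughput estimate in samples/second. A dense matmul
    stands in for one model forward pass (hypothetical workload)."""
    weights = np.random.rand(dim, dim).astype(np.float32)
    batch = np.random.rand(batch_size, dim).astype(np.float32)
    start = time.perf_counter()
    for _ in range(n_batches):
        _ = batch @ weights  # stand-in for one inference iteration
    elapsed = time.perf_counter() - start
    # total samples processed divided by wall-clock time
    return batch_size * n_batches / elapsed

for bs in (1, 8, 64):
    print(f"batch {bs}: {measure_throughput(bs):.0f} samples/s")
```

Even in this toy setting, per-sample overhead is amortized over the batch, so larger batches tend to report higher samples/second until the hardware saturates.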
The accuracy of INT8 was compared with that of FP32 to confirm that the inference performance gains with INT8 do not come at the cost of inaccurate results. To ensure that INT8 data encodes the same information as FP32 data, TensorRT applies a calibration method that converts FP32 to INT8 while minimizing the loss of information. More details of this calibration method can be found in the presentation 8-bit Inference with TensorRT from the GTC 2017 conference. The ILSVRC2012 validation dataset was used for both calibration and benchmarking. This dataset contains 50,000 images and was divided into batches of 25 images each. The first 50 batches were used for calibration and the remaining images were used for accuracy measurement. Several pre-trained neural network models were used in our experiments, including ResNet-50, ResNet-101, ResNet-152, VGG-16, VGG-19, GoogLeNet and AlexNet. Top-1 and top-5 image classification accuracy were recorded for both FP32 and INT8, and the difference between the two was calculated. Top-1 accuracy is the percentage of validation images for which the model's first prediction (the highest-confidence result) matches the ground truth. Top-5 accuracy is the percentage of validation images for which the ground truth falls within the model's five highest-confidence predictions. The results are shown in Table 8. The accuracy difference between FP32 and INT8 is not significant, ranging from 0.02% to 0.18% across all test cases. This means very little accuracy is lost with INT8, while achieving a 3x speedup over FP32.
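The top-1 and top-5 metrics described above can be computed directly from a model's output scores. The following is a minimal sketch using NumPy; the function name and toy inputs are illustrative, not part of the benchmark code used in this guide.

```python
import numpy as np

def topk_accuracy(logits, labels, k):
    """Fraction of samples whose ground-truth label appears among
    the k highest-confidence predictions.

    logits: (n_samples, n_classes) array of model scores
    labels: (n_samples,) array of ground-truth class indices
    """
    # indices of the k highest scores per row (order within top-k is irrelevant)
    topk = np.argsort(logits, axis=1)[:, -k:]
    # a sample counts as correct if its label appears anywhere in its top-k
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()

# Toy example with 3 classes: sample 0's best guess is correct,
# sample 1's label (class 2) is only its third-ranked prediction.
logits = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2]])
labels = np.array([1, 2])
print(topk_accuracy(logits, labels, k=1))  # 0.5
print(topk_accuracy(logits, labels, k=3))  # 1.0
```

With k=1 this reproduces the top-1 definition, and with k=5 the top-5 definition used in Table 8.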
Table 8: The accuracy comparison between FP32 and INT8
Neural Network Model   FP32 Top-1   FP32 Top-5   INT8 Top-1   INT8 Top-5   Diff Top-1   Diff Top-5
ResNet-50              72.90%       91.14%       72.84%       91.08%       0.07%        0.06%
ResNet-101             74.33%       91.95%       74.31%       91.88%       0.02%        0.07%