White Papers

CheXNet Inference with Nvidia T4 on Dell EMC PowerEdge R7425
Figure 14. Latency of CheXNet TF-TRT INT8 versus ResnetV2_50 TF-TRT INT8 Inference
5.5 CheXNet Inference - Native TensorFlow FP32 with GPU versus TF-TRT 5.0 INT8
After confirming that our custom model performed well against the TF-TRT-optimized inference of an official model, in this section we compare the CheXNet inference model itself across different configurations. We gathered the previously obtained results from running inference in three modes:
a) Native TensorFlow FP32 - CPU only (CPU)
b) Native TensorFlow FP32 - GPU (GPU)
c) TF-TRT integration in INT8 (GPU)
Figure 15 shows the CheXNet inference throughput (img/sec) measured across these configuration modes and several batch sizes. As the figure shows, the TF-TRT INT8 precision mode consistently outperformed the other two configurations at every batch size tested. The next sections analyze this performance improvement in detail.