White Papers

CheXNet Inference with Nvidia T4 on Dell EMC PowerEdge R7425
Figure 14. Latency of CheXNet TF-TRT INT8 versus ResnetV2_50 TF-TRT INT8 Inference
5.5 CheXNet Inference - Native TensorFlow FP32 with GPU versus TF-TRT 5.0 INT8
After confirming that our custom model performed well against the TF-TRT-optimized inference of an official model, in this section we compare the CheXNet inference model itself across different configurations. We gathered the previously obtained results from running inference in three modes:
a) Native TensorFlow FP32 - CPU only (CPU)
b) Native TensorFlow FP32 - GPU (GPU)
c) TF-TRT integration in INT8 (GPU)
Figure 15 shows the CheXNet inference throughput (img/sec) measured across these configuration modes and several batch sizes. As the figure shows, the TF-TRT INT8 precision mode consistently outperformed the other two configurations at every batch size tested. The next sections analyze this performance improvement in detail.