6 Conclusion and Future Work
Dell EMC offers an excellent solution with its PowerEdge R7425 server based on the Nvidia T4
GPU to accelerate artificial intelligence workloads, including high-performance deep learning
inference boosted with the Nvidia TensorRT™ library.
Native TensorFlow FP32 inference (without TensorRT™) on the PowerEdge R7425-T4-16GB
server ran ~16X faster than on the CPU only (AMD EPYC 7551 32-Core Processor). This is a
referenceable measurement that shows the benefit of GPU-based systems over CPU-only
systems.
When accelerating the custom CheXNet model with the TensorFlow-TensorRT integration, the
PowerEdge R7425-T4-16GB server performed on average up to 58X faster than native
TensorFlow on CPU only, and on average up to 4X faster than native TensorFlow on GPU.
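For reference, the TF-TRT conversion path follows the TensorFlow 1.x contrib API that shipped alongside TensorRT™ 5. The following is a minimal sketch, not the exact script used in this project; the frozen-graph path, output node name, and batch size are illustrative assumptions:

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF-TRT integration (TF 1.x)

# Load the frozen CheXNet graph (file name is illustrative).
with tf.gfile.GFile('chexnet_frozen.pb', 'rb') as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Replace TensorRT-compatible subgraphs with optimized TRT engine nodes.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=['sigmoid_out'],           # illustrative output node name
    max_batch_size=8,
    max_workspace_size_bytes=1 << 30,  # 1 GB workspace for the TRT builder
    precision_mode='FP16')             # or 'FP32' / 'INT8'
```

The resulting graph can then be imported and run with a standard TensorFlow session, with the converted subgraphs executing inside TensorRT engines.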
CheXNet inference using the TF-TRT INT8 precision mode achieved a speedup of ~802% versus
native TensorFlow FP32 on GPU, at a ~7 ms latency target.
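The INT8 path adds a calibration step: the conversion first emits a calibration graph, representative images are run through it so TensorRT can record activation ranges, and the graph is then finalized. A hedged sketch continuing the previous one; the feed names and the calibration data source are illustrative assumptions:

```python
# INT8 first produces a calibration graph rather than a final engine.
calib_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=['sigmoid_out'],           # illustrative output node name
    max_batch_size=8,
    max_workspace_size_bytes=1 << 30,
    precision_mode='INT8')

# Run representative chest X-ray batches through the calibration graph
# so TensorRT can collect per-tensor dynamic ranges.
with tf.Graph().as_default():
    out = tf.import_graph_def(calib_graph, return_elements=['sigmoid_out:0'])
    with tf.Session() as sess:
        for batch in calibration_batches:            # illustrative data source
            sess.run(out, feed_dict={'import/input:0': batch})

# Convert the calibrated graph into the final INT8 inference graph.
int8_graph = trt.calib_graph_to_infer_graph(calib_graph)
```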
CheXNet inference optimized with the native TensorRT™ 5 (TRT5) C++ API performed ~2X
faster than with the TF-TRT integration API. This advantage appeared only at batch sizes 1 and
2; the margin of the TRT5 C++ API over the TF-TRT API gradually decreased as the batch size
grew. We are still working with the Nvidia developer group to determine the expected relative
performance of the two API implementations.
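The standalone runs in this project used the TensorRT™ 5 C++ API; the TensorRT 5 Python bindings mirror the same builder flow and are shown here for brevity. A minimal sketch, assuming the model has been exported to ONNX (the file name and batch size are illustrative):

```python
import tensorrt as trt  # standalone TensorRT 5 Python bindings

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, max_batch_size=1):
    """Parse an ONNX model and build an optimized TensorRT engine."""
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_batch_size = max_batch_size
        builder.max_workspace_size = 1 << 30  # 1 GB builder workspace
        builder.fp16_mode = True              # enable FP16 kernels on T4
        with open(onnx_path, 'rb') as f:
            parser.parse(f.read())
        return builder.build_cuda_engine(network)

engine = build_engine('chexnet.onnx', max_batch_size=2)  # illustrative path
```

Because this path bypasses the TensorFlow runtime entirely, it avoids per-batch framework overhead, which is one plausible reason its advantage is largest at small batch sizes.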
Models optimized with Nvidia TensorRT™ 5 can be deployed in several environments
depending on the target application, such as scale-out data centers, embedded systems, or
automotive product platforms. Other implementation factors can affect end-to-end inference
speed when deploying the optimized models into these production environments; model
optimization is just one of those factors, and in this project we have demonstrated some
methods for approaching it.