6 Conclusion and Future Work
Dell EMC offers an excellent solution with its PowerEdge R7425 server based on the Nvidia T4
GPU to accelerate artificial intelligence workloads, including high-performance deep learning
inference boosted with the Nvidia TensorRT™ library.
Native TensorFlow FP32 inference (without TensorRT™) on the PowerEdge R7425-T4-16GB
server ran ~16X faster than on the CPU only (AMD EPYC 7551 32-Core Processor). This is a
referenceable measurement that shows the benefit of GPU-based systems over CPU-only
systems.
When accelerating the custom CheXNet model with the TensorFlow-TensorRT integration, the
PowerEdge R7425-T4-16GB server performed on average up to 58X faster than native
TensorFlow on CPU only, and on average up to 4X faster than native TensorFlow on GPU.
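For reference, the TF-TRT conversion path follows the TensorFlow 1.x contrib API that shipped alongside TensorRT™ 5. The following is a minimal sketch, not the exact script used in this project; the frozen-graph path, output node name, and batch size are illustrative assumptions:

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF-TRT integration (TF 1.x)

# Load the frozen CheXNet graph (file name is illustrative).
with tf.gfile.GFile('chexnet_frozen.pb', 'rb') as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Replace TensorRT-compatible subgraphs with optimized TRT engine nodes.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=['sigmoid_out'],           # illustrative output node name
    max_batch_size=8,
    max_workspace_size_bytes=1 << 30,  # 1 GB workspace for the TRT builder
    precision_mode='FP16')             # or 'FP32' / 'INT8'
```

The resulting graph can then be imported and run with a standard TensorFlow session, with the converted subgraphs executing inside TensorRT engines.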
CheXNet inference using the TF-TRT INT8 precision mode achieved a speedup of ~802% versus
native TensorFlow FP32 on GPU, at a ~7 ms latency target.
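The INT8 path adds a calibration step: the conversion first emits a calibration graph, representative images are run through it so TensorRT can record activation ranges, and the graph is then finalized. A hedged sketch continuing the previous one; the feed names and the calibration data source are illustrative assumptions:

```python
# INT8 first produces a calibration graph rather than a final engine.
calib_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=['sigmoid_out'],           # illustrative output node name
    max_batch_size=8,
    max_workspace_size_bytes=1 << 30,
    precision_mode='INT8')

# Run representative chest X-ray batches through the calibration graph
# so TensorRT can collect per-tensor dynamic ranges.
with tf.Graph().as_default():
    out = tf.import_graph_def(calib_graph, return_elements=['sigmoid_out:0'])
    with tf.Session() as sess:
        for batch in calibration_batches:            # illustrative data source
            sess.run(out, feed_dict={'import/input:0': batch})

# Convert the calibrated graph into the final INT8 inference graph.
int8_graph = trt.calib_graph_to_infer_graph(calib_graph)
```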
CheXNet inference optimized with the native TensorRT™ 5 (TRT5) C++ API performed ~2X
faster than with the TF-TRT integration API. This advantage appeared only at batch sizes 1 and
2; the margin of the TRT5 C++ API over the TF-TRT API gradually decreased as the batch size
grew. We are still working with the Nvidia developer group to determine the expected relative
performance of the two API implementations.
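The standalone runs in this project used the TensorRT™ 5 C++ API; the TensorRT 5 Python bindings mirror the same builder flow and are shown here for brevity. A minimal sketch, assuming the model has been exported to ONNX (the file name and batch size are illustrative):

```python
import tensorrt as trt  # standalone TensorRT 5 Python bindings

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, max_batch_size=1):
    """Parse an ONNX model and build an optimized TensorRT engine."""
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_batch_size = max_batch_size
        builder.max_workspace_size = 1 << 30  # 1 GB builder workspace
        builder.fp16_mode = True              # enable FP16 kernels on T4
        with open(onnx_path, 'rb') as f:
            parser.parse(f.read())
        return builder.build_cuda_engine(network)

engine = build_engine('chexnet.onnx', max_batch_size=2)  # illustrative path
```

Because this path bypasses the TensorFlow runtime entirely, it avoids per-batch framework overhead, which is one plausible reason its advantage is largest at small batch sizes.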
Models optimized with Nvidia TensorRT™ 5 can be deployed in several environments
depending on the target application, such as scale-out data centers, embedded systems, or
automotive product platforms. Other implementation factors can affect end-to-end inference
speed when deploying the optimized models into these production environments; model
optimization is just one of those factors, and in this project we have demonstrated some
methods for approaching it.