Files used for development:
• Script: tensorrt_chexnet.py
• Base model script: tensorrt.py
• Labels file: labellist_chest_x_ray.json
4.2 TensorRT™ using TensorRT C++ API
In this section, we present how to run optimized inference on an existing TensorFlow
model using the TensorRT™ C++ API. The first step is to convert the frozen graph model to
the UFF file format, which the C++ UFF parser API (supporting TensorFlow models) can then
import; from there, follow the workflow in Figure 9 to create the TensorRT™ engine for
optimized inference (a code sketch of these steps follows the list):
• Create a TensorRT™ network definition from the existing trained model
• Invoke the TensorRT™ builder to create an optimized runtime engine from the network
• Serialize and deserialize the engine so that it can be rapidly recreated at runtime
• Feed the engine with data to perform inference
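The listing below is a minimal C++ sketch of the build-time portion of these steps under the UFF-based TensorRT™ 5.x API available for the T4 at the time of writing. The tensor names input_tensor and softmax_tensor, the 3x224x224 input shape, the file names chexnet.uff and chexnet_fp16.engine, and the batch/workspace settings are illustrative assumptions rather than values taken from our scripts; the actual tensor names are reported when the frozen graph is converted to UFF.

#include <fstream>
#include <iostream>
#include "NvInfer.h"
#include "NvUffParser.h"

using namespace nvinfer1;
using namespace nvuffparser;

// Minimal logger required by the TensorRT builder and runtime.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // Step 1: create a network definition and populate it with the UFF parser.
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();
    IUffParser* parser = createUffParser();
    // Tensor names and input shape are assumptions for the CheXNet graph.
    parser->registerInput("input_tensor", Dims3(3, 224, 224), UffInputOrder::kNCHW);
    parser->registerOutput("softmax_tensor");
    parser->parse("chexnet.uff", *network, DataType::kFLOAT);

    // Step 2: invoke the builder to create an optimized runtime engine.
    builder->setMaxBatchSize(8);                // illustrative batch size
    builder->setMaxWorkspaceSize(1 << 30);      // 1 GB of scratch memory
    builder->setFp16Mode(true);                 // use the T4 Tensor Cores in FP16
    ICudaEngine* engine = builder->buildCudaEngine(*network);

    // Step 3: serialize the engine so it can be rapidly recreated at runtime.
    IHostMemory* plan = engine->serialize();
    std::ofstream out("chexnet_fp16.engine", std::ios::binary);
    out.write(static_cast<const char*>(plan->data()), plan->size());

    plan->destroy();
    engine->destroy();
    parser->destroy();
    network->destroy();
    builder->destroy();
    return 0;
}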
For the current implementation, we used the Nvidia sample trtexec.cpp and referenced the
TensorRT™ Developer Guide [15] to document the steps described below.
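For reference, the runtime half of the workflow (deserializing the engine and feeding it data) reduces to a few API calls. The sketch below assumes the plan file written by the build-time sketch above, a single input/output binding pair in that order, and FP32 host buffers already sized for the CheXNet input and its class scores; in practice the binding indices should be looked up with getBindingIndex().

#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>
#include "NvInfer.h"

using namespace nvinfer1;

// Same minimal logger as in the build-time sketch.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

// Runs one batch of inference with a previously serialized engine.
void infer(const std::vector<float>& input, std::vector<float>& output, int batchSize)
{
    // Deserialize the engine so it can be rapidly recreated at runtime.
    std::ifstream planFile("chexnet_fp16.engine", std::ios::binary);
    std::vector<char> plan((std::istreambuf_iterator<char>(planFile)),
                           std::istreambuf_iterator<char>());
    IRuntime* runtime = createInferRuntime(gLogger);
    ICudaEngine* engine = runtime->deserializeCudaEngine(plan.data(), plan.size(), nullptr);
    IExecutionContext* context = engine->createExecutionContext();

    // Device buffers; binding 0 is assumed to be the input, binding 1 the output.
    void* buffers[2];
    cudaMalloc(&buffers[0], input.size() * sizeof(float));
    cudaMalloc(&buffers[1], output.size() * sizeof(float));

    // Feed the engine with data to perform inference.
    cudaMemcpy(buffers[0], input.data(), input.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    context->execute(batchSize, buffers);
    cudaMemcpy(output.data(), buffers[1], output.size() * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(buffers[0]);
    cudaFree(buffers[1]);
    context->destroy();
    engine->destroy();
    runtime->destroy();
}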
Figure 9: Workflow for Creating a TensorRT Inference Graph using the TensorRT C++ API