White Papers

45 CheXNet Inference with Nvidia T4 on Dell EMC PowerEdge R7425
A Troubleshooting
In this section we describe the main issues we faced while implementing the custom model CheXNet with
Nvidia TensorRT™ and how we solved them:
TensorRT™ installation. For TF-TRT integration, we recommend working with the docker image
nvcr.io/nvidia/tensorflow:<tag version>-py3. For native TensorRT™, we recommend working with the
docker image nvcr.io/nvidia/tensorrt:<tag version>-py3.
Python path to TF models. If using a TensorFlow official model as the base model, and working
within the docker environment, make sure to include the python path to the official models once
inside the docker: export PYTHONPATH="$PYTHONPATH:/home/models/".
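The same path can also be registered from inside a Python script instead of via the environment variable; a minimal sketch, assuming the /home/models location used above:

```python
# Register the official-models directory on the module search path at runtime,
# as an alternative to exporting PYTHONPATH before launching the script.
import sys

MODELS_DIR = "/home/models"  # location assumed in this document's setup

if MODELS_DIR not in sys.path:
    sys.path.append(MODELS_DIR)

print(MODELS_DIR in sys.path)  # True once the path is registered
```

This is convenient when the script is launched by a scheduler or notebook kernel that does not inherit the shell's environment.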
ImageNet TFRecords. If using a TensorFlow official model as the base model, make sure that
there are no missing TFRecord files in the dataset. If some shards are missing, update the file
/home/models/official/resnet/imagenet_main.py accordingly.
Non-supported Layer Error. Before building the custom model, double-check that the operations
used by the selected framework are supported by TensorRT™; otherwise, the network subgraph
conversion will fail. In our case, we started with the Keras framework on the TensorFlow
backend, and the TensorRT™ script failed to convert most of the nodes. We then switched the
model to the native TensorFlow framework version, which resolved the issues. See Supported
operations for TF-TRT Integration [13].
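A pre-conversion check of this kind can be sketched in plain Python: walk the graph's node ops and flag anything outside a supported-op whitelist. The whitelist and node list below are illustrative stand-ins, not the real TF-TRT support matrix; consult the supported-operations list [13] for the authoritative set:

```python
# Sketch: flag graph ops that the converter does not support, before attempting
# subgraph conversion. SUPPORTED_OPS is an illustrative subset only.
SUPPORTED_OPS = {"Conv2D", "BiasAdd", "Relu", "MaxPool", "MatMul", "Softmax"}

def unsupported_ops(graph_nodes):
    """graph_nodes: iterable of (node_name, op_type) pairs.
    Returns the nodes whose op type is outside the whitelist."""
    return [(name, op) for name, op in graph_nodes if op not in SUPPORTED_OPS]

# Example: a custom op lowered from a Keras Lambda layer is caught early,
# instead of failing midway through conversion.
nodes = [
    ("conv1", "Conv2D"),
    ("act1", "Relu"),
    ("custom", "PyFunc"),  # hypothetical unsupported node
]
print(unsupported_ops(nodes))  # [('custom', 'PyFunc')]
```

Checking the graph this way turns a late conversion failure into an early, actionable report.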
Unimplemented: Not supported constant type at Const_1/Const_5 Error. This error is related to
the same issue above. At the time the tests were conducted, some Keras layers appeared not to
be supported by the TF-TRT integration.
No conversion function registered for layer IteratorGetNext Error. This error was thrown
by the system because the input function was not configured in the model. When building the
custom model, make sure to define the input function properly, and when exporting the model
with export_savedmodel make sure to configure the input_receiver_fn for serving as
input_receiver_fn=export.build_tensor_serving_input_receiver_fn(shape,
batch_size=FLAGS.batch_size)
Cuda Error in allocate:2. Subgraph conversion error for subgraph_index 1 due to:
"Internal: Engine building failure SKIPPING (437 nodes)". Sometimes this error is related
to the GPU memory capacity; try running the tests with a lower batch size and one precision
mode at a time.
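The lower-batch-size workaround can be automated with a simple backoff loop. The run_inference callable below is a hypothetical stand-in for whatever launches the engine build or inference run; it is assumed to raise MemoryError (or the framework's own OOM exception) when the batch does not fit:

```python
# Sketch: halve the batch size until the run fits in GPU memory.
# run_inference is a hypothetical stand-in, not a TensorRT API.
def find_workable_batch_size(run_inference, batch_size=128, min_batch=1):
    while batch_size >= min_batch:
        try:
            run_inference(batch_size)
            return batch_size
        except MemoryError:
            batch_size //= 2  # back off and retry with half the batch
    raise RuntimeError("no workable batch size found")

# Simulated run: pretend anything above 32 exhausts GPU memory.
def fake_run(batch):
    if batch > 32:
        raise MemoryError

print(find_workable_batch_size(fake_run))  # 32
```

Pairing this with one precision mode per run keeps peak memory predictable while sweeping configurations.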
Tensor batch_normalization/beta is not found in resnet_v2_imagenet_checkpoint error.
In our case we built the custom model CheXNet using transfer learning and the TensorFlow
official pre-trained ResnetV2_50 checkpoints downloaded from its website. This error was
produced because, at the time the model was trained, we didn't place our variables in the same