was 30 seconds per image, a 3.4x speedup (Figure 6) compared to stock TensorFlow (without
DNNL) at the same training batch size of 16.
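As a minimal sketch of how these optimizations are switched on: the DNNL (now oneDNN) kernels shipped in Intel's TensorFlow builds, and in more recent stock TensorFlow releases the same kernels can be toggled with the TF_ENABLE_ONEDNN_OPTS environment variable. This assumes a TensorFlow build that honors the flag; it must be set before TensorFlow is imported.

```python
import os

# Request oneDNN (formerly DNNL) kernels before importing TensorFlow,
# assuming a build that honors this flag (stock x86 builds in recent
# releases do; Intel-optimized wheels enable it by default).
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

import tensorflow as tf

# Sanity check: confirm which build is running. Many versions print a
# oneDNN notice at import time when the optimizations are active.
print(tf.__version__)
```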
Figure 7 depicts the prediction performance of the trained model. As observed, the segmentation mask from the model predictions closely matches the ground-truth mask. Using Table 1 as a reference, along with the TS and epoch count, machine learning practitioners can "plug in" their specific training data and hyperparameters to estimate both the required system memory and task completion time when training their own deep learning models on Intel architecture.
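As a rough illustration of that "plug in" workflow, the sketch below estimates system memory and wall-clock time from batch size, tensor size, and epoch count. All names and constants here (activation_multiplier, the 3x factor for weights plus gradients plus optimizer state, the placeholder dataset size) are illustrative assumptions, not the actual coefficients from Table 1.

```python
def estimate_training_memory_gb(tensor_size_mb: float,
                                batch_size: int,
                                activation_multiplier: float = 4.0,
                                model_params_gb: float = 0.5) -> float:
    """Rough per-step memory estimate: static model state plus
    activations that scale with batch size and input tensor size.
    All coefficients are illustrative placeholders."""
    # Weights, gradients, and optimizer state (e.g., Adam) taken as
    # roughly 3x the parameter memory -- an assumed rule of thumb.
    static_gb = 3 * model_params_gb
    # Activations grow linearly with batch size; the multiplier stands
    # in for the network's depth and feature-map expansion.
    activations_gb = batch_size * tensor_size_mb * activation_multiplier / 1024
    return static_gb + activations_gb


def estimate_training_time_hours(seconds_per_image: float,
                                 images_per_epoch: int,
                                 epochs: int) -> float:
    """Task-completion estimate from a measured per-image step time."""
    return seconds_per_image * images_per_epoch * epochs / 3600


if __name__ == "__main__":
    # Example: batch size 16 and the 30 s/image DNNL-enabled figure
    # quoted above; tensor size and dataset size are placeholders.
    mem = estimate_training_memory_gb(tensor_size_mb=128, batch_size=16)
    hrs = estimate_training_time_hours(seconds_per_image=30,
                                       images_per_epoch=500, epochs=40)
    print(f"Estimated memory: ~{mem:.1f} GB")
    print(f"Estimated training time: ~{hrs:.0f} h")
```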
Figure 5. 3D U-Net memory footprint shows correlation with our theoretical calculations from Table 1.

Figure 6. TensorFlow with the Deep Neural Network Library (DNNL) enabled achieves increased performance versus stock TensorFlow (without DNNL).
Figure 7. Prediction performance of the trained model, showing a slice of the brain from different views. The red overlay is the prediction from the model and the blue overlay is the ground-truth mask. Any purple voxels are true positives.