was 30 seconds per image, a 3.4x speedup (Figure 6) compared to stock TensorFlow (without
DNNL) at the same training batch size of 16.
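As a minimal sketch of how these optimizations are switched on: the DNNL (now oneDNN) kernels shipped in Intel's TensorFlow builds, and in more recent stock TensorFlow releases the same kernels can be toggled with the TF_ENABLE_ONEDNN_OPTS environment variable. This assumes a TensorFlow build that honors the flag; it must be set before TensorFlow is imported.

```python
import os

# Request oneDNN (formerly DNNL) kernels before importing TensorFlow,
# assuming a build that honors this flag (stock x86 builds in recent
# releases do; Intel-optimized wheels enable it by default).
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

import tensorflow as tf

# Sanity check: confirm which build is running. Many versions print a
# oneDNN notice at import time when the optimizations are active.
print(tf.__version__)
```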
Figure 7 depicts the prediction performance of the trained model. As observed, the segmentation mask from the model predictions closely matches the ground-truth mask. Using Table 1 as a reference, along with the TS and epoch count, machine learning practitioners can "plug in" their specific training data and hyperparameters to estimate both the required system memory and task completion time when training their own deep learning models on Intel architecture.
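As a rough illustration of that "plug in" workflow, the sketch below estimates system memory and wall-clock time from batch size, tensor size, and epoch count. All names and constants here (activation_multiplier, the 3x factor for weights plus gradients plus optimizer state, the placeholder dataset size) are illustrative assumptions, not the actual coefficients from Table 1.

```python
def estimate_training_memory_gb(tensor_size_mb: float,
                                batch_size: int,
                                activation_multiplier: float = 4.0,
                                model_params_gb: float = 0.5) -> float:
    """Rough per-step memory estimate: static model state plus
    activations that scale with batch size and input tensor size.
    All coefficients are illustrative placeholders."""
    # Weights, gradients, and optimizer state (e.g., Adam) taken as
    # roughly 3x the parameter memory -- an assumed rule of thumb.
    static_gb = 3 * model_params_gb
    # Activations grow linearly with batch size; the multiplier stands
    # in for the network's depth and feature-map expansion.
    activations_gb = batch_size * tensor_size_mb * activation_multiplier / 1024
    return static_gb + activations_gb


def estimate_training_time_hours(seconds_per_image: float,
                                 images_per_epoch: int,
                                 epochs: int) -> float:
    """Task-completion estimate from a measured per-image step time."""
    return seconds_per_image * images_per_epoch * epochs / 3600


if __name__ == "__main__":
    # Example: batch size 16 and the 30 s/image DNNL-enabled figure
    # quoted above; tensor size and dataset size are placeholders.
    mem = estimate_training_memory_gb(tensor_size_mb=128, batch_size=16)
    hrs = estimate_training_time_hours(seconds_per_image=30,
                                       images_per_epoch=500, epochs=40)
    print(f"Estimated memory: ~{mem:.1f} GB")
    print(f"Estimated training time: ~{hrs:.0f} h")
```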
Figure 5. 3D U-Net memory footprint shows correlation with our theoretical calculations from Table 1.

Figure 6. TensorFlow with the Deep Neural Network Library (DNNL) enabled achieves increased performance versus stock TensorFlow (without DNNL).
Figure 7. Prediction performance of the trained model, showing a slice of the brain from different views. The red overlay is the prediction from the model and the blue overlay is the ground-truth mask. Any purple voxels are true positives.