Conclusions
In this white paper, we presented multimodal brain tumor analysis for medical diagnosis, highlighted its computing challenges, and introduced the 3D U-Net model for the task of volumetric image segmentation. We pre-calculated the memory requirement of the model and analyzed three server configurations with varying memory capacity, from a "dev server" with 192 GB of memory to a "memory-rich" server with over 1 TB of memory. On the memory-rich server, we trained the 3D U-Net model on the BraTS dataset (a medical segmentation benchmark) and achieved near state-of-the-art results: an accuracy of 0.997 and a Dice coefficient of 0.83. The maximum memory utilization observed during training also matched our pre-calculated memory requirement, suggesting that our estimation approach generalizes to other memory-bound deep learning workloads.
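
To illustrate the kind of pre-calculation described above, the sketch below tallies the dominant contributors to 3D U-Net training memory: the activations retained for backpropagation, plus the weights, their gradients, and the optimizer state. This is a minimal back-of-the-envelope estimate, not the paper's exact method; the input shape, depth, filter counts, and batch size are illustrative placeholders, and the paper's actual configuration would need to be substituted to reproduce its near 1 TB figure.

    BYTES_FP32 = 4  # bytes per float32 element

    def unet3d_training_memory_gib(input_shape=(144, 144, 144), in_channels=4,
                                   base_filters=32, depth=5, batch_size=4):
        """Rough upper bound on 3D U-Net training memory, in GiB.

        All shape and size arguments are illustrative assumptions.
        """
        d, h, w = input_shape
        activations = 0  # elements kept live for the backward pass
        params = 0       # trainable weights

        ch_in = in_channels
        for level in range(depth):
            ch_out = base_filters * (2 ** level)
            # two 3x3x3 convolutions per encoder level
            activations += 2 * d * h * w * ch_out
            params += (3 ** 3) * (ch_in * ch_out + ch_out * ch_out)
            ch_in = ch_out
            if level < depth - 1:
                d, h, w = d // 2, h // 2, w // 2  # 2x2x2 max pooling

        # the decoder roughly mirrors the encoder, and skip connections keep
        # encoder activations alive, so double both tallies
        activations *= 2
        params *= 2

        act_bytes = batch_size * activations * BYTES_FP32
        # weights + gradients + two Adam moment slots = 4 copies of the params
        param_bytes = 4 * params * BYTES_FP32
        return (act_bytes + param_bytes) / 2 ** 30

    # With these placeholder values the estimate is roughly 8 GiB; full-volume
    # inputs and larger batches scale the activation term toward the TB range.
    print(f"estimated training memory: {unet3d_training_memory_gib():.1f} GiB")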
To the best of our knowledge, the results presented in this paper represent the first milestone in training a deep neural network with a large memory footprint (close to 1 TB) on a single-node server without hardware accelerators such as GPUs. Further, by enabling Deep Neural Network Library (DNNL) optimizations, we achieved a speedup of 3.4x per training step compared to stock TensorFlow. By replicating the single-node, memory-rich configuration described in this paper across a multi-node CPU cluster, we expect further gains in training performance for the 3D U-Net model as well as for other complex 3D models and datasets, potentially reducing organizations' total cost of ownership (TCO) [13].
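
As a concrete starting point, the snippet below shows one common way to turn on the oneDNN (formerly DNNL) kernels in a stock TensorFlow build and to tune its CPU thread pools. It is a minimal sketch, not the paper's exact configuration; the thread counts and OpenMP settings are illustrative assumptions that should be matched to the core count and topology of the actual server.

    import os

    # Must be set before TensorFlow is imported. On recent stock TensorFlow
    # builds this flag enables the oneDNN (DNNL) optimized CPU kernels;
    # Intel-optimized builds enable them by default.
    os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

    # OpenMP settings commonly recommended for CPU training
    # (values below are illustrative, not taken from the paper).
    os.environ["OMP_NUM_THREADS"] = "56"   # assumed physical cores per node
    os.environ["KMP_BLOCKTIME"] = "1"
    os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"

    import tensorflow as tf

    # Map TensorFlow's own thread pools onto the hardware.
    tf.config.threading.set_intra_op_parallelism_threads(56)  # per-op threads
    tf.config.threading.set_inter_op_parallelism_threads(2)   # concurrent ops

A quick sanity check is to watch TensorFlow's startup log, which announces when the oneDNN-optimized operations are active.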
Acknowledgments
Center for Space High-Performance and Resilient Computing (SHREC), University of Florida
Université de Montréal
NEUROMOD
Dell EMC
Intel