Concept Guide

7 Addressing the Memory Bottleneck in AI Model Training for Healthcare
3D U-Net Model
Convolutional neural networks (CNNs) such as U-Net have been widely successful in 2D
segmentation problems in computer vision [6]. However, most medical data used in clinical
practice consists of 3D volumes. Since only 2D slices can be displayed on a computer screen,
annotating these large volumes with segmentation labels in a slice-by-slice manner is
cumbersome and inefficient. 3D U-Net [7], based on U-Net architecture, performs volumetric
segmentation by taking 3D volumes as input and processing them with corresponding 3D
operations: 3D convolutions, 3D max-pooling, 3D up-sampling, and so on. The resulting trained
model generalizes reasonably well, since neighboring image slices contain mostly repetitive
structures with corresponding variation. In general, the 3D U-Net model is both computation- and
memory-intensive.
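To make the jump from 2D to 3D concrete, the sketch below compares the weight and activation counts of a single 2D convolution on one slice against a 3D convolution on the full volume. The channel counts (32 in, 64 out) are illustrative assumptions, not values from the 3D U-Net paper; only the 240x240x144 volume size and 3x3x3 kernel come from the text.

```python
# Rough back-of-envelope comparison of 2D vs. 3D convolution costs.
# Channel counts (32 -> 64) are illustrative assumptions; the volume
# size (240x240x144) and kernel (3x3x3) are taken from the text.

def conv_params(kernel, c_in, c_out):
    """Number of weights in a convolution (bias ignored)."""
    n = c_in * c_out
    for k in kernel:
        n *= k
    return n

def activation_floats(spatial, channels):
    """Number of float values in one output feature map."""
    n = channels
    for s in spatial:
        n *= s
    return n

# 2D: one 240x240 slice, 3x3 kernel
p2d = conv_params((3, 3), 32, 64)
a2d = activation_floats((240, 240), 64)

# 3D: full 240x240x144 volume, 3x3x3 kernel, same channels
p3d = conv_params((3, 3, 3), 32, 64)
a3d = activation_floats((240, 240, 144), 64)

print(f"2D conv: {p2d:,} weights, {a2d:,} activation floats")
print(f"3D conv: {p3d:,} weights, {a3d:,} activation floats")
# At 4 bytes per float32, the 3D activation alone is ~2.1 GB
# for a single feature map, before batching.
print(f"3D activation at fp32: {a3d * 4 / 1e9:.2f} GB")
```

The weights only grow by the extra kernel dimension (3x), but the activation grows by the full depth of the volume (144x), which is why volumetric training is dominated by activation memory rather than parameter storage.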
Memory Profiling
Memory footprint is as important to deep-learning training as raw processing throughput,
measured in floating-point operations per second (FLOPS), especially when dealing with
volumetric data and large models such as 3D U-Net. Table 1 shows the breakdown of the memory
requirement of the 3D U-Net model at the largest available image size (240x240x144 in the case
of the BraTS dataset) using a kernel size of 3x3x3. As indicated, the estimated system memory
requirement is just under 1 TB for a batch size of 16 MRI scans. On our development server,
equipped with only 192 GB of system memory (Table 2), training ran out of memory within a
couple of minutes of starting, and the whole experiment stalled.
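The order of magnitude of such a breakdown can be sanity-checked with a simple estimate. The sketch below sums stored activations over the encoder path of a hypothetical 3D U-Net, assuming float32, spatial dimensions halving and channels doubling per level, and two stored feature maps per level. The level count and channel widths are assumptions for illustration; this counts encoder activations only, so the true training footprint (decoder, gradients, optimizer state, convolution workspaces) is several times larger, as Table 1 reflects.

```python
# Minimal sketch of activation-memory estimation for a 3D U-Net-like
# encoder. Base channels, level count, and feature maps per level are
# illustrative assumptions; Table 1 in the text is the authoritative
# breakdown. Only encoder activations are counted here.

BYTES_PER_FLOAT = 4  # float32

def level_bytes(shape, channels, feature_maps=2):
    """Bytes of stored activations for one resolution level.

    feature_maps: stored outputs per level (e.g. two convolutions
    whose activations are kept for the backward pass).
    """
    d, h, w = shape
    return d * h * w * channels * feature_maps * BYTES_PER_FLOAT

def unet3d_activation_bytes(input_shape, base_channels=32, levels=4):
    """Sum activation bytes over encoder levels: spatial dimensions
    halve and channel counts double at each level."""
    total = 0
    d, h, w = input_shape
    c = base_channels
    for _ in range(levels):
        total += level_bytes((d, h, w), c)
        d, h, w, c = d // 2, h // 2, w // 2, c * 2
    return total

batch = 16
per_sample = unet3d_activation_bytes((144, 240, 240))
print(f"Encoder activations per sample: {per_sample / 1e9:.1f} GB")
print(f"Batch of {batch}: {batch * per_sample / 1e9:.1f} GB")
```

Even this deliberately conservative estimate lands in the tens of gigabytes for a batch of 16, so once the decoder path, gradients, and framework overhead are added, overrunning a 192 GB server is unsurprising.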
Figure 2. 3D U-Net architecture. Each box corresponds to a multi-channel feature map; the
arrows denote different operations [8].