Model training is typically a time-consuming step in deep learning development, especially in medical imaging applications. Volumetric medical images are usually large (as multi-dimensional arrays), and the model training process can be complex. Even with powerful hardware (e.g. CPU/GPU with large RAM), it is not easy to fully leverage it to achieve high performance. NVIDIA GPUs are widely used for deep learning training and evaluation, and CUDA parallel computation offers significant acceleration compared to traditional CPU-based computation. To fully leverage GPU features, many popular mechanisms have emerged, such as automatic mixed precision (AMP) and distributed data parallelism. MONAI supports these features, and this folder provides a fast training guide for achieving the best performance, along with rich examples.
This document introduces how to profile the training pipeline, how to analyze the dataset and select suitable algorithms, and how to optimize GPU utilization on a single GPU, multiple GPUs, or even multiple nodes.
The examples show how to execute distributed training and evaluation based on 3 different frameworks:
- PyTorch native `DistributedDataParallel` module with `torchrun`.
- Horovod APIs with `horovodrun`.
- PyTorch Ignite and MONAI workflows.
They can run on several distributed nodes with multiple GPU devices on every node, and they compare the training speed and memory usage with and without AMP.
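For the PyTorch native approach, the overall shape of a `torchrun`-launched script can be sketched as below. This is a minimal illustration, not one of the actual examples: the script name, model, and data are placeholders, and the NCCL backend is swapped for Gloo when no GPU is available.

```python
# Minimal DistributedDataParallel sketch, assuming a launch such as:
#   torchrun --nproc_per_node=NUM_GPUS ddp_sketch.py
# (script name, model, and data are illustrative placeholders)
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def train(steps: int = 3) -> float:
    # torchrun exports RANK, LOCAL_RANK and WORLD_SIZE for every worker
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend)
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")

    model = torch.nn.Linear(16, 2).to(device)
    model = DDP(model, device_ids=[local_rank] if device.type == "cuda" else None)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    loss_value = 0.0
    for _ in range(steps):
        x = torch.randn(8, 16, device=device)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()  # gradients are all-reduced across ranks here
        optimizer.step()
        loss_value = loss.item()

    dist.destroy_process_group()
    return loss_value


if __name__ == "__main__":
    # single-process fallback so the sketch also runs without torchrun
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29517")
    os.environ.setdefault("LOCAL_RANK", "0")
    train()
```

Each of the `--nproc_per_node` workers runs this same script; `DistributedDataParallel` synchronizes gradients across them during `backward()`.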
This notebook compares the performance of `Dataset`, `CacheDataset` and `PersistentDataset`. These classes differ in how data is stored (in memory or on disk) and in when transforms are applied.
This tutorial compares the training performance of a pure PyTorch program with that of a MONAI-optimized program on an NVIDIA GPU with the latest CUDA library.
The optimization methods mainly include AMP, `CacheDataset`, and the Novograd optimizer.
Demonstrates the use of the `ThreadBuffer` class to generate data batches in a separate thread during training.
Illustrates reading NIfTI files and tests the speed of different transforms on different devices.
This notebook shows how to use TensorRT to accelerate the model and achieve a better inference latency.
Information about how to set up and apply existing tools to monitor computing resources.
To run a model on a MacBook M4 Max 2024, you need to install the necessary dependencies. Follow these steps:
- Install Homebrew if you haven't already:
  ```shell
  /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  ```
- Install Python:
  ```shell
  brew install python
  ```
- Install virtualenv:
  ```shell
  pip install virtualenv
  ```
- Create a virtual environment:
  ```shell
  virtualenv monai_env
  ```
- Activate the virtual environment:
  ```shell
  source monai_env/bin/activate
  ```
- Install MONAI and other dependencies:
  ```shell
  pip install monai numpy torch torchvision
  ```
- Clone the Project-MONAI repository:
  ```shell
  git clone https://github.com/Project-MONAI/tutorials.git
  cd tutorials
  ```
- Navigate to the desired tutorial directory, for example:
  ```shell
  cd acceleration
  ```
- Choose the tutorial or example you want to run. For instance, to run `fast_training_tutorial.ipynb`, you can use Jupyter Notebook.
- Install Jupyter Notebook:
  ```shell
  pip install notebook
  ```
- Start Jupyter Notebook:
  ```shell
  jupyter notebook
  ```
- Open the desired notebook (e.g., `fast_training_tutorial.ipynb`) in your browser and follow the instructions to run the model.
By following these steps, you should be able to install MONAI and run a tutorial model on your MacBook M4 Max 2024.