Model training is typically a time-consuming step in deep learning development, especially in medical imaging applications. Volumetric medical images are usually large (as multi-dimensional arrays), and the model training process can be complex. Even with powerful hardware (e.g. CPU/GPU with large RAM), it is not easy to fully leverage it to achieve high performance. NVIDIA GPUs are widely used for deep learning training and evaluation, and CUDA parallel computation offers significant acceleration compared to traditional CPU-based computation. To fully leverage GPU features, many popular mechanisms have emerged, such as automatic mixed precision (AMP) and distributed data parallelism. MONAI supports these features, and this folder provides a fast training guide for achieving the best performance, along with rich examples.
This document introduces how to profile the training pipeline, how to analyze the dataset and select suitable algorithms, and how to optimize GPU utilization on a single GPU, multiple GPUs, or even multiple nodes.
The examples show how to execute distributed training and evaluation based on 3 different frameworks:
- PyTorch native `DistributedDataParallel` module with `torchrun`.
- Horovod APIs with `horovodrun`.
- PyTorch Ignite and MONAI workflows.
They can run on several distributed nodes with multiple GPU devices on every node, and they compare the training speed and memory usage with and without AMP.
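For the PyTorch native approach, the overall shape of a `torchrun`-launched script can be sketched as below. This is a minimal illustration, not one of the actual examples: the script name, model, and data are placeholders, and the NCCL backend is swapped for Gloo when no GPU is available.

```python
# Minimal DistributedDataParallel sketch, assuming a launch such as:
#   torchrun --nproc_per_node=NUM_GPUS ddp_sketch.py
# (script name, model, and data are illustrative placeholders)
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def train(steps: int = 3) -> float:
    # torchrun exports RANK, LOCAL_RANK and WORLD_SIZE for every worker
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend)
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")

    model = torch.nn.Linear(16, 2).to(device)
    model = DDP(model, device_ids=[local_rank] if device.type == "cuda" else None)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    loss_value = 0.0
    for _ in range(steps):
        x = torch.randn(8, 16, device=device)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()  # gradients are all-reduced across ranks here
        optimizer.step()
        loss_value = loss.item()

    dist.destroy_process_group()
    return loss_value


if __name__ == "__main__":
    # single-process fallback so the sketch also runs without torchrun
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29517")
    os.environ.setdefault("LOCAL_RANK", "0")
    train()
```

Each of the `--nproc_per_node` workers runs this same script; `DistributedDataParallel` synchronizes gradients across them during `backward()`.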
This notebook compares the performance of `Dataset`, `CacheDataset` and `PersistentDataset`. These classes differ in how data is stored (in memory or on disk) and in when transforms are applied.
This tutorial compares the training performance of a pure PyTorch program with that of a MONAI-optimized program on an NVIDIA GPU with the latest CUDA library.
The optimization methods mainly include AMP, `CacheDataset`, and the Novograd optimizer.
Demonstrates the use of the `ThreadBuffer` class to generate data batches in a separate thread during training.
Illustrates reading NIfTI files and tests the speed of different transforms on different devices.
This notebook shows how to use TensorRT to accelerate the model and achieve a better inference latency.
Information about how to set up and apply existing tools to monitor computing resources.
To run a model on a MacBook M4 Max 2024, you need to install the necessary dependencies. Follow these steps:
- Install Homebrew if you haven't already:
  ```shell
  /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  ```
- Install Python:
  ```shell
  brew install python
  ```
- Install virtualenv:
  ```shell
  pip install virtualenv
  ```
- Create a virtual environment:
  ```shell
  virtualenv monai_env
  ```
- Activate the virtual environment:
  ```shell
  source monai_env/bin/activate
  ```
- Install MONAI and other dependencies:
  ```shell
  pip install monai numpy torch torchvision
  ```
- Clone the Project-MONAI repository:
  ```shell
  git clone https://github.com/Project-MONAI/tutorials.git
  cd tutorials
  ```
- Navigate to the desired tutorial directory, for example:
  ```shell
  cd acceleration
  ```
- Choose the tutorial or example you want to run. For instance, to run `fast_training_tutorial.ipynb`, you can use Jupyter Notebook.
- Install Jupyter Notebook:
  ```shell
  pip install notebook
  ```
- Start Jupyter Notebook:
  ```shell
  jupyter notebook
  ```
- Open the desired notebook (e.g., `fast_training_tutorial.ipynb`) in your browser and follow the instructions to run the model.
By following these steps, you should be able to install MONAI and run a tutorial model on your MacBook M4 Max 2024.