A benchmarking framework for medical image classification with a focus on domain generalization. It provides efficient LMDB-based data pipelines, multiple model architectures (ViT, ResNet, Swin), and advanced augmentation methods (XDomainMix, PipMix).
- Multi-Dataset Support: NIH ChestX-ray14, COVIDx, VinBigData (easily extensible)
- Efficient Data Pipeline: LMDB-based storage for fast I/O
- Multiple Models: ViT, Swin Transformer, ResNet, MobileNet, EfficientNet via timm
- Training Modes: Linear Probe & Full Fine-tuning
- Augmentations: XDomainMix, PipMix for domain generalization
- Comprehensive Logging: Metrics CSV, loss plots, confusion matrix, ROC curves
- Checkpoint Management: Best/last checkpoints with resume support
```bash
git clone https://github.com/EnesDemir143/genmed-bench.git
cd genmed-bench

# Install with uv (recommended)
uv sync

# Link raw data
ln -s /path/to/chest-datasets data/raw/

# Convert to LMDB
uv run python -m src.data.converters.nih_converter
uv run python -m src.data.converters.covidx_converter
uv run python -m src.data.converters.vinbigdata_converter
```
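Training holds out a validation split whose size is set by `--val_ratio` (default 0.2), and the project keeps train/val splits under `data/splits/`. A minimal sketch of a seeded, reproducible split, assuming each sample is identified by a string key (for example an LMDB record key); the function name `train_val_split` and the default seed are hypothetical, not the project's code:

```python
from __future__ import annotations

import random


def train_val_split(keys: list[str], val_ratio: float = 0.2, seed: int = 42):
    """Hypothetical helper: same keys + same seed -> same (train, val) partition."""
    if not 0.0 < val_ratio < 1.0:
        raise ValueError("val_ratio must be in (0, 1)")
    shuffled = sorted(keys)               # canonical order, independent of read order
    random.Random(seed).shuffle(shuffled)  # local RNG, leaves global random state alone
    n_val = round(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]
```

Sorting before shuffling makes the result independent of the order in which keys were read from disk, and a local `random.Random` instance avoids touching the global RNG state.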
```bash
# Basic training
uv run python train.py \
  --model resnet50 \
  --dataset nih \
  --mode linear_probe \
  --epochs 50

# Full example with additional options
uv run python train.py \
  --model vit_small_patch16 \
  --mode linear_probe \
  --augmentation xdomainmix \
  --dataset nih \
  --batch_size 64 \
  --epochs 100 \
  --val_ratio 0.2 \
  --multi_label
```

Training creates a run folder containing all artifacts:
```
runs/<model>_<mode>_<augmentation>_<dataset>_<timestamp>/
├── checkpoints/
│   ├── best.pth           # Best validation AUC
│   └── last.pth           # Last epoch
├── plots/
│   ├── loss_curve.png
│   ├── roc_curve.png
│   └── confusion_matrix.png
├── metrics.csv            # Per-epoch metrics
├── config.yaml            # Saved configuration
└── train.log              # Training logs
```
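The run folder keeps two checkpoints, `best.pth` for the best validation AUC and `last.pth` for the latest epoch, and `--early_stopping` sets a patience where 0 disables stopping. The bookkeeping can be sketched as follows; `BestTracker` is a hypothetical helper, and counting only a strict AUC increase as an improvement is an assumption:

```python
from dataclasses import dataclass


@dataclass
class BestTracker:
    """Hypothetical helper: best-AUC bookkeeping plus early-stopping patience."""

    patience: int = 0                 # 0 = early stopping disabled
    best_auc: float = float("-inf")
    best_epoch: int = -1
    bad_epochs: int = 0               # epochs since the last improvement

    def update(self, epoch: int, val_auc: float) -> bool:
        """Record one epoch; return True when it sets a new best AUC."""
        if val_auc > self.best_auc:
            self.best_auc, self.best_epoch, self.bad_epochs = val_auc, epoch, 0
            return True
        self.bad_epochs += 1
        return False

    def should_stop(self) -> bool:
        """True once `patience` epochs pass without improvement (never if 0)."""
        return self.patience > 0 and self.bad_epochs >= self.patience


# Illustrative validation AUCs with patience 2.
tracker = BestTracker(patience=2)
stopped_at = None
for epoch, auc in enumerate([0.70, 0.74, 0.73, 0.72, 0.75]):
    if tracker.update(epoch, auc):
        pass  # a trainer would write checkpoints/best.pth here
    # ...and write checkpoints/last.pth unconditionally every epoch
    if tracker.should_stop():
        stopped_at = epoch
        break
```

With patience 2, the two non-improving epochs after epoch 1 end the loop at epoch 3, so the later 0.75 is never reached.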
| Argument | Default | Description |
|---|---|---|
| `--model` | required | Model name (e.g., resnet50, vit_small_patch16, swin_tiny) |
| `--dataset` | required | Dataset (nih, covidx, vinbigdata) |
| `--mode` | linear_probe | Training mode (linear_probe, full_finetune) |
| `--augmentation` | none | Augmentation (none, xdomainmix, pipmix) |
| `--epochs` | from config | Number of epochs |
| `--batch_size` | from config | Batch size |
| `--lr` | from config | Learning rate |
| `--val_ratio` | 0.2 | Validation split ratio |
| `--multi_label` | False | Multi-label classification |
| `--early_stopping` | 0 | Early stopping patience (0 = disabled) |
| `--retrain` | None | Run folder name to resume from |
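The argument table maps naturally onto `argparse`. The sketch below is a hypothetical reconstruction, not the actual parser in `train.py`: it assumes that `None` stands for "from config", that CLI values override config values, and that `--model`/`--dataset` may be omitted when `--retrain` is given (the resume example in this README passes only `--retrain`). The helper names `build_parser`, `parse_args`, and `resolve` are illustrative:

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the argument table (not train.py itself)."""
    p = argparse.ArgumentParser(description="genmed-bench training (sketch)")
    p.add_argument("--model", help="e.g. resnet50, vit_small_patch16, swin_tiny")
    p.add_argument("--dataset", help="nih, covidx or vinbigdata")
    p.add_argument("--mode", default="linear_probe",
                   choices=["linear_probe", "full_finetune"])
    p.add_argument("--augmentation", default="none",
                   choices=["none", "xdomainmix", "pipmix"])
    # None stands for "use the value from the config files"
    p.add_argument("--epochs", type=int, default=None)
    p.add_argument("--batch_size", type=int, default=None)
    p.add_argument("--lr", type=float, default=None)
    p.add_argument("--val_ratio", type=float, default=0.2)
    p.add_argument("--multi_label", action="store_true")
    p.add_argument("--early_stopping", type=int, default=0,
                   help="patience in epochs; 0 disables early stopping")
    p.add_argument("--retrain", default=None,
                   help="run folder name under runs/ to resume from")
    return p


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: a resumed run restores model/dataset from its config.yaml,
    # so they are only mandatory for a fresh run.
    if args.retrain is None and (args.model is None or args.dataset is None):
        parser.error("--model and --dataset are required unless --retrain is given")
    return args


def resolve(args: argparse.Namespace, config: dict) -> dict:
    """CLI values override config; arguments left at None fall back to config."""
    merged = dict(config)
    merged.update({k: v for k, v in vars(args).items() if v is not None})
    return merged
```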
| Category | Models |
|---|---|
| Vision Transformers | vit_small_patch16, vit_base_patch16, deit_small_patch16 |
| Swin Transformer | swin_tiny_patch4, swin_small_patch4 |
| CNNs | resnet50, resnet101, efficientnet_b0, mobilenetv3_small |
| ConvNeXt | convnext_tiny, convnext_small |
All models are loaded from timm with ImageNet pretrained weights.
| Dataset | Images | Classes | Type |
|---|---|---|---|
| NIH ChestX-ray14 | ~112K | 14 | Multi-label |
| COVIDx | ~30K | 3 | Multi-class |
| VinBigData | ~18K | 2 | Binary |
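The three label types need different target encodings. A sketch, assuming NIH labels arrive as pipe-separated strings like the `Finding Labels` column of the NIH metadata CSV (e.g. `Cardiomegaly|Effusion`, or `No Finding`); the class order and the COVIDx label strings below are hypothetical and may differ from the project's converters:

```python
from __future__ import annotations

# Assumed class order; the project's own label order may differ.
NIH_LABELS = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]
# Hypothetical COVIDx label strings (3 classes).
COVIDX_LABELS = ["normal", "pneumonia", "COVID-19"]


def encode_multi_label(finding_labels: str, classes: list[str] = NIH_LABELS) -> list[int]:
    """'Cardiomegaly|Effusion' -> multi-hot vector; 'No Finding' -> all zeros."""
    present = set(finding_labels.split("|")) - {"No Finding"}
    unknown = present - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if c in present else 0 for c in classes]


def encode_multi_class(label: str, classes: list[str] = COVIDX_LABELS) -> int:
    """Multi-class datasets map each image to exactly one class index."""
    return classes.index(label)


def encode_binary(is_positive: bool) -> int:
    """Binary datasets use a single 0/1 target."""
    return int(is_positive)
```

Multi-hot targets are typically paired with a per-class sigmoid and binary cross-entropy, and single-index targets with softmax cross-entropy; which loss the trainer selects when `--multi_label` is set is not specified here.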
See `docs/ADDING_NEW_DATASET.md` for a step-by-step guide.
| Guide | Description |
|---|---|
| Adding New Datasets | How to add a new medical imaging dataset |
| Adding Models & Augmentations | How to add new models and augmentation methods |
| Configuration | Complete reference for config.yaml and models.yaml |
| Training Tips | Best practices and hyperparameter recommendations |
| Reproducibility | How to ensure reproducible experiments |
| DVC Guide | Data version control with Google Drive |
```
genmed-bench/
├── configs/
│   ├── config.yaml            # Data paths, preprocessing
│   └── models.yaml            # Model-specific hyperparameters
├── data/
│   ├── raw/                   # Original datasets (symlinks)
│   ├── processed/             # LMDB databases
│   └── splits/                # Train/val splits (parquet)
├── src/
│   ├── data/
│   │   ├── augmentation/      # XDomainMix, PipMix
│   │   ├── converters/        # Raw → LMDB converters
│   │   └── dataset/           # Dataset classes
│   ├── models/
│   │   ├── backbone.py        # Backbone loader
│   │   └── classifier.py      # Classification head
│   ├── train/
│   │   ├── trainer_base.py
│   │   ├── trainer_sup.py
│   │   └── experiment_logger.py
│   └── utils/
│       ├── metrics.py         # AUC, F1, Confusion Matrix
│       └── seed.py            # Reproducibility
├── scripts/                   # Data preparation scripts
├── runs/                      # Training outputs
└── train.py                   # Main entry point
```
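`src/utils/metrics.py` reports AUC, and the best checkpoint is selected by validation AUC. The project lists scikit-learn for metrics; the dependency-free sketch below only illustrates what a macro-averaged ROC AUC computes, using the rank-based (Mann-Whitney) form in which tied scores count as one half. Applying the macro average to the multi-label case is an assumption, and `binary_auc`/`macro_auc` are hypothetical names:

```python
from __future__ import annotations


def binary_auc(y_true: list[int], scores: list[float]) -> float:
    """ROC AUC = P(random positive scores above random negative), ties = 0.5."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U for positives
    return u / (n_pos * n_neg)


def macro_auc(y_true: list[list[int]], scores: list[list[float]]) -> float:
    """Unweighted mean of per-class AUCs (one column per class)."""
    n_classes = len(y_true[0])
    per_class = [
        binary_auc([row[c] for row in y_true], [row[c] for row in scores])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes
```

For the binary case this should agree with scikit-learn's `roc_auc_score`, and the column-wise mean corresponds to its `average="macro"` setting for multi-label targets.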
```bash
# Resume from a run folder
uv run python train.py --retrain vit_small_patch16_linear_probe_baseline_nih_20260128_123456
```

This automatically:
- loads `config.yaml` from the run folder
- loads `checkpoints/last.pth`
- continues logging to the existing files
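Run folders follow the `<model>_<mode>_<augmentation>_<dataset>_<timestamp>` pattern described for training outputs. A sketch of building that name and the paths a resumed run reuses, assuming a `YYYYMMDD_HHMMSS` timestamp inferred from the example run name above; both helper names are hypothetical:

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path


def make_run_name(model: str, mode: str, augmentation: str, dataset: str,
                  when: datetime | None = None) -> str:
    """Build '<model>_<mode>_<augmentation>_<dataset>_<timestamp>' (assumed format)."""
    if when is None:
        when = datetime.now()
    return f"{model}_{mode}_{augmentation}_{dataset}_{when:%Y%m%d_%H%M%S}"


def resume_paths(run_name: str, runs_dir: str | Path = "runs") -> dict[str, Path]:
    """Files inside runs/<run_name>/ that a resumed run reads or appends to."""
    run_dir = Path(runs_dir) / run_name
    return {
        "config": run_dir / "config.yaml",
        "checkpoint": run_dir / "checkpoints" / "last.pth",
        "metrics": run_dir / "metrics.csv",
        "log": run_dir / "train.log",
    }
```

Because model and mode names themselves contain underscores (`vit_small_patch16`, `linear_probe`), such a name cannot be split back into its parts unambiguously; passing the full folder name to `--retrain` and reading settings from the saved `config.yaml` sidesteps that.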
- PyTorch 2.0+ with MPS/CUDA support
- timm - Pre-trained models
- LMDB - Fast data storage
- scikit-learn - Metrics
Install all with uv sync.
MIT License
- NIH Clinical Center for ChestX-ray14
- COVID-Net for COVIDx
- VinBigData for VinDr-CXR