georgemilosh/closure

closure

closure is a machine learning framework for fluid closure modeling on ECsim and iPiC3D data.

The training stack is now based on PyTorch Lightning.

Highlights

  • Lightning-native training with clear separation between model and data logic.
  • YAML-driven experiments through LightningCLI.
  • Built-in callbacks for timing and memory monitoring.
  • Evaluation and plotting helpers compatible with the new module/datamodule API.

Core Components

  • closure/module.py: ClosureLitModule (lightning.LightningModule)
  • closure/datamodule.py: ClosureDataModule (lightning.LightningDataModule)
  • closure/models.py: network architectures (MLP, FCNN, ResNet, CNet)
  • closure/cli.py: CLI entry point (closure-train)
  • closure/eval_cli.py: run evaluation CLI (closure-eval)
  • closure/callbacks.py: MemoryMonitorCallback, TimingCallback, TorchScriptCheckpointExportCallback
  • closure/evaluation.py: post-training metrics and prediction transforms
  • closure/visualization.py: prediction vs ground-truth plotting

Installation

Basic Installation

pip install -e .

This installs the core framework with PyTorch, PyTorch Lightning, and essential utilities.

Optional Dependencies

We provide several optional extras for different use cases:

Hyperparameter Optimization (Optuna)

For hyperparameter search with Optuna, install the hp extra:

pip install -e ".[hp]"

Includes: optuna, optuna-integration, scikit-learn, plotly, nbformat

Jupyter Notebooks

For interactive notebook development:

pip install -e ".[notebook]"

Includes: jupyter, ipykernel, notebook, ipywidgets

Combined Installation (HP + Notebooks)

pip install -e ".[hp,notebook]"

Development

For development, testing, and linting:

pip install -e ".[dev]"

Includes: pytest, pytest-cov, ruff, pre-commit

GPU/CUDA Support

The package includes PyTorch, torchvision, and torchaudio but defaults to CPU builds. To enable GPU support, force-reinstall the PyTorch packages from the appropriate CUDA index (required because pip will otherwise skip the reinstall if versions match):

CUDA 12.4 (Recommended for driver ≥ 525.60):

pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124

CUDA 12.1:

pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu121

CPU-only (no GPU):

pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cpu

Note: Check your NVIDIA driver version with nvidia-smi. The driver's CUDA version must be ≥ the toolkit version. For example, driver CUDA 12.8 supports cu124 but not cu130.
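
The compatibility rule in the note can be sketched as a small helper. This is only an illustration: `pick_wheel_tag` is a hypothetical function, not part of closure or PyTorch, and it only knows the wheel indexes listed in this README.

```python
def pick_wheel_tag(driver_cuda: str) -> str:
    """Return the newest wheel tag from this README that the driver supports.

    `driver_cuda` is the "CUDA Version" string reported by nvidia-smi, e.g. "12.8".
    """
    major, minor = (int(part) for part in driver_cuda.split(".")[:2])
    # Toolkit versions offered in this README, newest first.
    candidates = [((12, 4), "cu124"), ((12, 1), "cu121")]
    for toolkit, tag in candidates:
        # The driver's CUDA version must be >= the toolkit version.
        if (major, minor) >= toolkit:
            return tag
    return "cpu"


for version in ("12.8", "12.2", "11.8"):
    tag = pick_wheel_tag(version)
    print(f"driver CUDA {version} -> https://download.pytorch.org/whl/{tag}")
```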

Verify GPU support after installation:

import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device count: {torch.cuda.device_count()}")

Recommended Installation for Hyperparameter Sweep Workflows

If you want to use the Optuna hyperparameter sweep functionality with GPU acceleration:

# Install core + hyperparameter optimization + notebooks
pip install -e ".[hp,notebook]"

# Then force-reinstall GPU-enabled PyTorch for your platform (e.g., CUDA 12.4)
pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124

Quick Start with Requirements Files

We provide pre-made requirements files for common workflows:

Core only (CPU):

pip install -r requirements.txt

Hyperparameter optimization (Optuna + analysis):

pip install -r requirements-hp.txt

Development and testing:

pip install -r requirements-dev.txt

GPU support with CUDA 12.4:

pip install -r requirements.txt
pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124

Full stack (HP + Notebooks + Dev — matches closure-test env):

pip install -r requirements-dev.txt

For GPU support, force-reinstall PyTorch from the appropriate CUDA index:

pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124

See requirements-gpu.txt for detailed instructions on GPU installation for different CUDA versions.

Verifying Installation

Test that everything is installed correctly:

# Test core imports
python -c "import closure; import lightning; import torch; print('✅ Core packages OK')"

# Test optional imports (if installed with [hp])
python -c "import optuna; import plotly; import sklearn; print('✅ HP packages OK')"

# Test notebook imports (if installed with [notebook])
python -c "import jupyter; import ipykernel; print('✅ Notebook packages OK')"

# Test GPU (if CUDA enabled)
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'Device count: {torch.cuda.device_count()}')"

# Test CLI
closure-train --help
closure-eval --help
closure-diagnostics --help

# Test Optuna sweep (hyperparameter optimization)
python examples/optuna/harris_optuna_sweep.py --help

Quick Start (Python API)

import lightning as L

from closure.datamodule import ClosureDataModule
from closure.models import MLP
from closure.module import ClosureLitModule

network = MLP(feature_dims=[10, 64, 32, 6], activations=["Tanh", "ReLU", None])

module = ClosureLitModule(
    network=network,
    criterion="MSELoss",
    optimizer="Adam",
    lr=5e-4,
    scheduler="ReduceLROnPlateau",
)

datamodule = ClosureDataModule(
    data_folder="/path/to/data",
    norm_folder="/path/to/norm",
    train_samples_file="/path/to/train.csv",
    val_samples_file="/path/to/val.csv",
    test_samples_file="/path/to/test.csv",
    batch_size=512,
    flatten=True,
    read_features_targets_kwargs={
        "request_features": ["rho_e", "Bx", "By", "Bz", "Vx_e", "Vy_e", "Vz_e", "Ex", "Ey", "Ez"],
        "request_targets": ["Pxx_e", "Pyy_e", "Pzz_e", "Pxy_e", "Pxz_e", "Pyz_e"],
    },
)

trainer = L.Trainer(max_epochs=50, accelerator="auto")
trainer.fit(module, datamodule=datamodule)
trainer.test(module, datamodule=datamodule)
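
Before launching a run it can help to check that the network spec and the requested channels agree. The sketch below assumes (this is not confirmed by the closure source) that with flatten=True the first and last entries of feature_dims must equal the number of requested features and targets, and that activations holds one entry per layer. `check_mlp_spec` is a hypothetical helper.

```python
feature_dims = [10, 64, 32, 6]
activations = ["Tanh", "ReLU", None]
request_features = ["rho_e", "Bx", "By", "Bz", "Vx_e", "Vy_e", "Vz_e", "Ex", "Ey", "Ez"]
request_targets = ["Pxx_e", "Pyy_e", "Pzz_e", "Pxy_e", "Pxz_e", "Pyz_e"]


def check_mlp_spec(feature_dims, activations, n_features, n_targets):
    """Return a list of human-readable problems (empty if the spec is consistent)."""
    problems = []
    if feature_dims[0] != n_features:
        problems.append(f"input width {feature_dims[0]} != {n_features} features")
    if feature_dims[-1] != n_targets:
        problems.append(f"output width {feature_dims[-1]} != {n_targets} targets")
    # Assumption: one activation (or None) per layer, i.e. per pair of adjacent widths.
    if len(activations) != len(feature_dims) - 1:
        problems.append("expected one activation per layer")
    return problems


problems = check_mlp_spec(
    feature_dims, activations, len(request_features), len(request_targets)
)
print(problems or "spec is consistent")
```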

Quick Start (CLI)

Use provided YAML configs under configs/.

closure-train fit --config configs/default.yaml

Override parameters directly from CLI:

closure-train fit \
  --config configs/default.yaml \
  --model.network.class_path=closure.models.ResNet \
  --model.lr=1e-3 \
  --data.batch_size=256
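
Conceptually, each dotted flag addresses one key in the nested YAML config. The following sketch shows that mapping only. It is not how LightningCLI parses arguments (the real parser also validates and converts types); `apply_override` and the starting `config` dict are hypothetical.

```python
def apply_override(config: dict, override: str) -> dict:
    """Apply one `--a.b.c=value` flag to a nested dict (values stay strings)."""
    key, _, value = override.lstrip("-").partition("=")
    *parents, leaf = key.split(".")
    node = config
    for name in parents:
        # Create intermediate sections on demand.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


config = {"model": {"lr": "5e-4"}, "data": {"batch_size": "512"}}
for flag in (
    "--model.network.class_path=closure.models.ResNet",
    "--model.lr=1e-3",
    "--data.batch_size=256",
):
    apply_override(config, flag)

print(config)
```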

Evaluate a trained run from CLI

closure-eval reproduces the common notebook evaluation workflow using RunLoader and writes artifacts directly into the selected run/version folder (or a custom output directory):

  • prints config summary, history tail, best epoch, and test metrics to terminal
  • writes per-channel test metrics CSV
  • saves history and channel-metrics figures to img/
  • optionally renders per-target field plots (real/predict/error)

Quick tutorial:

# 1. Activate the project environment.
# For the HPC module-based workflow:
source activate_hpc.sh

# 2. Run evaluation on one saved run.
closure-eval --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1

# 3. Restrict to a few targets or samples when iterating on plots.
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1 \
  --targets Pxx_e Pyy_e Pzz_e \
  --max-plots 3

# 4. Reuse the trained model on a different test split.
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1 \
  --test-samples-file ./splits/iPiC3D-nathan5-12/5-10-12/RunID_1.csv

# 5. Export only scalar reports when you do not want images.
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1 \
  --skip-field-plots

Useful options:

  • --run-dir or --version-dir: evaluate one explicit saved run
  • --run-dir <parent_folder>: evaluate all direct child run folders in batch mode (unfinished runs are skipped)
  • --log-root: automatically pick the latest run_* or version_* folder
  • --targets: restrict field plots to selected target names
  • --max-plots: limit how many time slices are rendered
  • --test-samples-file: override the test set without editing config files
  • --output-dir: write CSV/figures somewhere else
  • --skip-history-plot, --skip-metrics-plot, --skip-field-plots: export only what you need

Examples:

# Evaluate one explicit run/version directory
closure-eval --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_001

# Evaluate all runs under a parent folder (skips unfinished runs)
closure-eval --run-dir models/Lightning/iPiC3D-nathan5-12/ablations_long1000_serial/runs

# Or pick the latest run_*/version_* under a root directory
closure-eval --log-root models/Lightning/iPiC3D-nathan5-12/test

# Override the test split without editing config.yaml
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_001 \
  --test-samples-file ./splits/iPiC3D-nathan5-12/5-10-12/RunID_1.csv

# Only export metrics/history (no field plots)
closure-eval \
  --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_001 \
  --skip-field-plots

Default output layout:

  • <run_or_version_dir>/test_metrics.csv
  • <run_or_version_dir>/img/history.png
  • <run_or_version_dir>/img/channel_metrics.png
  • <run_or_version_dir>/img/<target>_cycle<CYCLE>_{real,predict,error}.png
  • <run_or_version_dir>/img/<target>_cycles<FIRST-LAST>_summary.png
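
A quick way to confirm that an evaluation finished is to look for the fixed-name files from the layout above. The sketch below is a hypothetical helper, not part of closure-eval. It checks only the three files whose names do not depend on targets or cycles, and uses a temporary directory as a stand-in for a real run folder.

```python
import tempfile
from pathlib import Path

# Files from the default layout whose names do not depend on target or cycle.
SCALAR_OUTPUTS = ("test_metrics.csv", "img/history.png", "img/channel_metrics.png")


def missing_outputs(run_dir):
    """Return the expected outputs that are absent from `run_dir`."""
    run_dir = Path(run_dir)
    return [name for name in SCALAR_OUTPUTS if not (run_dir / name).is_file()]


# Demo on a throwaway directory that only contains the metrics CSV.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "test_metrics.csv").write_text("placeholder\n")
    missing = missing_outputs(tmp)

print(missing)
```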

Field Diagnostics CLI

closure-diagnostics exports notebook-style field figures and CSV diagnostics without copying plotting code into ad-hoc notebooks.

Two backends, selected with --backend:

  • ecsim (default) — iPiC3D runs, e.g. Le2DHGEM_RunID_0_f2.
  • menura — Menura runs, e.g. R0/iso_GEM_1e-2_Jze.5_r0_1024x1024 (set menura_analysis_dir in paths.yaml).

Conventions used by every example below, chosen to match fullres.ipynb:

  • Normalization: --normalization alfven-sample --sample-nb-factor 1, i.e. code2alfven with b0x = -Bx[0,0,0], nb = rho_i.max() (what the notebook's active line uses; reproduces its figures). Add --no-density-norm to keep density in code units while still casting B and the other fields/axes/time into Alfvén units. Other --normalization modes:
    • none (default): raw code units, no code2alfven. Works for both backends.
    • alfven-infer: infers b0x/nb from the run's .inp (B0x, rhoINIT[0]), matching the notebook's commented code2alfven(..., experiment=...) line. Works for ECsim (e.g. RunID_0.inp gives b0x=0.0249, nb=0.969). Menura has no such .inp, so this mode raises FileNotFoundError; use alfven-sample or alfven-explicit --b0x <v> --nb <v> there. Note that alfven-infer uses rhoINIT[0] (background, 0.969) while alfven-sample uses rho_i.max() (sheet, 0.23), so the two give different density normalizations.
  • ECsim species: --choose-species e,i,e,i. The option maps positionally to particle populations (index i corresponds to moments/species_i; shared labels are summed). The Le2DHGEM runs have four populations, so e,i,e,i sums sheet + background. The default e,i reads only species_0/1 and drops the background, so P_e/rho_e fall to ~0 in the lobes (and the reconnection normalization, which needs rho_e at the corner cell, breaks). Menura has 2 species and keeps the default e,i.
  • Cropping to one current sheet --choose-x 0,512 --choose-y 0,256. These are double-Harris (Le2DH) runs: the full domain holds two current sheets, so a y-cut crosses both (two Bx reversals / two pressure islands). The notebook crops to the lower half in y to analyze a single sheet — do the same. For menura also add --menura-scale-ranges, which scales these 512-cell base ranges up to the run resolution; for ECsim they are plain index ranges.
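
The species and cropping conventions can be illustrated in a few lines. This is a sketch under stated assumptions, not the closure-diagnostics implementation: `group_species` and `scale_range` are hypothetical names, and the scaling is assumed to be a plain integer factor of run resolution over the 512-cell base.

```python
from collections import defaultdict


def group_species(choose_species: str) -> dict:
    """Map each label to the population indices that would be summed under it."""
    groups = defaultdict(list)
    for index, label in enumerate(choose_species.split(",")):
        groups[label].append(index)
    return dict(groups)


def scale_range(index_range, resolution, base=512):
    """Scale a base-grid index range to a finer grid (assumed linear scaling)."""
    factor = resolution // base
    start, stop = index_range
    return (start * factor, stop * factor)


print(group_species("e,i,e,i"))                 # {'e': [0, 2], 'i': [1, 3]}
print(group_species("e,i"))                     # {'e': [0], 'i': [1]}
print(scale_range((0, 256), resolution=1024))   # (0, 512)
```

With e,i only populations 0 and 1 are read, which is why the background is dropped in the four-population Le2DHGEM runs.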

In the exercises below you may use --normalization alfven-sample --sample-nb-factor 1 for ECsim, or --normalization alfven-explicit. For menura, normalization can be omitted altogether, assuming the run used $B0_x = 1$.

# === Field panels ===========================================================
closure-diagnostics fields Le2DHGEM_RunID_5_f2 \
  --files-path /volume1/scratch/share_dir/iPiC3D-nathan \
  --fields Az,Ey,Ez,rho_e,rho_i,Jz_e,Jz_i,Bx,By,Bz \
  --processed --normalization alfven-infer --sample-nb-factor 1 \
  --choose-species e,i,e,i --choose-x 0,512 --choose-y 0,256 --choose-times 0 \
  --output diagnostics/R5_fields.png

closure-diagnostics fields R0/iso_GEM_1e-2_Jze.5_r0_1024x1024 --backend menura \
  --files-path /volume1/scratch/georgem/menura/runs/GEM/hortense/nathan5-12 \
  --fields Az,Ey,Ez,rho_e,rho_i,Jz_e,Jx_i,Jy_i,Jz_i,Bx,By,Bz \
  --processed --choose-times 12 \
  --choose-x 0,512 --choose-y 0,256 --menura-scale-ranges \
  --output diagnostics/R0_fields.png

# === Profiles (1D cuts) =====================================================
# Mirrors profile_fns: cut along y at x = nx//2 (omit --cut-index), t = 0.
# Pass several experiments to either backend to compare them (one `run` per
# experiment in the CSV). `profiles` ALWAYS overwrites --output-csv (no append).
closure-diagnostics profiles Le2DHGEM_RunID_0_f2 Le2DHGEM_RunID_5_f2 \
  --files-path /volume1/scratch/share_dir/iPiC3D-nathan \
  --fields P_e,P_i,rho_e,rho_i,Jz_e,Jz_i,Bx,By \
  --projection y --choose-times 0 --processed \
  --normalization alfven-infer --sample-nb-factor 1 --choose-species e,i,e,i \
  --choose-x 0,512 --choose-y 0,256 \
  --output-csv diagnostics/profiles_ecsim.csv

# Several menura runs at once: list each Rxx/<model> experiment.
closure-diagnostics profiles \
  R0/iso_GEM_1e-2_Jze.5_r0_1024x1024 R5/iso_GEM_1e-2_Jze.5_r0_1024x1024 \
  --backend menura \
  --files-path /volume1/scratch/georgem/menura/runs/GEM/hortense/nathan5-12 \
  --fields P_e,P_i,rho_e,rho_i,Jz_e,Jz_i,Bx,By \
  --projection y --choose-times 0 --processed \
  --choose-x 0,512 --choose-y 0,256 --menura-scale-ranges \
  --output-csv diagnostics/profiles_menura.csv

# A profiles CSV holds every field, so a bare overlay draws them all on one axes.
# Pick one field with --field (= one notebook cell); pass several CSVs to compare
# backends/runs (--group-by run -> one line each). --select COL=VAL filters any
# column, e.g. --select run=Le2DHGEM_RunID_0_f2. Both accept comma lists.
# Axis labels default to what is plotted (here y-axis "P_e", x-axis "y" from the
# projection); override with --xlabel/--ylabel. Series use plotter.interactive
# styling (cycling color/dash, width ramps down + alpha ramps up across series).
closure-diagnostics overlay \
  diagnostics/profiles_ecsim.csv diagnostics/profiles_menura.csv \
  --field P_e --x coord --y value --group-by run \
  --output diagnostics/profile_P_e.png

# === Reconnection rate ======================================================
# Tracks X/O points in Az and exports recon_flux/recon_rate. X/O defaults already
# match the notebook (grad_tol 1e-6, merge_tol 1e-3); pass --az-sigma 4.
# --recon-normalization notebook adds time_norm and
#   recon_rate_norm = -recon_rate * sqrt(-rho_e0 * 4pi) / Bx0**2
# (the sign flip keeps the growth phase positive so a log axis doesn't drop out).
# `reconnection` APPENDS to --output-csv by default; use --csv-mode replace.
closure-diagnostics reconnection Le2DHGEM_RunID_0_f2 \
  --files-path /volume1/scratch/share_dir/iPiC3D-nathan \
  --choose-times all --processed \
  --normalization alfven-infer --sample-nb-factor 1 --choose-species e,i,e,i \
  --choose-x 0,512 --choose-y 0,256 \
  --az-sigma 4 --recon-normalization notebook --csv-mode replace \
  --output-csv diagnostics/reconnection_ecsim.csv

closure-diagnostics reconnection R0/iso_GEM_1e-2_Jze.5_r0_1024x1024 --backend menura \
  --files-path /volume1/scratch/georgem/menura/runs/GEM/hortense/nathan5-12 \
  --choose-times all --processed \
  --choose-x 0,512 --choose-y 0,256 --menura-scale-ranges \
  --az-sigma 4 --recon-normalization notebook --csv-mode replace \
  --output-csv diagnostics/reconnection_menura.csv

# Recursive Menura discovery (--backend menura): if an experiment argument is a
# PARENT folder rather than a single run, every Menura run beneath it (any folder
# holding products/B_it*_rank_0_0.npy) is discovered and added to the CSV, one
# `run` per discovered run (labeled relative to --files-path, e.g. R5/new_FCNN_00172).
# A fully-specified run is used as-is. Example: pass `R5` to sweep all 60 runs of
# a campaign in one call (use --csv-mode replace so reruns don't append duplicates):
closure-diagnostics reconnection R5 --backend menura \
  --files-path /dodrio/scratch/projects/2026_018/george/menura/runs/stability_campaign2 \
  --choose-times all --az-sigma 4 --recon-normalization notebook --csv-mode replace \
  --output-csv diagnostics/reconnection_menura.csv
# Pass several parents to combine campaigns: ... reconnection R0 R5 R7 R12 --backend menura ...

# Overlay the NORMALIZED rate on a log axis (plot the *_norm columns, not the raw
# recon_rate/time, which are mostly negative and vanish under --logy).
closure-diagnostics overlay \
  diagnostics/reconnection_ecsim.csv diagnostics/reconnection_menura.csv \
  --x time_norm --y recon_rate_norm --group-by run --logy \
  --output diagnostics/reconnection_overlay.png

# === One-shot profile helpers ===============================================
# Export the 8 profile fields and emit one PNG per field (= one notebook cell
# each). Each script overwrites only its own dir; run both, then overlay above.
scripts/profiles_ecsim.sh  diagnostics/profiles_ecsim    # iPiC3D Le2DHGEM_RunID_0_f2
scripts/profiles_menura.sh diagnostics/profiles_menura   # menura R0/iso_GEM_...
# Add experiments to overlay several runs per field. ECsim takes full names;
# menura takes bare run folders (expanded to RUN/$MODEL, override with MODEL=...):
#   scripts/profiles_ecsim.sh  diagnostics/cmp Le2DHGEM_RunID_0_f2 Le2DHGEM_RunID_5_f2
#   scripts/profiles_menura.sh diagnostics/cmp R0 R5 R7
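
The normalized rate quoted in the reconnection comments above can be written out directly. The function below transcribes that formula only. The time_norm definition is not given in this README, so it is not reproduced, and the sample values are arbitrary illustrations rather than simulation output.

```python
import math


def recon_rate_norm(recon_rate: float, rho_e0: float, bx0: float) -> float:
    """-recon_rate * sqrt(-rho_e0 * 4*pi) / Bx0**2, as stated in the comment above.

    rho_e0 is the (negative) electron charge density, so -rho_e0 is positive.
    """
    return -recon_rate * math.sqrt(-rho_e0 * 4.0 * math.pi) / bx0**2


# Arbitrary illustrative inputs: a negative raw rate becomes a positive normalized rate.
value = recon_rate_norm(recon_rate=-0.1, rho_e0=-1.0 / (4.0 * math.pi), bx0=1.0)
print(value)
```

The leading minus sign is what keeps the growth phase positive, so the curve survives a logarithmic axis.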

Logging and Artifacts

Lightning logging is used by default (CSV logger in configs).

closure.log is written alongside the Lightning CSV logger outputs. If you set --trainer.logger.init_args.name and --trainer.logger.init_args.version, the log file goes into that exact run directory. If you omit version, Lightning's auto-created version_* directory is used, so closure.log lives inside the same per-run folder as metrics.csv.
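
As a rough guide to where the log file ends up: Lightning's CSVLogger writes a run under save_dir/name/version, rendering an integer version as version_<n>. The helper below is a hypothetical sketch of that rule, and the directory names in the demo are made up.

```python
from pathlib import Path


def expected_log_file(save_dir: str, name: str, version) -> Path:
    """Predict the closure.log location for a given CSV logger configuration."""
    # String versions are used verbatim; integers become "version_<n>".
    version_dir = version if isinstance(version, str) else f"version_{version}"
    return Path(save_dir) / name / version_dir / "closure.log"


explicit = expected_log_file("models/Lightning", "test", "run_1")
automatic = expected_log_file("models/Lightning", "test", 0)
print(explicit)
print(automatic)
```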

Typical outputs include:

  • lightning_logs/ or configured logger directory
  • metrics.csv
  • checkpoints from ModelCheckpoint
  • matching TorchScript exports beside each checkpoint, e.g. checkpoints/best-epoch=3-val_loss=0.1234.pt
  • normalized feature/target statistics in norm_folder

Legacy files like loss_dict.pkl are no longer used.

Production Setup

This section covers everything needed to go from raw simulation data to production training runs.

1. paths.yaml

Create a paths.yaml in the repository root (copy from paths.yaml.example):

work_dir: ./models       # training outputs, checkpoints, normalization stats
data_dir: /scratch/data   # root of your simulation data

Relative paths in paths.yaml are resolved against the directory that contains the file. All config parameters that accept paths use a three-tier resolution strategy (implemented by ClosureDataModule._resolve_path):

| Path form | Example | Resolution |
| --- | --- | --- |
| Absolute | /scratch/data/Harris | Used as-is |
| Dot-relative (./, ../) | ./data/train.csv | Resolved against the current working directory |
| Bare identifier | ecsim/Harris/Le | Joined with the corresponding paths.yaml root (data_dir or work_dir) |
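
The three tiers can be sketched as follows. This is an illustration of the rules in the table, not the code of ClosureDataModule._resolve_path. `resolve_path` and its `root`/`cwd` parameters are hypothetical, and the example assumes POSIX-style paths.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_path(value: str, root: str, cwd: Optional[str] = None) -> Path:
    """Resolve `value` using the absolute / dot-relative / bare-identifier rules."""
    path = Path(value)
    if path.is_absolute():
        return path  # tier 1: used as-is
    if value.startswith(("./", "../")):
        return Path(cwd or os.getcwd()) / path  # tier 2: relative to the working directory
    return Path(root) / path  # tier 3: joined with the paths.yaml root


print(resolve_path("/scratch/data/Harris", root="/scratch/data"))
print(resolve_path("./data/train.csv", root="/scratch/data", cwd="/home/user/closure"))
print(resolve_path("ecsim/Harris/Le", root="/scratch/data"))
```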

2. Data directory structure

Simulation data is stored as HDF5 or pickle files under data_dir, organized by experiment. Each file contains a single simulation time step:

data_dir/
  ecsim/Harris/Le/
    T2D14_filter2/
      T2D-Fields_00500.h5.pkl
      T2D-Fields_01000.h5.pkl
      ...
    T2D15_filter2/
      T2D-Fields_00500.h5.pkl
      ...

The files are read by closure.read_pic.read_features_targets, which extracts the requested field channels (B, E, rho, J, P, etc.) and species.

3. Creating train/val/test splits

Use scripts/datasplit.py to build CSV split files. Each CSV has a single filenames column listing the data file paths:

# Training set from two simulation folders (time steps 5000–10000)
python scripts/datasplit.py \
    folders=[T2D14_filter2,T2D15_filter2] \
    name=train.csv \
    root_folder=/scratch/data/ecsim/Harris/Le/ \
    min_number=5000 max_number=10000

# Validation set from a held-out folder
python scripts/datasplit.py \
    folders=[T2D16_filter2] \
    name=val.csv \
    root_folder=/scratch/data/ecsim/Harris/Le/

# Test set
python scripts/datasplit.py \
    folders=[T2D17_filter2] \
    name=test.csv \
    root_folder=/scratch/data/ecsim/Harris/Le/

Arguments:

| Argument | Required | Description |
| --- | --- | --- |
| folders | yes | Folder names or paths to search, e.g. [a,b,c] |
| name | yes | Output CSV filename |
| root_folder | no | Root prepended to each folder path |
| pattern | no | Glob pattern (default: T2D-Fields_*) |
| min_number | no | Exclude files with time-step number below this |
| max_number | no | Exclude files with time-step number above this |
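
The selection logic these arguments describe can be sketched with the standard library. This is not the code of scripts/datasplit.py: `select_files` and `write_split` are hypothetical, and the time-step number is assumed to be the first digit run after an underscore in the file name (as in T2D-Fields_00500.h5.pkl).

```python
import csv
import re
import tempfile
from pathlib import Path


def select_files(folder, pattern="T2D-Fields_*", min_number=None, max_number=None):
    """Return sorted file paths whose time-step number lies within the bounds."""
    selected = []
    for path in sorted(Path(folder).glob(pattern)):
        match = re.search(r"_(\d+)", path.name)
        if match is None:
            continue
        number = int(match.group(1))
        if min_number is not None and number < min_number:
            continue
        if max_number is not None and number > max_number:
            continue
        selected.append(str(path))
    return selected


def write_split(paths, csv_path):
    """Write a split CSV with the single `filenames` column."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filenames"])
        writer.writerows([p] for p in paths)


# Demo on empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "T2D14_filter2"
    folder.mkdir()
    for step in (500, 5000, 10000, 15000):
        (folder / f"T2D-Fields_{step:05d}.h5.pkl").touch()
    kept = select_files(folder, min_number=5000, max_number=10000)
    split_path = Path(tmp) / "train.csv"
    write_split(kept, split_path)
    header = split_path.read_text().splitlines()[0]

kept_names = [Path(p).name for p in kept]
print(kept_names)
```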

4. Writing a YAML config

Three annotated templates are provided under configs/:

| Template | Architecture | Data shape | Use case |
| --- | --- | --- | --- |
| configs/default.yaml | FCNN | 2-D patches | CNN-based closure |
| configs/mlp.yaml | MLP | Flattened pixels | Pixel-wise baseline |
| configs/resnet.yaml | ResNet | 2-D patches | Deep residual closure |

Copy one and customize. Key sections explained:

data:
  data_folder: ecsim/Harris/Le           # bare → joined with data_dir
  norm_folder: Harris/Le/my_experiment   # bare → joined with work_dir
  train_samples_file: ./splits/train.csv  # ./ → CWD-relative
  val_samples_file: ./splits/val.csv
  test_samples_file: ./splits/test.csv
  flatten: false                          # true for MLP, false for CNN/ResNet
  patch_dim: [32, 32]                     # random crop size (CNN/ResNet only)
  scaler_features: true                   # enable per-channel standardization
  scaler_targets: true
  prescaler_features:                     # per-channel transforms before standardization
    - arcsinh    # rho_e
    - null       # Bx  (no prescaling)
    - ...
  prescaler_targets:
    - log        # Pxx_e (positive-definite diagonal)
    - arcsinh    # Pxy_e (signed off-diagonal)
    - ...
  read_features_targets_kwargs:
    fields_to_read:                       # which HDF5 field groups to load
      B: true
      E: true
      rho: true
      J: true
      P: true
      PI: true
    request_features:                     # specific channels extracted from fields
      - rho_e
      - Bx
      - By
      - Bz
      - Jx_e
      - Jy_e
      - Jz_e
      - Vx_e
      - Vy_e
      - Vz_e
    request_targets:
      - Pxx_e
      - Pyy_e
      - Pzz_e
      - Pxy_e
      - Pxz_e
      - Pyz_e
    choose_species: ['e', null]           # electron species for multi-species data
    choose_x: [0, 512]                    # spatial domain crop
    choose_y: [175, 325]

Prescaler guidance:

  • log — for strictly positive quantities (diagonal pressure)
  • arcsinh — for quantities that can be negative or span orders of magnitude
  • null — no prescaling
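
To see why the guidance pairs log with positive quantities and arcsinh with signed ones, the transforms and their inverses can be tried on plain floats. This is a minimal sketch with the standard math module. The framework's own prescaler implementation may differ, and `prescale`/`unprescale` are hypothetical names.

```python
import math

# Forward and inverse transform per prescaler name (YAML null maps to Python None).
PRESCALERS = {
    "log": (math.log, math.exp),
    "arcsinh": (math.asinh, math.sinh),
    None: (lambda x: x, lambda x: x),
}


def prescale(values, kind):
    forward, _ = PRESCALERS[kind]
    return [forward(v) for v in values]


def unprescale(values, kind):
    _, inverse = PRESCALERS[kind]
    return [inverse(v) for v in values]


diagonal = [1e-4, 1e-2, 1.0]      # strictly positive, spans orders of magnitude
off_diagonal = [-1e-2, 0.0, 1e-3]  # signed

print(prescale(diagonal, "log"))
print(prescale(off_diagonal, "arcsinh"))

# log is undefined for non-positive inputs, so it cannot be used on signed channels.
try:
    prescale(off_diagonal, "log")
except ValueError as error:
    print("log failed on signed data:", error)
```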

5. Launching training

Single GPU:

closure-train fit --config my_config.yaml

Multi-GPU (DDP):

closure-train fit --config my_config.yaml \
    --trainer.devices=4 \
    --trainer.strategy=ddp

Slurm cluster:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=12

srun closure-train fit --config my_config.yaml
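
In the Slurm header above, the task count per node matches the GPU count per node, which is the one-task-per-GPU layout Lightning expects under Slurm. The sketch below parses such a header and derives the total number of processes. `parse_sbatch` is a hypothetical helper and only handles the `--key=value` form shown here.

```python
import re

SBATCH_SCRIPT = """#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=12
"""


def parse_sbatch(script: str) -> dict:
    """Collect `#SBATCH --key=value` directives into a dict of strings."""
    return dict(re.findall(r"^#SBATCH --([\w-]+)=(\S+)$", script, flags=re.MULTILINE))


options = parse_sbatch(SBATCH_SCRIPT)
nodes = int(options["nodes"])
tasks_per_node = int(options["ntasks-per-node"])
gpus_per_node = int(options["gres"].split(":")[-1])
world_size = nodes * tasks_per_node

print(f"world size: {world_size}")
if tasks_per_node != gpus_per_node:
    print("warning: tasks per node and GPUs per node differ")
```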

6. Scaffolding experiment sweeps

For systematic architecture/feature-set sweeps, use scripts/scaffold_harris_experiments.py. It generates a directory tree of YAML configs and Slurm run.sh scripts:

python scripts/scaffold_harris_experiments.py \
    --output-root models/Harris/Le/Le2GEM15ppc_lightning \
    --data-folder ecsim/Harris/Le \
    --split-root ecsim/sampling/ecsim/Harris/Le/Le2GEM15ppc \
    --max-epochs 500 --devices 4

This creates:

Le2GEM15ppc_lightning/
  default/P/          4lrs_es500.yaml  5lrs_es500.yaml  ...  run.sh
  default/divP/       4lrs.yaml        5lrs.yaml        ...  run.sh
  noE/P/              ...
  noJ/P/              ...
  noJnoE/P/           ...

Each variant (default, noE, noJ, noJnoE) uses a different feature subset. Each task (P, divP) uses different targets and prescalers. The run.sh files are ready to submit with sbatch.
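
The layout is a variant × task grid. The sketch below enumerates it under the assumption that every variant has both tasks (the tree above elides some entries, so this may not hold for the real script). `experiment_dirs` is a hypothetical helper, not part of scaffold_harris_experiments.py.

```python
from itertools import product
from pathlib import PurePosixPath

VARIANTS = ["default", "noE", "noJ", "noJnoE"]  # feature subsets
TASKS = ["P", "divP"]                           # target/prescaler sets


def experiment_dirs(output_root: str):
    """List one directory per (variant, task) pair under the output root."""
    root = PurePosixPath(output_root)
    return [root / variant / task for variant, task in product(VARIANTS, TASKS)]


dirs = experiment_dirs("models/Harris/Le/Le2GEM15ppc_lightning")
for directory in dirs:
    print(directory)
```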

7. Evaluation and artifact export

After training, load a checkpoint and evaluate:

from closure.module import ClosureLitModule
from closure.evaluation import evaluate_loss, evaluate_regression_metrics, transform_targets

module = ClosureLitModule.load_from_checkpoint("best.ckpt", network=network)
ground_truth, prediction = transform_targets(module, test_dataset, ...)

# Per-channel MSE
evaluate_loss(test_dataset, ground_truth, prediction, "MSELoss", verbose=True)

# Regression metrics table (R², RMSE, Pearson r, etc.)
metrics_df = evaluate_regression_metrics(test_dataset, ground_truth, prediction)
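
For reference, the three headline metrics have standard textbook definitions, sketched here for a single channel with plain Python lists. This is not the implementation of evaluate_regression_metrics, which may differ in details such as averaging or masking. `regression_metrics` is a hypothetical name.

```python
import math


def regression_metrics(truth, pred):
    """RMSE, coefficient of determination (R²), and Pearson r for one channel."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_p = sum(pred) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))   # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in truth)            # total sum of squares
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, pred))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return {
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
        "pearson_r": cov / math.sqrt(ss_tot * var_p),
    }


truth = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated but biased by +1
metrics = regression_metrics(truth, shifted)
print(metrics)
```

A constant offset leaves Pearson r at 1 while lowering R², which is why the two are worth reporting side by side.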

Export deployable artifacts:

import torch

# Inference bundle (state dict + normalization stats + metadata)
torch.save({"state_dict": ..., "features_mean": ..., ...}, "inference_bundle.pt")

# TorchScript for deployment
scripted = torch.jit.script(network)
scripted.save("torchscript.pt")

See examples/tutorials/tuto_train.py for a complete end-to-end example including evaluation, visualization, and artifact export.

8. Production ablation matrix: launch → per-channel eval → figures

The iPiC3D production ablation studies (a matrix of feature set × target × architecture, for both CNN and MLP) are driven by helper scripts under scripts/scaling_jobs/. Each study lives in its own folder under models/Lightning/iPiC3D-nathan5-12/ with a README.md pinning its exact splits and configs.

Launch: submit_prod_ablations.sh submits one atomic single-GPU SLURM job per (model, feature, target, arch) cell so the matrix runs in parallel. All options are environment variables (MODELS, FEATURES, TARGETS, ARCH_LIST, SPLIT_TARGET, SPLIT_ARCH, MAX_EPOCHS, CONFIG_PATH, SAVE_DIR); always preview with DRY=1 first:

DRY=1 MODELS="cnn mlp" ARCH_LIST="baseline shallower deeper" SPLIT_TARGET=1 SPLIT_ARCH=1 \
  bash scripts/scaling_jobs/submit_prod_ablations.sh

Evaluate + plot: eval_test_ablations_interactive.sh (a thin wrapper around eval_test_ablations.py) loads each cell's best checkpoint via RunLoader (avoiding the closure-train test Lightning enums bug) and computes per-channel regression metrics on both the test and validation splits (EVAL_SPLITS="test val" by default), for both networks. Run it on a gpu_rome_a100 node:

BASE=models/Lightning/iPiC3D-nathan5-12/<study> \
TEST_SPLIT=./splits/iPiC3D-nathan5-12/<test_split>.csv TEST_STRIDE=1 \
  bash scripts/scaling_jobs/eval_test_ablations_interactive.sh

Outputs (all under <BASE>):

  • <cell>/{test,val}_metrics.csv — per-channel metrics, one file per cell per split
  • {test,val}_ranking.csv — aggregate (mean over channels), one row per cell
  • channel_metrics.csv — combined per-channel table (all cells × both splits)
  • figs/fig_channel_r2_<model>_<split>_<target>.png — per-channel R² heatmaps (target channel × feature × arch). Generated by the eval itself — no per-folder make_figures.py needed; set SKIP_FIGURES=1 to skip.

Memory: run.metrics() holds all predictions, so {TEST,VAL}_STRIDE subsample the files to bound RAM (use =1 only with a large --mem). EVAL_SPLITS=test restores the old test-only behavior; the val split is auto-read from each cell's config.yaml (override with VAL_SPLIT / VAL_STRIDE).
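
The two ideas in this section, stride subsampling and the per-cell ranking, can be sketched as follows. Both functions are hypothetical: the stride is assumed to keep every N-th file, and the row keys (cell, channel, r2) are illustrative names rather than the actual CSV columns. The numbers are made up.

```python
from statistics import mean


def subsample(files, stride):
    """Keep every `stride`-th file to bound memory (assumed meaning of *_STRIDE)."""
    return files[::stride]


def rank_cells(channel_rows, metric="r2"):
    """Average a per-channel metric within each cell and sort best-first."""
    by_cell = {}
    for row in channel_rows:
        by_cell.setdefault(row["cell"], []).append(row[metric])
    ranking = [{"cell": cell, metric: mean(values)} for cell, values in by_cell.items()]
    return sorted(ranking, key=lambda row: row[metric], reverse=True)


# Made-up per-channel rows for two cells.
rows = [
    {"cell": "mlp/baseline", "channel": "Pxx_e", "r2": 0.6},
    {"cell": "mlp/baseline", "channel": "Pxy_e", "r2": 0.8},
    {"cell": "cnn/baseline", "channel": "Pxx_e", "r2": 0.9},
    {"cell": "cnn/baseline", "channel": "Pxy_e", "r2": 0.7},
]
ranking = rank_cells(rows)
print(ranking)
print(subsample(["f0", "f1", "f2", "f3", "f4"], stride=2))
```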

Examples

  • examples/tutorials/tuto_train.py: self-contained training tutorial using bundled fixture data
  • examples/tuto_train.ipynb: real-data tutorial (Lightning update section added at top)
  • examples/tuto_train_synthetic.ipynb: synthetic-data tutorial (Lightning update section added at top)
  • examples/optuna/optuna_sweep.py: Optuna sweep example with Lightning
  • examples/optuna/harris_optuna_sweep.py: Harris Le2GEM15ppc Optuna sweep for FCNN experiments

Notes on Migration

  • The old Trainer, PyNet, and closure.trainers module were removed.
  • Use ClosureLitModule + ClosureDataModule for programmatic workflows.
  • Use closure-train for config-driven workflows.

Citing & License

  • Author: George Miloshevich
  • License: MIT License
  • Projects: STRIDE, HELIOSKILL

If you use closure in your research, please cite:

@article{miloshevich2026electron,
  title = {Electron Neural Closure for Turbulent Magnetosheath Simulations: {{Energy}} Channels},
  author = {Miloshevich, G. and Vranckx, L. and de Oliveira Lopes, F. N. and Dazzi, P. and Arrò, G. and Lapenta, G.},
  year = {2026},
  journal = {Physics of Plasmas},
  volume = {33},
  number = {1},
  pages = {012901},
  issn = {1070-664X},
  doi = {10.1063/5.0300009},
}


closure is designed for flexibility, reproducibility, and ease of use in scientific ML workflows. Contributions and feedback are welcome!

About

Discovering fluid closures of kinetics
