closure is a machine learning framework for fluid closure modeling on ECsim and iPiC3D data.
The training stack is now based on PyTorch Lightning.
- Lightning-native training with clear separation between model and data logic.
- YAML-driven experiments through LightningCLI.
- Built-in callbacks for timing and memory monitoring.
- Evaluation and plotting helpers compatible with the new module/datamodule API.
- closure/module.py: ClosureLitModule (lightning.LightningModule)
- closure/datamodule.py: ClosureDataModule (lightning.LightningDataModule)
- closure/models.py: network architectures (MLP, FCNN, ResNet, CNet)
- closure/cli.py: CLI entry point (closure-train)
- closure/eval_cli.py: run evaluation CLI (closure-eval)
- closure/callbacks.py: MemoryMonitorCallback, TimingCallback, TorchScriptCheckpointExportCallback
- closure/evaluation.py: post-training metrics and prediction transforms
- closure/visualization.py: prediction vs ground-truth plotting
pip install -e .

This installs the core framework with PyTorch, PyTorch Lightning, and essential utilities.
We provide several optional extras for different use cases:
Hyperparameter Optimization (Optuna)
For hyperparameter search with Optuna, install the hp extra:
pip install -e ".[hp]"

Includes: optuna, optuna-integration, scikit-learn, plotly, nbformat
Jupyter Notebooks
For interactive notebook development:
pip install -e ".[notebook]"

Includes: jupyter, ipykernel, notebook, ipywidgets

Combined Installation (HP + Notebooks)

pip install -e ".[hp,notebook]"

Development

For development, testing, and linting:

pip install -e ".[dev]"

Includes: pytest, pytest-cov, ruff, pre-commit
The package includes PyTorch, torchvision, and torchaudio but defaults to CPU builds. To enable GPU support, force-reinstall the PyTorch packages from the appropriate CUDA index (required because pip will otherwise skip the reinstall if versions match):
CUDA 12.4 (Recommended for driver ≥ 525.60):
pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124

CUDA 12.1:

pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu121

CPU-only (no GPU):

pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cpu

Note: Check your NVIDIA driver version with nvidia-smi. The driver's CUDA version must be ≥ the toolkit version. For example, driver CUDA 12.8 supports cu124 but not cu130.
Verify GPU support after installation:
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device count: {torch.cuda.device_count()}")

If you want to use the Optuna hyperparameter sweep functionality with GPU acceleration:
# Install core + hyperparameter optimization + notebooks
pip install -e ".[hp,notebook]"
# Then force-reinstall GPU-enabled PyTorch for your platform (e.g., CUDA 12.4)
pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124

We provide pre-made requirements files for common workflows:

Core only (CPU):

pip install -r requirements.txt

Hyperparameter optimization (Optuna + analysis):

pip install -r requirements-hp.txt

Development and testing:

pip install -r requirements-dev.txt

GPU support with CUDA 12.4:

pip install -r requirements.txt
pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124

Full stack (HP + Notebooks + Dev; matches closure-test env):

pip install -r requirements-dev.txt

For GPU support, force-reinstall PyTorch from the appropriate CUDA index:

pip install torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/cu124

See requirements-gpu.txt for detailed instructions on GPU installation for different CUDA versions.
Test that everything is installed correctly:
# Test core imports
python -c "import closure; import lightning; import torch; print('✅ Core packages OK')"
# Test optional imports (if installed with [hp])
python -c "import optuna; import plotly; import sklearn; print('✅ HP packages OK')"
# Test notebook imports (if installed with [notebook])
python -c "import jupyter; import ipykernel; print('✅ Notebook packages OK')"
# Test GPU (if CUDA enabled)
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'Device count: {torch.cuda.device_count()}')"
# Test CLI
closure-train --help
closure-eval --help
closure-diagnostics --help
# Test Optuna sweep (hyperparameter optimization)
python examples/optuna/harris_optuna_sweep.py --help

import lightning as L
from closure.datamodule import ClosureDataModule
from closure.models import MLP
from closure.module import ClosureLitModule
network = MLP(feature_dims=[10, 64, 32, 6], activations=["Tanh", "ReLU", None])
module = ClosureLitModule(
    network=network,
    criterion="MSELoss",
    optimizer="Adam",
    lr=5e-4,
    scheduler="ReduceLROnPlateau",
)
datamodule = ClosureDataModule(
    data_folder="/path/to/data",
    norm_folder="/path/to/norm",
    train_samples_file="/path/to/train.csv",
    val_samples_file="/path/to/val.csv",
    test_samples_file="/path/to/test.csv",
    batch_size=512,
    flatten=True,
    read_features_targets_kwargs={
        "request_features": ["rho_e", "Bx", "By", "Bz", "Vx_e", "Vy_e", "Vz_e", "Ex", "Ey", "Ez"],
        "request_targets": ["Pxx_e", "Pyy_e", "Pzz_e", "Pxy_e", "Pxz_e", "Pyz_e"],
    },
)
trainer = L.Trainer(max_epochs=50, accelerator="auto")
trainer.fit(module, datamodule=datamodule)
trainer.test(module, datamodule=datamodule)

Use provided YAML configs under configs/.

closure-train fit --config configs/default.yaml

Override parameters directly from CLI:
closure-train fit \
--config configs/default.yaml \
--model.network.class_path=closure.models.ResNet \
--model.lr=1e-3 \
--data.batch_size=256

closure-eval reproduces the common notebook evaluation workflow using
RunLoader and writes artifacts directly into the selected run/version folder
(or a custom output directory):
- prints config summary, history tail, best epoch, and test metrics to terminal
- writes per-channel test metrics CSV
- saves history and channel-metrics figures to img/
- optionally renders per-target field plots (real/predict/error)
Quick tutorial:
# 1. Activate the project environment.
# For the HPC module-based workflow:
source activate_hpc.sh
# 2. Run evaluation on one saved run.
closure-eval --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1
# 3. Restrict to a few targets or samples when iterating on plots.
closure-eval \
--run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1 \
--targets Pxx_e Pyy_e Pzz_e \
--max-plots 3
# 4. Reuse the trained model on a different test split.
closure-eval \
--run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1 \
--test-samples-file ./splits/iPiC3D-nathan5-12/5-10-12/RunID_1.csv
# 5. Export only scalar reports when you do not want images.
closure-eval \
--run-dir models/Lightning/iPiC3D-nathan5-12/test/run_1 \
--skip-field-plots

Useful options:
- --run-dir or --version-dir: evaluate one explicit saved run
- --run-dir <parent_folder>: evaluate all direct child run folders in batch mode (unfinished runs are skipped)
- --log-root: automatically pick the latest run_* or version_* folder
- --targets: restrict field plots to selected target names
- --max-plots: limit how many time slices are rendered
- --test-samples-file: override the test set without editing config files
- --output-dir: write CSV/figures somewhere else
- --skip-history-plot, --skip-metrics-plot, --skip-field-plots: export only what you need
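To make the --log-root behavior concrete, here is a minimal sketch of picking the latest run_* or version_* folder. It assumes "latest" means the highest numeric suffix; the actual CLI may instead use modification time, and latest_run is a hypothetical helper, not part of closure.

```python
import re
from pathlib import Path
from typing import Optional


def latest_run(log_root) -> Optional[Path]:
    """Return the run_*/version_* child with the highest numeric suffix.

    Hypothetical helper: "latest" is assumed to mean "largest number".
    """
    candidates = []
    for child in Path(log_root).iterdir():
        match = re.fullmatch(r"(?:run|version)_(\d+)", child.name)
        if child.is_dir() and match:
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    # Compare on the parsed number so run_002 and run_10 sort correctly.
    return max(candidates, key=lambda item: item[0])[1]
```

Parsing the suffix as an integer avoids the lexicographic trap where run_10 would sort before run_2.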
Examples:
# Evaluate one explicit run/version directory
closure-eval --run-dir models/Lightning/iPiC3D-nathan5-12/test/run_001
# Evaluate all runs under a parent folder (skips unfinished runs)
closure-eval --run-dir models/Lightning/iPiC3D-nathan5-12/ablations_long1000_serial/runs
# Or pick the latest run_*/version_* under a root directory
closure-eval --log-root models/Lightning/iPiC3D-nathan5-12/test
# Override the test split without editing config.yaml
closure-eval \
--run-dir models/Lightning/iPiC3D-nathan5-12/test/run_001 \
--test-samples-file ./splits/iPiC3D-nathan5-12/5-10-12/RunID_1.csv
# Only export metrics/history (no field plots)
closure-eval \
--run-dir models/Lightning/iPiC3D-nathan5-12/test/run_001 \
--skip-field-plots

Default output layout:
- <run_or_version_dir>/test_metrics.csv
- <run_or_version_dir>/img/history.png
- <run_or_version_dir>/img/channel_metrics.png
- <run_or_version_dir>/img/<target>_cycle<CYCLE>_{real,predict,error}.png
- <run_or_version_dir>/img/<target>_cycles<FIRST-LAST>_summary.png
closure-diagnostics exports notebook-style field figures and CSV diagnostics
without copying plotting code into ad-hoc notebooks.
Two backends, selected with --backend:
- ecsim (default): iPiC3D runs, e.g. Le2DHGEM_RunID_0_f2.
- menura: Menura runs, e.g. R0/iso_GEM_1e-2_Jze.5_r0_1024x1024 (set menura_analysis_dir in paths.yaml).
Conventions used by every example below, chosen to match fullres.ipynb:
- Normalization: --normalization alfven-sample --sample-nb-factor 1 → code2alfven with b0x = -Bx[0,0,0], nb = rho_i.max() (what the notebook's active line uses; reproduces its figures). Add --no-density-norm to keep density in code units while still casting B and the other fields/axes/time into Alfvén units. Other --normalization modes:
  - none (default): raw code units, no code2alfven. Works for both backends.
  - alfven-infer: infer b0x/nb from the run's .inp (B0x, rhoINIT[0]), matching the notebook's commented code2alfven(..., experiment=...) line. Works for ECsim (e.g. RunID_0.inp → b0x=0.0249, nb=0.969); menura has no such .inp, so it raises FileNotFoundError; use alfven-sample or alfven-explicit --b0x <v> --nb <v> there. Note that alfven-infer uses rhoINIT[0] (background, 0.969) while alfven-sample uses rho_i.max() (sheet, 0.23), so the two give different density normalizations.
- ECsim species: --choose-species e,i,e,i. --choose-species maps positionally to particle populations (index i → moments/species_i; shared labels are summed). The Le2DHGEM runs have four populations, so e,i,e,i sums sheet + background. The default e,i reads only species_0/1 and drops the background, so P_e/rho_e fall to ~0 in the lobes (and the reconnection normalization, which needs rho_e at the corner cell, breaks). Menura has 2 species and keeps the default e,i.
- Cropping to one current sheet: --choose-x 0,512 --choose-y 0,256. These are double-Harris (Le2DH) runs: the full domain holds two current sheets, so a y-cut crosses both (two Bx reversals / two pressure islands). The notebook crops to the lower half in y to analyze a single sheet; do the same. For menura also add --menura-scale-ranges, which scales these 512-cell base ranges up to the run resolution; for ECsim they are plain index ranges.
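The positional species mapping and the range scaling described above can be illustrated with a short sketch. Both functions are hypothetical stand-ins written for this README, not the closure-diagnostics implementation, and the scaling assumes the run resolution is an integer multiple of the 512-cell base grid.

```python
from collections import defaultdict


def group_species(labels: str) -> dict:
    """Map each label to the population indices (moments/species_i) sharing it."""
    groups = defaultdict(list)
    for index, label in enumerate(labels.split(",")):
        groups[label].append(index)
    return dict(groups)


def scale_range(base_range, run_cells: int, base_cells: int = 512):
    """Scale an index range given on the base grid up to the run resolution.

    Assumes run_cells is an integer multiple of base_cells.
    """
    factor = run_cells // base_cells
    return tuple(bound * factor for bound in base_range)


# Four populations: sheet and background are summed per label.
print(group_species("e,i,e,i"))
# Default labels: only species_0 and species_1 are read.
print(group_species("e,i"))
# A 0,256 crop on the base grid applied to a 1024-cell run.
print(scale_range((0, 256), run_cells=1024))
```

Fields for a label would then be accumulated over the listed indices, which is why e,i on a four-population run silently drops populations 2 and 3.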
In the exercises below, ECsim runs may use either --normalization alfven-sample --sample-nb-factor 1 or --normalization alfven-explicit, while for menura the flag can be omitted altogether, assuming that it was run with
# === Field panels ===========================================================
closure-diagnostics fields Le2DHGEM_RunID_5_f2 \
--files-path /volume1/scratch/share_dir/iPiC3D-nathan \
--fields Az,Ey,Ez,rho_e,rho_i,Jz_e,Jz_i,Bx,By,Bz \
--processed --normalization alfven-infer --sample-nb-factor 1 \
--choose-species e,i,e,i --choose-x 0,512 --choose-y 0,256 --choose-times 0 \
--output diagnostics/R5_fields.png
closure-diagnostics fields R0/iso_GEM_1e-2_Jze.5_r0_1024x1024 --backend menura \
--files-path /volume1/scratch/georgem/menura/runs/GEM/hortense/nathan5-12 \
--fields Az,Ey,Ez,rho_e,rho_i,Jz_e,Jx_i,Jy_i,Jz_i,Bx,By,Bz \
--processed --choose-times 12 \
--choose-x 0,512 --choose-y 0,256 --menura-scale-ranges \
--output diagnostics/R0_fields.png
# === Profiles (1D cuts) =====================================================
# Mirrors profile_fns: cut along y at x = nx//2 (omit --cut-index), t = 0.
# Pass several experiments to either backend to compare them (one `run` per
# experiment in the CSV). `profiles` ALWAYS overwrites --output-csv (no append).
closure-diagnostics profiles Le2DHGEM_RunID_0_f2 Le2DHGEM_RunID_5_f2 \
--files-path /volume1/scratch/share_dir/iPiC3D-nathan \
--fields P_e,P_i,rho_e,rho_i,Jz_e,Jz_i,Bx,By \
--projection y --choose-times 0 --processed \
--normalization alfven-infer --sample-nb-factor 1 --choose-species e,i,e,i \
--choose-x 0,512 --choose-y 0,256 \
--output-csv diagnostics/profiles_ecsim.csv
# Several menura runs at once: list each Rxx/<model> experiment.
closure-diagnostics profiles \
R0/iso_GEM_1e-2_Jze.5_r0_1024x1024 R5/iso_GEM_1e-2_Jze.5_r0_1024x1024 \
--backend menura \
--files-path /volume1/scratch/georgem/menura/runs/GEM/hortense/nathan5-12 \
--fields P_e,P_i,rho_e,rho_i,Jz_e,Jz_i,Bx,By \
--projection y --choose-times 0 --processed \
--choose-x 0,512 --choose-y 0,256 --menura-scale-ranges \
--output-csv diagnostics/profiles_menura.csv
# A profiles CSV holds every field, so a bare overlay draws them all on one axes.
# Pick one field with --field (= one notebook cell); pass several CSVs to compare
# backends/runs (--group-by run -> one line each). --select COL=VAL filters any
# column, e.g. --select run=Le2DHGEM_RunID_0_f2. Both accept comma lists.
# Axis labels default to what is plotted (here y-axis "P_e", x-axis "y" from the
# projection); override with --xlabel/--ylabel. Series use plotter.interactive
# styling (cycling color/dash, width ramps down + alpha ramps up across series).
closure-diagnostics overlay \
diagnostics/profiles_ecsim.csv diagnostics/profiles_menura.csv \
--field P_e --x coord --y value --group-by run \
--output diagnostics/profile_P_e.png
# === Reconnection rate ======================================================
# Tracks X/O points in Az and exports recon_flux/recon_rate. X/O defaults already
# match the notebook (grad_tol 1e-6, merge_tol 1e-3); pass --az-sigma 4.
# --recon-normalization notebook adds time_norm and
# recon_rate_norm = -recon_rate * sqrt(-rho_e0 * 4pi) / Bx0**2
# (the sign flip keeps the growth phase positive so a log axis doesn't drop out).
# `reconnection` APPENDS to --output-csv by default; use --csv-mode replace.
closure-diagnostics reconnection Le2DHGEM_RunID_0_f2 \
--files-path /volume1/scratch/share_dir/iPiC3D-nathan \
--choose-times all --processed \
--normalization alfven-infer --sample-nb-factor 1 --choose-species e,i,e,i \
--choose-x 0,512 --choose-y 0,256 \
--az-sigma 4 --recon-normalization notebook --csv-mode replace \
--output-csv diagnostics/reconnection_ecsim.csv
closure-diagnostics reconnection R0/iso_GEM_1e-2_Jze.5_r0_1024x1024 --backend menura \
--files-path /volume1/scratch/georgem/menura/runs/GEM/hortense/nathan5-12 \
--choose-times all --processed \
--choose-x 0,512 --choose-y 0,256 --menura-scale-ranges \
--az-sigma 4 --recon-normalization notebook --csv-mode replace \
--output-csv diagnostics/reconnection_menura.csv
# Recursive Menura discovery (--backend menura): if an experiment argument is a
# PARENT folder rather than a single run, every Menura run beneath it (any folder
# holding products/B_it*_rank_0_0.npy) is discovered and added to the CSV, one
# `run` per discovered run (labeled relative to --files-path, e.g. R5/new_FCNN_00172).
# A fully-specified run is used as-is. Example: pass `R5` to sweep all 60 runs of
# a campaign in one call (use --csv-mode replace so reruns don't append duplicates):
closure-diagnostics reconnection R5 --backend menura \
--files-path /dodrio/scratch/projects/2026_018/george/menura/runs/stability_campaign2 \
--choose-times all --az-sigma 4 --recon-normalization notebook --csv-mode replace \
--output-csv diagnostics/reconnection_menura.csv
# Pass several parents to combine campaigns: ... reconnection R0 R5 R7 R12 --backend menura ...
# Overlay the NORMALIZED rate on a log axis (plot the *_norm columns, not the raw
# recon_rate/time, which are mostly negative and vanish under --logy).
closure-diagnostics overlay \
diagnostics/reconnection_ecsim.csv diagnostics/reconnection_menura.csv \
--x time_norm --y recon_rate_norm --group-by run --logy \
--output diagnostics/reconnection_overlay.png
# === One-shot profile helpers ===============================================
# Export the 8 profile fields and emit one PNG per field (= one notebook cell
# each). Each script overwrites only its own dir; run both, then overlay above.
scripts/profiles_ecsim.sh diagnostics/profiles_ecsim # iPiC3D Le2DHGEM_RunID_0_f2
scripts/profiles_menura.sh diagnostics/profiles_menura # menura R0/iso_GEM_...
# Add experiments to overlay several runs per field. ECsim takes full names;
# menura takes bare run folders (expanded to RUN/$MODEL, override with MODEL=...):
# scripts/profiles_ecsim.sh diagnostics/cmp Le2DHGEM_RunID_0_f2 Le2DHGEM_RunID_5_f2
# scripts/profiles_menura.sh diagnostics/cmp R0 R5 R7

Lightning logging is used by default (CSV logger in configs).
closure.log is written alongside the Lightning CSV logger outputs. If you set
--trainer.logger.init_args.name and --trainer.logger.init_args.version, the
log file goes into that exact run directory. If you omit version, Lightning's
auto-created version_* directory is used, so closure.log lives inside the
same per-run folder as metrics.csv.
Typical outputs include:
- lightning_logs/ or configured logger directory
- metrics.csv
- checkpoints from ModelCheckpoint
- matching TorchScript exports beside each checkpoint, e.g. checkpoints/best-epoch=3-val_loss=0.1234.pt
- normalized feature/target statistics in norm_folder
Legacy files like loss_dict.pkl are no longer used.
This section covers everything needed to go from raw simulation data to production training runs.
Create a paths.yaml in the repository root (copy from paths.yaml.example):
work_dir: ./models # training outputs, checkpoints, normalization stats
data_dir: /scratch/data # root of your simulation data

Relative paths in paths.yaml are resolved against the directory that
contains the file. All config parameters that accept paths use a three-tier
resolution strategy (implemented by ClosureDataModule._resolve_path):
| Path form | Example | Resolution |
|---|---|---|
| Absolute | /scratch/data/Harris | Used as-is |
| Dot-relative (./, ../) | ./data/train.csv | Resolved against the current working directory |
| Bare identifier | ecsim/Harris/Le | Joined with the corresponding paths.yaml root (data_dir or work_dir) |
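As an illustration of the three tiers in the table, the following sketch resolves a config path the same way. It is a simplified stand-in, not the actual ClosureDataModule._resolve_path; the function name, the cwd parameter, and the POSIX-style example paths are assumptions made for the example.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_path(value: str, root: str, cwd: Optional[str] = None) -> Path:
    """Resolve a config path using the three-tier rule (illustrative only).

    root is the paths.yaml entry (data_dir or work_dir) that applies
    to this parameter; cwd defaults to the current working directory.
    """
    if os.path.isabs(value):
        # Tier 1: absolute paths are used as-is.
        return Path(value)
    if value in (".", "..") or value.startswith(("./", "../")):
        # Tier 2: dot-relative paths are anchored at the working directory.
        base = cwd if cwd is not None else os.getcwd()
        return Path(os.path.normpath(os.path.join(base, value)))
    # Tier 3: bare identifiers are joined with the paths.yaml root.
    return Path(root) / value


print(resolve_path("/scratch/data/Harris", root="/scratch/data"))
print(resolve_path("./data/train.csv", root="/scratch/data", cwd="/home/user/closure"))
print(resolve_path("ecsim/Harris/Le", root="/scratch/data"))
```

The practical consequence is that ./splits/train.csv and splits/train.csv are not interchangeable: the first follows the shell's working directory, the second follows paths.yaml.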
Simulation data is stored as HDF5 or pickle files under data_dir, organized
by experiment. Each file contains a single simulation time step:
data_dir/
ecsim/Harris/Le/
T2D14_filter2/
T2D-Fields_00500.h5.pkl
T2D-Fields_01000.h5.pkl
...
T2D15_filter2/
T2D-Fields_00500.h5.pkl
...
The files are read by closure.read_pic.read_features_targets, which
extracts the requested field channels (B, E, rho, J, P, etc.) and species.
Use scripts/datasplit.py to build CSV split files. Each CSV has a single
filenames column listing the data file paths:
# Training set from two simulation folders (time steps 5000–10000)
python scripts/datasplit.py \
folders=[T2D14_filter2,T2D15_filter2] \
name=train.csv \
root_folder=/scratch/data/ecsim/Harris/Le/ \
min_number=5000 max_number=10000
# Validation set from a held-out folder
python scripts/datasplit.py \
folders=[T2D16_filter2] \
name=val.csv \
root_folder=/scratch/data/ecsim/Harris/Le/
# Test set
python scripts/datasplit.py \
folders=[T2D17_filter2] \
name=test.csv \
root_folder=/scratch/data/ecsim/Harris/Le/

Arguments:

| Argument | Required | Description |
|---|---|---|
| folders | yes | Folder names or paths to search, e.g. [a,b,c] |
| name | yes | Output CSV filename |
| root_folder | no | Root prepended to each folder path |
| pattern | no | Glob pattern (default: T2D-Fields_*) |
| min_number | no | Exclude files with time-step number below this |
| max_number | no | Exclude files with time-step number above this |
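For readers who want to see what such a split file contains, here is a rough sketch of the same idea in plain Python. It is not the code of scripts/datasplit.py: build_split is a hypothetical function, and it assumes the time-step number is the first run of digits following an underscore in the filename (as in T2D-Fields_00500.h5.pkl).

```python
import csv
import re
from pathlib import Path


def build_split(folders, name, root_folder="", pattern="T2D-Fields_*",
                min_number=None, max_number=None):
    """Write a CSV with a single `filenames` column and return the paths."""
    rows = []
    for folder in folders:
        for path in sorted((Path(root_folder) / folder).glob(pattern)):
            match = re.search(r"_(\d+)", path.name)
            if match is None:
                continue  # no time-step number in the filename
            step = int(match.group(1))
            if min_number is not None and step < min_number:
                continue
            if max_number is not None and step > max_number:
                continue
            rows.append(str(path))
    with open(name, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filenames"])
        writer.writerows([row] for row in rows)
    return rows
```

Keeping training, validation, and test folders disjoint, as in the commands above, avoids leaking time steps of one simulation across splits.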
Three annotated templates are provided under configs/:
| Template | Architecture | Data shape | Use case |
|---|---|---|---|
| configs/default.yaml | FCNN | 2-D patches | CNN-based closure |
| configs/mlp.yaml | MLP | Flattened pixels | Pixel-wise baseline |
| configs/resnet.yaml | ResNet | 2-D patches | Deep residual closure |
Copy one and customize. Key sections explained:
data:
  data_folder: ecsim/Harris/Le # bare → joined with data_dir
  norm_folder: Harris/Le/my_experiment # bare → joined with work_dir
  train_samples_file: ./splits/train.csv # ./ → CWD-relative
  val_samples_file: ./splits/val.csv
  test_samples_file: ./splits/test.csv
  flatten: false # true for MLP, false for CNN/ResNet
  patch_dim: [32, 32] # random crop size (CNN/ResNet only)
  scaler_features: true # enable per-channel standardization
  scaler_targets: true
  prescaler_features: # per-channel transforms before standardization
    - arcsinh # rho_e
    - null # Bx (no prescaling)
    - ...
  prescaler_targets:
    - log # Pxx_e (positive-definite diagonal)
    - arcsinh # Pxy_e (signed off-diagonal)
    - ...
  read_features_targets_kwargs:
    fields_to_read: # which HDF5 field groups to load
      B: true
      E: true
      rho: true
      J: true
      P: true
      PI: true
    request_features: # specific channels extracted from fields
      - rho_e
      - Bx
      - By
      - Bz
      - Jx_e
      - Jy_e
      - Jz_e
      - Vx_e
      - Vy_e
      - Vz_e
    request_targets:
      - Pxx_e
      - Pyy_e
      - Pzz_e
      - Pxy_e
      - Pxz_e
      - Pyz_e
    choose_species: ['e', null] # electron species for multi-species data
    choose_x: [0, 512] # spatial domain crop
    choose_y: [175, 325]

Prescaler guidance:
- log: for strictly positive quantities (diagonal pressure)
- arcsinh: for quantities that can be negative or span orders of magnitude
- null: no prescaling
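The guidance above can be checked numerically with a small sketch. It uses only the standard library and assumes log means the natural logarithm; the PRESCALERS table and both helper functions are illustrative names, not the closure API.

```python
import math

# Forward/inverse pairs for each prescaler name; None means "no prescaling".
PRESCALERS = {
    "log": (math.log, math.exp),
    "arcsinh": (math.asinh, math.sinh),
    None: (lambda v: v, lambda v: v),
}


def prescale(values, name):
    """Apply the named prescaler element-wise (before standardization)."""
    forward, _ = PRESCALERS[name]
    return [forward(v) for v in values]


def inverse_prescale(values, name):
    """Undo the named prescaler to recover physical units."""
    _, inverse = PRESCALERS[name]
    return [inverse(v) for v in values]


diagonal = [1e-4, 0.5, 3.0]        # strictly positive, e.g. a Pxx_e-like channel
off_diagonal = [-2e3, 0.0, 5e-3]   # signed, spans orders of magnitude

print(prescale(diagonal, "log"))
print(prescale(off_diagonal, "arcsinh"))
```

arcsinh behaves like a signed logarithm for large magnitudes and is linear near zero, which is why it suits signed channels where log is undefined.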
Single GPU:
closure-train fit --config my_config.yaml

Multi-GPU (DDP):

closure-train fit --config my_config.yaml \
--trainer.devices=4 \
--trainer.strategy=ddp

Slurm cluster:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=12
srun closure-train fit --config my_config.yaml

For systematic architecture/feature-set sweeps, use
scripts/scaffold_harris_experiments.py. It generates a directory tree
of YAML configs and Slurm run.sh scripts:
python scripts/scaffold_harris_experiments.py \
--output-root models/Harris/Le/Le2GEM15ppc_lightning \
--data-folder ecsim/Harris/Le \
--split-root ecsim/sampling/ecsim/Harris/Le/Le2GEM15ppc \
--max-epochs 500 --devices 4

This creates:
Le2GEM15ppc_lightning/
default/P/ 4lrs_es500.yaml 5lrs_es500.yaml ... run.sh
default/divP/ 4lrs.yaml 5lrs.yaml ... run.sh
noE/P/ ...
noJ/P/ ...
noJnoE/P/ ...
Each variant (default, noE, noJ, noJnoE) uses a different feature subset.
Each task (P, divP) uses different targets and prescalers. The run.sh
files are ready to submit with sbatch.
After training, load a checkpoint and evaluate:
from closure.module import ClosureLitModule
from closure.evaluation import evaluate_loss, evaluate_regression_metrics, transform_targets
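# `network` is assumed to be the same architecture instance used for training (e.g. the MLP built earlier)
module = ClosureLitModule.load_from_checkpoint("best.ckpt", network=network)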
ground_truth, prediction = transform_targets(module, test_dataset, ...)
# Per-channel MSE
evaluate_loss(test_dataset, ground_truth, prediction, "MSELoss", verbose=True)
# Regression metrics table (R², RMSE, Pearson r, etc.)
metrics_df = evaluate_regression_metrics(test_dataset, ground_truth, prediction)

Export deployable artifacts:
import torch
# Inference bundle (state dict + normalization stats + metadata)
torch.save({"state_dict": ..., "features_mean": ..., ...}, "inference_bundle.pt")
# TorchScript for deployment
scripted = torch.jit.script(network)
scripted.save("torchscript.pt")

See examples/tutorials/tuto_train.py for a complete end-to-end example
including evaluation, visualization, and artifact export.
The iPiC3D production ablation studies (a matrix of feature set × target ×
architecture, for both CNN and MLP) are driven by helper scripts under
scripts/scaling_jobs/. Each study lives in its own folder under
models/Lightning/iPiC3D-nathan5-12/ with a README.md pinning its exact
splits and configs.
Launch: submit_prod_ablations.sh submits one atomic single-GPU SLURM job
per (model, feature, target, arch) cell so the matrix runs in parallel. All
options are environment variables (MODELS, FEATURES, TARGETS, ARCH_LIST,
SPLIT_TARGET, SPLIT_ARCH, MAX_EPOCHS, CONFIG_PATH, SAVE_DIR); always
preview with DRY=1 first:

DRY=1 MODELS="cnn mlp" ARCH_LIST="baseline shallower deeper" SPLIT_TARGET=1 SPLIT_ARCH=1 \
bash scripts/scaling_jobs/submit_prod_ablations.sh

Evaluate + plot: eval_test_ablations_interactive.sh (a thin wrapper around
eval_test_ablations.py) loads each cell's best checkpoint via RunLoader
(avoiding the closure-train test Lightning enums bug) and computes
per-channel regression metrics on both the test and validation splits
(EVAL_SPLITS="test val" by default), for both networks. Run it on a
gpu_rome_a100 node:
BASE=models/Lightning/iPiC3D-nathan5-12/<study> \
TEST_SPLIT=./splits/iPiC3D-nathan5-12/<test_split>.csv TEST_STRIDE=1 \
bash scripts/scaling_jobs/eval_test_ablations_interactive.sh

Outputs (all under <BASE>):
- <cell>/{test,val}_metrics.csv: per-channel metrics, one file per cell per split
- {test,val}_ranking.csv: aggregate (mean over channels), one row per cell
- channel_metrics.csv: combined per-channel table (all cells × both splits)
- figs/fig_channel_r2_<model>_<split>_<target>.png: per-channel R² heatmaps (target channel × feature × arch). Generated by the eval itself, so no per-folder make_figures.py is needed; set SKIP_FIGURES=1 to skip.
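The relation between the per-channel files and the ranking files is a mean over channels, one row per cell. The sketch below shows that aggregation on made-up numbers; the cell names, the "r2" key, the in-memory layout, and the descending sort are assumptions for illustration and do not reflect the actual CSV schema.

```python
from statistics import fmean


def rank_cells(per_channel, metric="r2"):
    """Average one metric over channels for each cell, best first.

    per_channel maps cell name -> {channel name -> {metric name -> value}}.
    """
    ranking = [
        (cell, fmean(values[metric] for values in channels.values()))
        for cell, channels in per_channel.items()
    ]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Made-up example values, not real results.
example = {
    "cell_a": {"Pxx_e": {"r2": 0.5}, "Pxy_e": {"r2": 0.75}},
    "cell_b": {"Pxx_e": {"r2": 1.0}, "Pxy_e": {"r2": 0.5}},
}
for cell, score in rank_cells(example):
    print(cell, score)
```

Because the ranking is a plain mean, one poorly predicted channel can dominate it, which is what the combined per-channel table and the heatmaps help diagnose.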
Memory: run.metrics() holds all predictions, so {TEST,VAL}_STRIDE subsample
the files to bound RAM (use =1 only with a large --mem). EVAL_SPLITS=test
restores the old test-only behavior; the val split is auto-read from each cell's
config.yaml (override with VAL_SPLIT / VAL_STRIDE).
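A stride of this kind can be pictured as keeping every n-th file of the split. The sketch below is a minimal illustration under that assumption; subsample_files is a hypothetical name and the evaluation script may select files differently.

```python
def subsample_files(files, stride=1):
    """Keep every stride-th file, starting from the first one."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return list(files)[::stride]


# Ten placeholder file names; a stride of 4 keeps indices 0, 4 and 8.
files = [f"file_{index}.pkl" for index in range(10)]
print(subsample_files(files, stride=4))
print(len(subsample_files(files, stride=1)))
```

Since all predictions are held in memory at once, the number of retained files (roughly the total divided by the stride) is the quantity that bounds RAM.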
- examples/tutorials/tuto_train.py: self-contained training tutorial using bundled fixture data
- examples/tuto_train.ipynb: real-data tutorial (Lightning update section added at top)
- examples/tuto_train_synthetic.ipynb: synthetic-data tutorial (Lightning update section added at top)
- examples/optuna/optuna_sweep.py: Optuna sweep example with Lightning
- examples/optuna/harris_optuna_sweep.py: Harris Le2GEM15ppc Optuna sweep for FCNN experiments

- The old Trainer, PyNet, and closure.trainers module were removed.
- Use ClosureLitModule + ClosureDataModule for programmatic workflows.
- Use closure-train for config-driven workflows.
- Author: George Miloshevich
- License: MIT License
- Projects: STRIDE, HELIOSKILL
If you use closure in your research, please cite:
@article{miloshevich2026electron,
title = {Electron Neural Closure for Turbulent Magnetosheath Simulations: {{Energy}} Channels},
author = {Miloshevich, G. and Vranckx, L. and de Oliveira Lopes, F. N. and Dazzi, P. and Arrò, G. and Lapenta, G.},
year = {2026},
journal = {Physics of Plasmas},
volume = {33},
number = {1},
pages = {012901},
issn = {1070-664X},
doi = {10.1063/5.0300009},
}

- examples/tuto_train.ipynb: Full tutorial notebook
- Source code docstrings for detailed API documentation
closure is designed for flexibility, reproducibility, and ease of use in scientific ML workflows. Contributions and feedback are welcome!
