
Refactor TopoDiff recipe to use diffusion framework#1562

Merged
CharlelieLrt merged 8 commits into NVIDIA:main from CharlelieLrt:topodiff-diffusion-refactor on May 5, 2026

Conversation

@CharlelieLrt
Collaborator

PhysicsNeMo Pull Request

Description

Checklist

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment; the comments are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 10, 2026

Greptile Summary

This PR refactors the TopoDiff example scripts to use the physicsnemo.diffusion framework, replacing the custom Diffusion class with DDPMLinearNoiseScheduler, MSEDSMLoss, DPSScorePredictor, and the framework sample() loop. New adapters (DDPMSolver, ClassifierGuidance) and a component test file are added in utils.py and test_components.py.

There are two P1 bugs affecting training correctness, the same mistake in both training scripts:

  • Double normalization in train_classifier.py and train_regressor.py: both scripts normalize pixel values to [-1, 1] before the training loop and then apply the same * 2 - 1 transform again inside the loop, producing a [-3, 1] input range throughout training.
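A minimal sketch reproduces the reported double normalization (the `normalize` helper here is hypothetical; the scripts apply the `* 2 - 1` transform inline):

```python
import torch

def normalize(img: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper; the scripts apply `img * 2 - 1` inline.
    return img * 2 - 1

raw = torch.tensor([0.0, 0.5, 1.0])  # pixel values in [0, 1]
once = normalize(raw)                # [-1, 1], as intended at load time
twice = normalize(once)              # applied again inside the loop
print(once.tolist())                 # [-1.0, 0.0, 1.0]
print(twice.tolist())                # [-3.0, -1.0, 1.0] -- the bug
```

The fix is simply to apply the transform exactly once, either at load time or in the loop, but not both.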

Important Files Changed

  • examples/generative/topodiff/inference.py: Replaces the hand-rolled DDPM loop with DPSScorePredictor + DDPMSolver + the framework sample(); gradient flow is preserved via torch.inference_mode(False); plotting is improved to handle dynamic batch sizes.
  • examples/generative/topodiff/utils.py: Adds DDPMLinearNoiseScheduler, DDPMSolver, and ClassifierGuidance adapters wrapping the new diffusion framework; the DDPMSolver stochastic-noise check uses .sum(), which is fragile, and timesteps() has a divide-by-zero edge case when num_steps=1.
  • examples/generative/topodiff/train.py: Replaces the old Diffusion class with MSEDSMLoss + DDPMLinearNoiseScheduler; the license header has a stray "cd .." appended.
  • examples/generative/topodiff/train_classifier.py: Replaces manual noise injection with noise_scheduler.add_noise(); contains a double-normalization bug: images are mapped [0, 1] → [-1, 1] at load time, then the same transform is applied again inside the training loop, producing a [-3, 1] range.
  • examples/generative/topodiff/train_regressor.py: Same double normalization as train_classifier.py: topologies are scaled [0, 1] → [-1, 1] before the loop, then scaled again inside the loop, yielding an incorrect [-3, 1] range for training.
  • examples/generative/topodiff/test_components.py: New test file that validates the scheduler, noise, epsilon round-trips, DDPMSolver, MSEDSMLoss, and sample() against the original Diffusion class; well structured and covers the key refactored components.

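The two utils.py concerns can be illustrated generically. The following is a hypothetical sketch, not the actual utils.py code: a timestep schedule that spaces points by dividing by (num_steps - 1) fails when num_steps == 1, and a .sum()-based "is there noise?" check can misread symmetric noise as zero:

```python
import torch

# Hypothetical sketch of the timesteps() edge case (not the actual
# utils.py code): spacing num_steps points by dividing by
# (num_steps - 1) raises ZeroDivisionError when num_steps == 1.
def timesteps_naive(num_steps: int, t_max: float = 1.0) -> torch.Tensor:
    return t_max - torch.arange(num_steps) * (t_max / (num_steps - 1))

def timesteps_safe(num_steps: int, t_max: float = 1.0) -> torch.Tensor:
    # Handle the single-step case explicitly.
    if num_steps == 1:
        return torch.tensor([t_max])
    return torch.linspace(t_max, 0.0, num_steps)

# Why a .sum()-based noise check is fragile: symmetric noise sums to
# exactly zero even though it is clearly nonzero.
noise = torch.tensor([1.0, -1.0])
print(noise.sum().item() == 0.0)     # True: looks like "no noise"
print(torch.any(noise != 0).item())  # True: a robust check disagrees
```

A check such as `torch.any(noise != 0)` (or comparing against a sentinel flag rather than the tensor's contents) avoids the cancellation problem entirely.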
Comments Outside Diff (3)

  1. examples/generative/topodiff/train_classifier.py, line 79-81 (link)

    P1 Same double normalization in validation branch

    The validation batch has the same issue: valid_img was already mapped to [-1, 1] on line 40, so the * 2 - 1 here produces [-3, 1].

  2. examples/generative/topodiff/train_regressor.py, line 77-79 (link)

    P1 Double normalization — topologies scaled to [-3, 1] instead of [-1, 1]

    Line 40 already normalizes topologies to [-1, 1] with topologies = topologies * 2 - 1. Applying * 2 - 1 again in the training loop maps the range to [-3, 1], feeding the regressor incorrectly scaled inputs throughout training.

  3. examples/generative/topodiff/train.py, line 15 (link)

    P2 Stray shell command in license header

    cd .. was accidentally appended to the closing license comment line.

Reviews (1): Last reviewed commit: "Refactor TopoDiff recipe to use diffusio..."

Comment thread examples/generative/topodiff/train_classifier.py
Comment thread examples/generative/topodiff/utils.py Outdated
Comment thread examples/generative/topodiff/utils.py
@CharlelieLrt CharlelieLrt requested a review from mnabian April 17, 2026 02:04
Collaborator

@coreyjadams left a comment


approved for pyproject.toml

Collaborator

@mnabian left a comment


I think the changes look good overall. I could not spot any major issues. It would be good to track the known bugs in the code in an issue. Hopefully at some point in the future we can revisit this example and extend it to 3D cases.

@copy-pr-bot

copy-pr-bot Bot commented May 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@CharlelieLrt CharlelieLrt enabled auto-merge May 4, 2026 23:09
@CharlelieLrt
Collaborator Author

/blossom-ci

1 similar comment
@CharlelieLrt
Collaborator Author

/blossom-ci

@CharlelieLrt CharlelieLrt added this pull request to the merge queue May 5, 2026
Merged via the queue into NVIDIA:main with commit 93d1342 May 5, 2026
4 checks passed


4 participants