Enable mgpu in FrameView #5514
Greptile Summary

This PR removes the

Confidence Score: 4/5

Safe to merge after the The core Fabric device-allowlist removal is straightforward and the cuda:0 path is unaffected. The blocking concern is that
Sequence Diagram

```mermaid
sequenceDiagram
participant Caller
participant FabricFrameView
participant USDRTSelectPrims
participant WarpKernel
participant UsdFrameView
Caller->>FabricFrameView: "__init__(device=cuda:N)"
Note over FabricFrameView: No device allowlist check (removed)
Caller->>FabricFrameView: set_world_poses(positions)
alt Fabric enabled
FabricFrameView->>USDRTSelectPrims: "SelectPrims(device=cuda:N)"
FabricFrameView->>WarpKernel: launch(compose_fabric_transformation)
FabricFrameView->>FabricFrameView: _prepare_for_reuse()
else Fabric disabled
FabricFrameView->>UsdFrameView: set_world_poses(...)
end
Caller->>FabricFrameView: get_scales()
alt Fabric enabled
FabricFrameView->>WarpKernel: launch(decompose_fabric_transformation)
FabricFrameView-->>Caller: wp.array (raw, no ProxyArray wrap)
else Fabric disabled
FabricFrameView->>UsdFrameView: get_scales()
FabricFrameView-->>Caller: result
end
Caller->>FabricFrameView: get_world_poses()
alt Fabric enabled
FabricFrameView->>WarpKernel: launch(decompose_fabric_transformation)
FabricFrameView-->>Caller: ProxyArray(positions), ProxyArray(orientations)
end
```
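The `compose_fabric_transformation` and `decompose_fabric_transformation` kernels in the diagram pack and unpack position, orientation, and scale in a 4x4 world matrix. A minimal pure-Python sketch of that round trip follows, assuming a `(w, x, y, z)` unit quaternion, positive scales, and column vectors (M = T * R * S). USD's `GfMatrix4d` uses the transposed row-vector layout, and the real Warp kernels may differ in convention and detail; `compose_trs` and `decompose_translation_scale` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def compose_trs(t: Sequence[float], q_wxyz: Sequence[float], s: Sequence[float]) -> Matrix:
    """Build a 4x4 column-vector transform M = T * R * S (hypothetical helper)."""
    w, x, y, z = q_wxyz
    # Rotation matrix of a unit quaternion (w, x, y, z).
    r = [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]
    # Scale each rotation column by its axis scale, then append the translation column.
    m = [[r[i][j] * s[j] for j in range(3)] + [float(t[i])] for i in range(3)]
    m.append([0.0, 0.0, 0.0, 1.0])
    return m


def decompose_translation_scale(m: Matrix) -> Tuple[Tuple[float, ...], Tuple[float, ...]]:
    """Recover translation and (positive) per-axis scale from compose_trs output."""
    t = tuple(m[i][3] for i in range(3))
    # Columns of R are unit length, so each column norm of R * S is that axis's scale.
    s = tuple(math.sqrt(sum(m[i][j] ** 2 for i in range(3))) for j in range(3))
    return t, s


# Round trip: 90 degrees about z with a non-uniform scale.
half = math.sqrt(0.5)
M = compose_trs((1.0, 2.0, 3.0), (half, 0.0, 0.0, half), (2.0, 3.0, 4.0))
translation, scale = decompose_translation_scale(M)
```

A roundtrip like this is the property the `cuda:1` pose and scale tests check, just computed on the device instead of in Python.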
Reviews (6): Last reviewed commit: "Split FabricFrameView multi-GPU tests in..."
- Allow FabricFrameView to run on cuda:N for any N; USDRT SelectPrims no longer needs cuda:0.
- Refactor the Fabric write path into a single _compose_fabric_transform helper shared by set_world_poses, set_scales, and the initial USD->Fabric sync, collapsing the sync to one kernel launch with one PrepareForReuse.
- Replace the topology-invariant assert with RuntimeError so it survives python -O.
- Add multi_gpu pytest marker plus cuda:1 unit-test coverage for both Fabric write paths, and run them in the existing test-multi-gpu CI job (one extra step, no new job).
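Why `python -O` matters: the optimizer drops `assert` statements entirely, so an invariant guarded only by `assert` silently disappears in optimized runs. A small sketch, assuming a hypothetical `check_topology_unchanged` helper (the real check in `FabricFrameView` may be shaped differently), plus a demonstration using the `optimize` argument of the built-in `compile()`:

```python
def check_topology_unchanged(expected_prims: int, actual_prims: int) -> None:
    """Hypothetical stand-in for the FabricFrameView topology invariant."""
    # An explicit raise always executes; an `assert` would be stripped by `python -O`.
    if actual_prims != expected_prims:
        raise RuntimeError(
            f"Fabric prim selection changed size: expected {expected_prims}, got {actual_prims}"
        )


def assert_fires(optimize: int) -> bool:
    """Compile an assert at the given optimization level and report whether it raises."""
    code = compile("assert False, 'invariant violated'", "<demo>", "exec", optimize=optimize)
    try:
        exec(code)
    except AssertionError:
        return True
    return False
```

`assert_fires(0)` reports `True`, while `assert_fires(1)` (the level `-O` uses) reports `False`: the check is simply gone, whereas `check_topology_unchanged` raises at every level.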
The standard pytest invocation in CI runs the fabric test file without filtering on the ``multi_gpu`` marker, so the ``cuda:1`` tests get scheduled on every runner including the single-GPU ones. Previously ``_skip_if_unavailable`` hard-failed via ``pytest.fail`` whenever ``GITHUB_ACTIONS=true`` and the requested device was missing, on the theory that this would catch a misconfigured multi-GPU runner. In practice it just broke the standard CI: the dedicated ``test-fabric-multi-gpu`` workflow already pre-flights ``torch.cuda.device_count() >= 2`` before invoking pytest, so a genuinely misconfigured multi-GPU runner is already caught there. Always skip rather than fail when the requested ``cuda:N`` index isn't available. Drop the now-unused ``import os``.
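The skip-not-fail policy reduces to a pure decision on the requested device index versus the number of visible GPUs. A minimal sketch, assuming hypothetical helpers `cuda_index` and `skip_reason` (the actual `_skip_if_unavailable` may be structured differently):

```python
from typing import Optional


def cuda_index(device: str) -> Optional[int]:
    """Return N for 'cuda:N' (0 for bare 'cuda'), or None for non-CUDA devices."""
    if device == "cuda":
        return 0
    if device.startswith("cuda:"):
        return int(device.split(":", 1)[1])
    return None


def skip_reason(device: str, visible_cuda_devices: int) -> Optional[str]:
    """Reason to skip a test on `device`, or None if it can run.

    Always a skip, never a failure: runner misconfiguration is caught by the
    workflow's GPU pre-flight, not by individual tests.
    """
    index = cuda_index(device)
    if index is not None and index >= visible_cuda_devices:
        return f"{device} unavailable: only {visible_cuda_devices} CUDA device(s) visible"
    return None


# Inside a test (requires pytest and torch, not imported here):
#     reason = skip_reason(device, torch.cuda.device_count())
#     if reason:
#         pytest.skip(reason)
```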
Kit's CLI parser reads sys.argv directly at startup and segfaults on
pytest flags that collide with its own short options. Running
pytest -m multi_gpu source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py
crashes during collection because Kit sees ``-m multi_gpu`` and exits
with ``Ill formed parameter: -m`` followed by SIGSEGV (exit code 245)
inside ``simulation_app._start_app``.
Strip sys.argv to argv[0] before instantiating AppLauncher. The test
file takes no CLI arguments of its own, mirroring the broader pattern
used by ``test_tiled_camera_env.py`` which assigns
``sys.argv[1:] = args_cli.unittest_args`` after argparse.
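The ordering constraint looks roughly like this at the top of a test module. This is a sketch: the `isaaclab.app` import path and the `headless=True` argument are assumptions here, and both lines are commented out so the snippet runs without Isaac Lab installed.

```python
import sys

# 1. Import the launcher first (assumed import path):
# from isaaclab.app import AppLauncher

# 2. Strip pytest's arguments *in place*: Kit reads sys.argv directly at startup,
#    so flags such as `-m multi_gpu` must not be visible to it.
del sys.argv[1:]

# 3. Only now start the app (assumed constructor arguments):
# simulation_app = AppLauncher(headless=True).app
```

Mutating the existing list (rather than rebinding a local copy) matters because Kit inspects the process-wide `sys.argv`.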
wp.to_torch on a ProxyArray is deprecated in favor of the .torch accessor. Switch the three call sites that consume the ProxyArray returned by get_world_poses; leave get_scales call sites alone since that method still returns a raw wp.array (no .torch accessor).
- Add a GPU-count pre-flight step to the test-fabric-multi-gpu CI job so that a runner regression to a single GPU fails the workflow instead of silently skipping every cuda:1 test. This is what the comment in _skip_if_unavailable already promised existed.
- Note that the sys.argv strip in test_views_xform_prim_fabric.py must stay between the AppLauncher import and its instantiation; adding a CLI parser or reordering these statements re-exposes Kit to the pytest argv, and Kit segfaults at startup.
- Document the _fabric_usd_sync_done side effect on _compose_fabric_transform so callers can see why subsequent getters stop pulling from USD.
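A pre-flight of this kind can be sketched as a small script run before pytest. This version counts devices from `nvidia-smi -L`, which prints one `GPU <n>: ...` line per device; it is an illustrative assumption, not the step in the actual workflow (which, per the PR, also checks `torch.cuda.device_count()`). `count_gpus`, `preflight_exit_code`, and `run_preflight` are hypothetical names.

```python
import subprocess
import sys


def count_gpus(nvidia_smi_list: str) -> int:
    """Count 'GPU <n>: ...' lines in `nvidia-smi -L` output (MIG lines are indented)."""
    return sum(1 for line in nvidia_smi_list.splitlines() if line.startswith("GPU "))


def preflight_exit_code(gpu_count: int, required: int = 2) -> int:
    """Non-zero exit code fails the CI step instead of letting tests silently skip."""
    if gpu_count < required:
        print(f"expected at least {required} GPUs, found {gpu_count}", file=sys.stderr)
        return 1
    return 0


def run_preflight(required: int = 2) -> int:
    """Query the real driver; call as sys.exit(run_preflight()) in a CI step."""
    out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=True).stdout
    return preflight_exit_code(count_gpus(out), required)
```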
The class docstring and __init__ device-param doc still claimed ``cuda:0`` only. Refresh both to note that Fabric acceleration runs on any CUDA index, so the autodoc API page reflects the actual contract.
🤖 Isaac Lab Review Bot — Updated Review (4f262aa)
Commit: 4f262aa6710b19679b5ab94015f0dde9a4fed38b
Previous review: 556b74b (workflow separation in progress)
📋 What Changed Since Last Review
Commit 4f262aa finalizes the workflow separation with a clean split:
| Change | Description |
|---|---|
| `test-fabric-multi-gpu.yaml` | ✅ New dedicated workflow (60 lines): self-contained CI for Fabric tests |
| `test-multi-gpu.yaml` | ✅ Restored to upstream/develop (removed Fabric test job) |
| `fabric_frame_view.py` | Minor: relocated TODO comments |
| `changelog.d/*.rst` | Simplified wording |
| `test_views_xform_prim_fabric.py` | Style cleanup only |
Key improvement: Complete workflow separation. FabricFrameView changes now trigger only test-fabric-multi-gpu.yaml (via path filter), while test-multi-gpu.yaml returns to its upstream state for distributed-training validation. The two workflows are completely decoupled.
✅ Full PR Summary
This PR removes the cuda:0-only restriction from FabricFrameView, enabling Fabric GPU acceleration on any CUDA device. This unblocks distributed training where each rank is pinned to a non-primary GPU (e.g., cuda:1).
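In a distributed launch, each process typically derives its CUDA device from its local rank, which is how ranks other than 0 end up on `cuda:1`, `cuda:2`, and so on. A minimal sketch, assuming the `LOCAL_RANK` environment variable that `torchrun` sets (Isaac Lab's own scripts may obtain the rank differently); `device_for_local_rank` is a hypothetical helper:

```python
import os
from typing import Mapping, Optional


def device_for_local_rank(env: Optional[Mapping[str, str]] = None) -> str:
    """Map this process's local rank to a CUDA device string (rank 1 -> 'cuda:1')."""
    env = os.environ if env is None else env
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return f"cuda:{local_rank}"
```

Before this change, every rank whose device came out as anything other than `cuda:0` fell back to the slower USD path.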
🔍 Code Review
Architecture:
- ✅ Clean removal of the `_fabric_supported_devices` allowlist and associated guards
- ✅ Minimal, surgical change: core write paths unchanged
- ✅ Well-scoped TODO comments reference follow-up PRs (#5673, #5674)
- ✅ Docstrings updated to reflect multi-GPU support

Error Handling:
- ✅ `RuntimeError` replaces `assert` for the topology-change invariant (survives `python -O`)
- ✅ `_skip_if_unavailable()` gracefully skips tests on single-GPU runners

Test Coverage:
- ✅ Three `cuda:1`-parameterized tests: roundtrip poses, no-writeback, scales roundtrip
- ✅ New `multi_gpu` pytest marker registered in `pyproject.toml`
- ✅ Kit argv stripping prevents segfault from pytest flags
- ✅ Uses `.torch` accessor instead of deprecated `wp.to_torch()`

CI Design:
- ✅ `test-fabric-multi-gpu.yaml`: dedicated 60-line workflow with GPU pre-flight
- ✅ Path-filtered triggers: only runs on changes to FabricFrameView or its tests
- ✅ Runner pre-flight (`torch.cuda.device_count() >= 2`) fails loudly if misconfigured
- ✅ `test-multi-gpu.yaml` byte-identical to upstream/develop: zero risk of regression
🚦 CI Status
| Check | Status |
|---|---|
| Build Wheel | ✅ Pass |
| changelog fragments | ✅ Pass |
| Broken Links | ✅ Pass |
| labeler | ✅ Pass |
| pre-commit | ⏳ Pending |
| FabricFrameView multi-GPU tests | ⏳ Pending (self-hosted runner) |
| license-check | ⏳ Pending |
| Installation Tests | ⏳ Pending |
| Docs | ⏳ Pending |
📝 Verdict
LGTM: ready for human review. The workflow separation is now complete and clean. Core functionality is well-tested with appropriate CI coverage. The change is low-risk: the cuda:0 path is unchanged, and the new cuda:1 tests provide explicit coverage. Once the pending CI checks pass (particularly the multi-GPU tests on the self-hosted runner), this PR is ready to merge.
Reviewed at: 2026-05-20T16:25 UTC
🤖 Isaac Lab Review Bot — Updated Review (a039313)
Commit: a03931317a2be2486357370d03326e9c398f86ef
Previous review: c42ec86 → Now fully finalized
📋 What Changed Since Last Review
Two commits since c42ec86:
| Commit | Change |
|---|---|
60d99b86 |
Split FabricFrameView multi-GPU tests into dedicated workflow |
a0393131 |
Disable multi-GPU workflow (no self-hosted runner available) |
Key changes:
- Workflow separation complete: `test-fabric-multi-gpu.yaml` is now a standalone workflow file, completely decoupled from `test-multi-gpu.yaml`
- Workflow disabled: Multi-GPU workflow set to `workflow_dispatch` only, with a clear comment explaining why (no multi-GPU runner provisioned yet)
- Pytest marker removed: `multi_gpu` marker dropped from `pyproject.toml`; tests now use an `ISAACLAB_TEST_MULTI_GPU=1` env var gate via `@pytest.mark.skipif`
- Changelog simplified: removed extraneous "Changed" section; now focuses purely on the core fix
- Code cleanup: removed redundant comments in `fabric_frame_view.py`; uses `torch.as_tensor()` consistently in tests
✅ Full PR Summary
This PR removes the cuda:0-only restriction from FabricFrameView, enabling Fabric GPU acceleration on any CUDA device (cuda:0, cuda:1, etc.). This unblocks distributed training where each rank is pinned to a non-primary GPU.
🔍 Code Review
Architecture:
- ✅ Clean removal of the `_fabric_supported_devices` allowlist
- ✅ Minimal, surgical change: core Warp kernel paths unchanged
- ✅ TODO comments reference follow-up PRs (#5673, #5674)
- ✅ Docstrings updated for multi-GPU support

Test Coverage:
- ✅ Three `cuda:1`-parameterized tests gated by `ISAACLAB_TEST_MULTI_GPU=1`
- ✅ `_skip_if_unavailable()` gracefully skips on single-GPU runners
- ✅ Uses `torch.as_tensor()` for Warp→Torch conversion (consistent API)

CI Design:
- ✅ `test-fabric-multi-gpu.yaml`: dedicated workflow (disabled until runner available)
- ✅ Path-filtered triggers ready for when runner is provisioned
- ✅ `test-multi-gpu.yaml` returned to upstream state (zero diff risk)
- ✅ Env var gating avoids pytest marker complexity
🚦 CI Status
| Check | Status |
|---|---|
| pre-commit | ✅ Pass |
| Check changelog fragments | ✅ Pass |
| Build Wheel | ✅ Pass |
| Check for Broken Links | ✅ Pass |
| Detect Changes | ✅ Pass |
| labeler | ✅ Pass |
| Load Config | ✅ Pass |
| Installation Tests | ⏳ Pending |
| Build Latest Docs | ⏳ Pending |
| license-check | ⏳ Pending |
📝 Verdict
LGTM: ready for human review. The workflow separation is clean and complete. The PR is low-risk:
- `cuda:0` behavior unchanged
- New `cuda:1` tests provide explicit coverage (will run when a multi-GPU runner is provisioned)
- Multi-GPU workflow correctly disabled so runs no longer queue indefinitely
Once CI passes, this is ready to merge.
Reviewed at: 2026-05-20T19:34 UTC
Move the test-fabric-multi-gpu job out of test-multi-gpu.yaml and into a dedicated test-fabric-multi-gpu.yaml. The two workflows share the same runner label, install step, and GPU pre-flight, but trigger on disjoint path sets so changes to FabricFrameView no longer gate the distributed-training validation and vice versa. test-multi-gpu.yaml is now byte-identical to upstream/develop.
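The disjoint triggering can be illustrated with a simplified matcher for GitHub Actions `paths` globs, where `*` stops at `/` and `**` crosses directories. Both the matcher and the pattern lists are hypothetical placeholders (GitHub's real matcher handles more cases, such as `**/` matching zero directories), not the filters in the actual workflow files.

```python
import re
from typing import Iterable


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Simplified path glob: '**' matches across '/', '*' and '?' do not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def triggers(changed_files: Iterable[str], patterns: Iterable[str]) -> bool:
    """True if any changed file matches any pattern of a workflow's path filter."""
    regexes = [glob_to_regex(p) for p in patterns]
    return any(rx.match(path) for path in changed_files for rx in regexes)


# Hypothetical path filters for the two workflows.
FABRIC_PATHS = [
    "source/isaaclab_physx/**/fabric_frame_view.py",
    "source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py",
]
DISTRIBUTED_PATHS = ["scripts/reinforcement_learning/**"]
```

With non-overlapping pattern sets, a change to the Fabric test file triggers only the Fabric workflow, and a change under the training scripts triggers only the distributed one.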
No self-hosted runner with the 'multi-gpu' label is registered. All runs queue indefinitely. Kept as workflow_dispatch only so it can be manually triggered once a runner is provisioned. See also .github/workflows/test-multi-gpu.yaml (same issue).
Description
Removes the `cuda:0`-only restriction in `FabricFrameView`. USDRT `SelectPrims` now accepts any CUDA device index, so Fabric acceleration runs on the simulation device (e.g., `cuda:1`) instead of silently falling back to the slower USD path. This unblocks distributed training where each process is pinned to a specific GPU.

Changes:
- Removed `_fabric_supported_devices`, the device guard in `__init__`, and the corresponding assertion in `_initialize_fabric`. Any CUDA device (or CPU) now works.
- Added `cuda:1`-parameterized tests gated by the `ISAACLAB_TEST_MULTI_GPU=1` env var, plus a dedicated CI workflow on the multi-GPU runner that sets it.
- Replaced deprecated `wp.to_torch()` calls with the `.torch` accessor on ProxyArray (avoids DeprecationWarning).

Type of change
`cuda:0` continues to work exactly as before; `cuda:1`+ now also works instead of silently falling back to USD. No public API surface changed.

Checklist
- `pre-commit` checks with `./isaaclab.sh --format`
- `config/extension.toml` file
- `CONTRIBUTORS.md` or my name already exists there

Test plan
Three new tests gated by `ISAACLAB_TEST_MULTI_GPU=1` and parameterized with `["cuda:1"]`:
- `test_fabric_cuda1_world_pose_roundtrip`: `set_world_poses` → `get_world_poses` returns the same values on a non-primary CUDA device.
- `test_fabric_cuda1_no_usd_writeback`: Fabric writes on `cuda:1` do not write back to USD.
- `test_fabric_cuda1_scales_roundtrip`: covers the `set_scales` write path on `cuda:1`.
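The opt-in gate can be sketched as an environment check evaluated once at import time. This assumes the tests treat exactly `ISAACLAB_TEST_MULTI_GPU=1` as enabled; `multi_gpu_tests_enabled` is a hypothetical helper, and the decorator lines are shown as comments because they need `pytest`:

```python
import os
from typing import Mapping, Optional


def multi_gpu_tests_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Opt-in switch: only an explicit ISAACLAB_TEST_MULTI_GPU=1 enables the tests."""
    env = os.environ if env is None else env
    return env.get("ISAACLAB_TEST_MULTI_GPU") == "1"


# Assumed usage in the test module:
# @pytest.mark.skipif(not multi_gpu_tests_enabled(), reason="set ISAACLAB_TEST_MULTI_GPU=1")
# @pytest.mark.parametrize("device", ["cuda:1"])
# def test_fabric_cuda1_world_pose_roundtrip(device):
#     ...
```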
test-fabric-multi-gpu.yaml) runs on the[self-hosted, linux, x64, gpu, multi-gpu]runner withISAACLAB_TEST_MULTI_GPU=1set. Pre-flights withnvidia-smiandtorch.cuda.device_count(), fails loudly if the runner has < 2 GPUs.To verify locally on a multi-GPU machine:
ISAACLAB_TEST_MULTI_GPU=1 ./isaaclab.sh -p -m pytest \ source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -vTo verify the
cuda:0path is unchanged (multi-GPU tests auto-skip):./isaaclab.sh -p -m pytest \ source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v