[CI] Enable multi-GPU pytest workflow#5738
Conversation
The multi-GPU runner pool went live on 2026-05-21 (g6e.12xlarge, self-hosted, linux, x64, gpu, multi-gpu). The on: block in this workflow was previously commented out because no runner advertised that label; that precondition is now satisfied. This change uncomments the original pull_request paths trigger and removes the DISABLED comment. No test or workflow logic changes.
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
PR #5738 — [CI] Enable FabricFrameView multi-GPU workflow trigger
Summary
This PR re-enables the pull_request trigger for the FabricFrameView multi-GPU test workflow. The trigger was previously commented out awaiting infrastructure provisioning of a multi-GPU runner.
Changes Reviewed
| File | Change Type | Lines |
|---|---|---|
.github/workflows/test-fabric-multi-gpu.yaml |
Modified | +5/-12 |
Analysis
✅ Clean change — The diff simply uncomments the original pull_request paths trigger and removes the DISABLED comment block. No workflow logic changes.
✅ Path filters retained — The workflow appropriately triggers only on changes to:
source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.pysource/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py.github/workflows/test-fabric-multi-gpu.yaml
✅ workflow_dispatch preserved — Manual triggering capability remains for debugging/re-runs.
✅ Excellent PR description — Clear rationale, scope, and test plan provided.
CI Status
- Standard test jobs correctly self-skip (not in
Detect Changesallowlist) FabricFrameView multi-GPU unit testsjob is pending/queued as expected
Verdict
No issues found. This is a minimal, well-documented infrastructure enablement change.
🤖 Automated review by isaaclab-review-bot | Commit: baf811f
Update (commit c21377a): New commit adds a Set up Python step using setup-python@v5 with Python 3.12. This is a sensible addition to ensure consistent Python environment setup before running the Isaac Lab install script. ✅ No issues — LGTM.
Update (commit 4f136d1): Adds pip install cmake step before Isaac Lab installation. The detailed comment explains this bypasses the sudo apt-get path in install.py (line 35) by placing cmake on PATH via the pip wheel. Smart workaround for CI environments. ✅ No issues — LGTM.
Update (commit 43ab9b9): Changes install command to ./isaaclab.sh --install none (core submodules only). Well-commented explanation: the FabricFrameView cuda:1 tests only need isaaclab core + isaaclab_physx + Isaac Sim, and this avoids building robomimic's egl_probe wheel which requires libEGL/X11 headers missing on the multi-GPU runner image. ✅ Sensible minimal install approach — LGTM.
Update (commit a5ca71d): Adds explicit pip install 'isaacsim[all,extscache]==5.1.0' step. Logical follow-up to the --install none change — since Isaac Sim is an optional extra not pulled by minimal install, it must be installed explicitly. The comment correctly notes that without it, AppLauncher's import isaacsim silently fails, leaving EXP_PATH unset and causing KeyError at test collection. ✅ Well-documented fix — LGTM.
Update (commit 1106337): Bumps Isaac Sim version from 5.1.0 to 6.0.0.0. Updated comment explains the version pin: isaacsim 6.0.0.0 is the only release line with Python 3.12 wheels (the 5.x line on PyPI requires Python 3.11). This aligns correctly with the Python 3.12 setup added in c21377a. ✅ Necessary version bump — LGTM.
Update (commit 93b45a8): Adds EULA acceptance and headless environment variables (OMNI_KIT_ACCEPT_EULA, ACCEPT_EULA, ISAAC_SIM_HEADLESS) to the test step. The comment explains these bypass Kit's interactive EULA prompt on first launch, preventing isaacsim.bootstrap_kernel from reading stdin during test collection (which causes pytest to abort). ✅ Standard CI headless configuration — LGTM.
Update (commit 48b42bc): Increases job timeout from 30 to 60 minutes. Sensible buffer given the heavy Isaac Sim installation and multi-GPU test setup overhead. ✅ No issues — LGTM.
Matches the IL pyproject.toml requires-python (>=3.12,<3.13) and the pin every other active workflow uses (docs, wheel, pre-commit, changelog-check, license-check, nightly-changelog). Without this, the runner falls back to system Python which lacks tomllib (added in 3.11+ stdlib), and ./isaaclab.sh --install crashes in isaaclab.cli.commands.install on the tomllib import.
The multi-GPU runner has no passwordless sudo for the github-runner
user; ./isaaclab.sh --install crashes on 'sudo apt-get update' during
the system-deps step.
install.py:_install_system_deps short-circuits if shutil.which("cmake")
finds cmake on PATH. The pip wheel ships a cmake binary under the
setup-python@v5 venv's bin/, so adding 'pip install cmake' before
the install step makes the apt-get branch unreachable.
./isaaclab.sh --install (default 'all') pulls in isaaclab_mimic, which depends on robomimic, which depends on egl_probe. egl_probe builds from source via CMake and needs libEGL/X11 headers that the multi-GPU runner image lacks — its setup.py runs 'cmake --version' inside an isolated build env where the host system libs aren't visible. FabricFrameView cuda:1 tests only need isaaclab core + isaaclab_physx + Isaac Sim. '--install none' covers exactly that and avoids the egl_probe build entirely.
isaacsim is an optional extra in source/isaaclab/setup.py and is not pulled by ./isaaclab.sh --install (any scope). Production runner images ship Isaac Sim baked in; this multi-GPU runner image does not. Without isaacsim installed, AppLauncher's import isaacsim is silently suppressed by contextlib.suppress(ModuleNotFoundError), leaving EXP_PATH unset. Test collection then crashes with KeyError: 'EXP_PATH' at AppLauncher(headless=True). Pinned to isaacsim[all,extscache]==5.1.0 to match the IL extras_require pin in source/isaaclab/setup.py.
isaacsim==5.1.0 (the pin in source/isaaclab/setup.py) only ships a Python 3.11 wheel, so pip cannot resolve it under the 3.12 baseline declared in pyproject.toml. 6.0.0.0 is the first line with a 3.12 wheel and matches the version installed in the local env_isaaclab conda env. Note: source/isaaclab/setup.py's extras_require['isaacsim'] pin is stale and should be bumped as a follow-up; out of scope here.
isaacsim.bootstrap_kernel runs on import and prompts for EULA acceptance on first launch. Under pytest the stdin capture causes 'reading from stdin while output is captured' and the kit kernel exits non-zero before test collection completes. OMNI_KIT_ACCEPT_EULA=yes / ACCEPT_EULA=Y / ISAAC_SIM_HEADLESS=1 are the same env-var bypass build.yaml uses in its docker-test debug recipe.
Attempt 8 reached the test step (all install/preflight steps green) but the 30-min budget ran out during pytest execution. First-launch Kit-app cold start primes extscache, which is slow on a bare runner without a prebaked cache. Subsequent runs on the same runner will be faster, but the first one needs more headroom.
1. Summary
on: pull_requesttrigger in.github/workflows/test-fabric-multi-gpu.yaml. Single file changed, 5 insertions / 12 deletions.source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py) — they are currently only reachable viaworkflow_dispatch.g6e.12xlarge, label[self-hosted, linux, x64, gpu, multi-gpu]) provisioned 2026-05-21.2. Why now
The trigger block was commented out at #5514 merge time because no self-hosted runner advertised the
multi-gpulabel; anypull_requestevent would have queued indefinitely. The infra precondition is now satisfied, so the original trigger can be restored verbatim.3. Scope
ISAACLAB_TEST_MULTI_GPU=1env gate, the 2-GPU pre-flight check, and the runner label all stay as-is.cuda:1parameterized tests).4. CI footprint
The diff only touches
.github/workflows/test-fabric-multi-gpu.yaml, which is not inbuild.yaml'sDetect Changesallowlist. The standard L40 docker-test matrix self-skips withrun_docker_tests=false, so iteration on this branch does not consume the production single-GPU pool. Only the new multi-GPU workflow runs.5. Test plan
torch.cuda.device_count() >= 2).cuda:1-parameterized tests intest_views_xform_prim_fabric.pypass.