experiment with texture streaming off#5686
Conversation
Greptile SummaryThis experimental PR disables RTX texture streaming in the headless app configuration and removes the
Confidence Score: 3/5Not ready to merge — the flaky retry guard was removed before confirming that texture streaming = false fully eliminates the underlying render non-determinism. The original _FLAKY_MARK comment explicitly states RTX outputs are non-deterministic on every CI run (NVBUG#6152566). Removing the three-retry guard while that bug is still open means the first non-deterministic CI execution becomes an outright failure. The kit-file change is global and undocumented, risking silent impact on non-test headless workloads. Both changed files need attention: rendering_test_utils.py for the removal of the retry guard, and apps/isaaclab.python.headless.kit for the undocumented global scope of the texture streaming change. Important Files Changed
Reviews (1): Last reviewed commit: "experiment with texture streaming off" | Re-trigger Greptile |
| # Low-resolution camera outputs from RTX renderers are not deterministic enough to pass golden image testing | ||
| # on every CI run. (NVBUG#6152566) | ||
| _FLAKY_MARK = pytest.mark.flaky(max_runs=3, min_passes=1) | ||
| _FLAKY_MARK = [] # pytest.mark.flaky(max_runs=3, min_passes=1) |
There was a problem hiding this comment.
Removing flaky retry without evidence of determinism fix
The comment directly above this line explains why the mark existed: RTX renderer outputs are "not deterministic enough to pass golden image testing on every CI run" (NVBUG#6152566). Replacing the mark with an empty list means any non-deterministic render will cause an outright CI failure instead of a retry. If disabling texture streaming does not fully resolve the NVBUG, the test suite will regress in CI reliability. Before landing this, it would be worth confirming that the texture-streaming change actually eliminates the non-determinism that drove the original max_runs=3 guard.
| # Low-resolution camera outputs from RTX renderers are not deterministic enough to pass golden image testing | ||
| # on every CI run. (NVBUG#6152566) | ||
| _FLAKY_MARK = pytest.mark.flaky(max_runs=3, min_passes=1) | ||
| _FLAKY_MARK = [] # pytest.mark.flaky(max_runs=3, min_passes=1) |
There was a problem hiding this comment.
Commented-out code should be removed rather than left as a dormant inline comment, especially for a module-level constant that is read in every
pytest.param call. If the flaky mark is being intentionally retired, delete the old value; if this is experimental, track it in an issue/TODO instead.
| _FLAKY_MARK = [] # pytest.mark.flaky(max_runs=3, min_passes=1) | |
| _FLAKY_MARK = [] |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| renderer.multiGpu.enabled = true | ||
| renderer.multiGpu.autoEnable = true | ||
| 'rtx-transient'.resourcemanager.enableTextureStreaming = true | ||
| 'rtx-transient'.resourcemanager.enableTextureStreaming = false |
There was a problem hiding this comment.
Global setting change affects all headless users, not just tests
isaaclab.python.headless.kit is the shared app configuration loaded for every headless Isaac Lab run (training scripts, data-collection pipelines, etc.), not solely by CI rendering tests. Disabling texture streaming may change startup time, VRAM usage, and rendering fidelity for all downstream users. There is no comment here (or in the PR description) explaining whether this is intended to be permanent or is scoped to the test experiment.
There was a problem hiding this comment.
Code Review Summary
This PR experiments with disabling texture streaming to improve RTX rendering determinism (NVBUG#6152566). The approach is reasonable as a validation experiment.
Key Observations
Texture Streaming Change (apps/isaaclab.python.headless.kit):
- Disabling
enableTextureStreamingis a valid approach to investigate rendering determinism issues, as texture streaming can introduce non-deterministic memory allocation patterns. - Consider whether other
.kitfiles (e.g.,isaaclab.python.headless.rendering.kit) should have consistent settings if this experiment proves successful.
Flaky Mark Removal (source/isaaclab_tasks/test/rendering_test_utils.py):
- Setting
_FLAKY_MARK = []is technically correct — pytest accepts an empty iterable formarks=. - This change is appropriate for an experiment: if rendering is now deterministic, tests should pass consistently; if not, failures will indicate the fix is incomplete.
- The commented-out original code is helpful for reverting if needed.
Suggestions
-
Experiment validation: Monitor CI
rendering-correctnessresults closely. If tests pass consistently across multiple runs, the texture streaming change is likely effective. -
Consider scope: If successful, apply the texture streaming change to other headless kit variants for consistency.
-
Update comment if keeping: If this change becomes permanent, update the comment above
_FLAKY_MARKto explain why the flaky mark is no longer needed (rather than just commenting out the old value). -
PR description: Consider filling in the PR template with details about the experiment and expected outcomes.
Verdict
This is a well-structured experiment. The CI results will determine if the approach is successful. No blocking issues found.
Reviewed with focus on architecture, error handling, and test coverage.
Update (95b0b2d): New commits add updated golden images for the texture streaming experiment. This is the expected follow-up — 16 golden baseline images (dexsuite_kuka, shadow_hand, and registered_tasks) have been regenerated with the new rendering settings. ✅ No code changes, just binary asset updates. Previous review comments on the code still stand.
Update (441245d): New commits add several unrelated changes bundled with this PR:
-
h5py lazy import fix (
hdf5_dataset_file_handler.py): Movesimport h5pyfrom top-level to method-level to fix Windows startup crash (0xc0000139). ✅ Good fix — deferred imports prevent DLL load failures when h5py is not needed. -
Version bumps: isaaclab 5.5.0→5.5.1, isaaclab_ov 0.2.1→0.3.0, isaaclab_ovphysx 2.0.0→2.1.0, isaaclab_tasks 1.8.0→1.9.0
-
ROS inference env simplification (
ros_inference_env_cfg.py): Removed ~50 lines of Hubble Lab-specific robot mount configuration. Added to isaaclab_tasks CHANGELOG as "robot setup and mount configuration" change. -
Changelog consolidation: Fragment files merged into main CHANGELOGs.
-
Additional golden images: shadow_hand and registered_tasks images updated for texture streaming experiment.
enableTextureStreaming = false) and flaky mark removal remain unchanged. The P1 concern about removing the flaky retry guard before confirming determinism still stands.
Update (eb89a15): 16 golden images updated (cartpole and shadow_hand for both newton/physx backends). File size variations ±100 bytes, consistent with texture streaming change affecting rendered output. No code changes. ✅
Update (babfbd1): Changed approach on flaky test handling — _FLAKY_MARK is now set back to pytest.mark.flaky(max_runs=3, min_passes=1), but individual marks=_FLAKY_MARK arguments have been removed from all 14 RTX renderer test params. This means the flaky decorator is defined but not applied to any tests.
The net effect is the same as before (no flaky retry for RTX tests), but the code is cleaner — rather than setting _FLAKY_MARK = [] and keeping all the marks=_FLAKY_MARK lines, this removes the marks entirely. ✅ This is a cleaner solution that makes the intent clear: RTX renderer tests are expected to be deterministic with texture streaming disabled.
Update (12fbfb7): New commit adds logging configuration changes to OVRTXRendererCfg:
log_leveldefault changed: "verbose" → "info" (reduces log verbosity)log_file_pathdefault changed: "/tmp/ovrtx_renderer.log" → "/dev/stdout" (logs to stdout instead of file)- Changelog fragment added documenting these changes
✅ These are sensible defaults for production use — stdout logging is more container/CI friendly, and "info" level is appropriate for normal operation. Users can still override these if they need verbose debugging. No concerns.
Update (e137c79): Log level default reverted from "info" back to "verbose" in OVRTXRendererCfg. Changelog fragment updated to only mention the log_file_path change (no longer documents log_level).
This reversal makes sense — "verbose" logging is likely needed during this experimental phase to capture full diagnostic output when investigating texture streaming determinism. The /dev/stdout change for log_file_path remains in place. ✅
fc9d8b4 to
441245d
Compare
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there