[Diag][DO NOT MERGE] Positive arm — numpy 2.3.4 safe with 5656 fix#5684
[Diag][DO NOT MERGE] Positive arm — numpy 2.3.4 safe with 5656 fix#5684hujc7 wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Automated Code Review
Thank you for the diagnostic PR. The numpy 2.3.5 exclusion strategy looks comprehensive across the modified install paths, and the diagnostic block correctly forces the worst-case import order to validate numpy 2.3.4's safety.
Observations
Diagnostic Implementation
The pre-pytest diagnostic block in .github/actions/run-tests/action.yml correctly:
- Imports numpy before pytest spawns Kit (forcing atfork handler registration)
- Outputs version info and OpenBLAS library paths for log inspection
- Uses
|| trueto prevent diagnostic failures from blocking the actual test
Coverage of numpy 2.3.5 Exclusion
The exclusion is applied consistently across:
- All 6 modified
setup.pyfiles (isaaclab, isaaclab_tasks, isaaclab_rl, isaaclab_visualizers, isaaclab_teleop, isaaclab_mimic) - Both Dockerfiles (base and curobo) for ARM nlopt prep
- CLI install commands (ARM nlopt and Pink IK stack)
Minor Note (not blocking for a diagnostic PR)
docs/requirements.txt has a bare numpy dependency without the !=2.3.5 exclusion. This is likely fine since doc builds don't exercise the fork/atfork codepath, but worth noting for completeness when the actual fix PR (#5656) lands.
Diagnostic Verdict
This PR correctly validates whether numpy 2.3.4's bundled OpenBLAS (-8fb3d286 hash) is safe under the worst-case import order. CI results will provide the definitive answer:
- ✅ All green → #5656's
numpy>=2,!=2.3.5constraint is verified safe - ❌ SIGSEGV in
test_surface_gripper→ 2.3.x family shares the bug, need>=2.4path
The PR is appropriately marked as "Do not merge" for diagnostic purposes only.
Update (b2a15e3): This commit appears to include a large rebase/merge with unrelated feature work. The changes span:
- Newton actuator integration (isaaclab_physx): New
test_newton_actuators_physx.pytest suite,NewtonActuatorAdaptersupport inarticulation.py,write_actuator_stiffness_to_sim/write_actuator_damping_to_simmethods - Contact sensor for OVPhysX: Added
ovphysxpreset to locomotion velocity config - Preset resolution overhaul (isaaclab_tasks): Active-tree breadth-first preset resolution in
hydra.py, fixes for nested PresetCfg child scoping - IsaacTeleop MCAP replay: New
mcap_record_path/mcap_replay_pathparams, removed legacyteleop_devicesaccessor and XCR replay automation - numpy 2.3.5 exclusions: Extended to
isaaclab_rl/setup.pychangelog (the main diagnostic content is intact) - Various fixture updates: Golden images, changelog entries, and test marker adjustments
The original numpy diagnostic logic remains unchanged. The incremental diff does not impact the diagnostic's validity - CI will still exercise the worst-case import order. However, this scope expansion means the PR no longer purely validates numpy 2.3.4 in isolation; any CI failures could be caused by the unrelated feature changes.
|
Validation done — 35 pass / 2 fail. All 13 GPU test jobs that ran the diagnostic confirmed numpy 2.3.4 + openblas |
|
Reopening — this is the canonical worst-case CI evidence for the numpy 2.3.4 baseline in #5656. Keep open as a reference for reviewers; don't merge. |
numpy 2.3.5 ships a vendored OpenBLAS (libscipy_openblas64_-fdde5778.so) whose pthread_atfork handler crashes Kit's libomni.platforminfo fork() during SimulationApp startup. The release is excluded at every site that pulls numpy directly or transitively, so no pip resolve during isaaclab.sh --install or any Docker image build can land on it -- even transiently: source/isaaclab/setup.py source/isaaclab_tasks/setup.py source/isaaclab_rl/setup.py source/isaaclab_visualizers/setup.py source/isaaclab_teleop/setup.py (transitive via dex-retargeting) source/isaaclab_mimic/setup.py (transitive via h5py) isaaclab.cli.commands.install._ensure_pink_ik_dependencies_installed isaaclab.cli.commands.install._maybe_preinstall_arm_nlopt docker/Dockerfile.base (ARM nlopt prep) docker/Dockerfile.curobo (ARM nlopt prep + nvidia-curobo install) Each touchpoint adds only the ``!=2.3.5`` exclusion; no other version constraints are introduced. Validated: - env_isaaclab_test smoke test (numpy 2.4.5 + cmeel pinocchio + pink + daqp + qpsolvers all import; toy IK solve OK). - IsaacLab Pink IK unit tests: 54/54 pass against numpy 2.4.5. - PR isaac-sim#5655 worst-case run (diagnostic imports numpy before pytest spawns Kit, the order that originally crashed): 36 pass / 0 fail. The isaaclab_physx surface gripper SIGSEGV is gone. Related: numpy/numpy#30092, OpenMathLib/OpenBLAS#5520
Adds source/conftest.py so pytest dumps the resolved numpy version and the bundled OpenBLAS .so filename at session start. Used by the negative/positive arm validation PRs to capture which numpy bundle each CI test container ends up with after isaaclab.sh --install completes. The conftest.py imports numpy at module load. This is what pytest does naturally via isaaclab module imports anyway -- making the import explicit here only adds visibility, not crash conditions.
ea15c6c to
b2a15e3
Compare
Purpose
Positive arm of the single-variable A/B that validates #5656.
Companion negative arm: #5655.
Setup
a9b62101ca6).eae5a01c— the 10-sitenumpy!=2.3.5exclusion).source/conftest.pythat prints the resolved numpy version and bundled OpenBLAS .so filename at pytest session start. Same patch contents as the negative arm's conftest commit.isaaclab.sh --installoverrides, no force-reinstall, no action.yml edits.git diff jichuanh/validate-numpy-pin-5642..jichuanh/validate-numpy-2-3-4-worstcasebetween the negative arm and this branch equals exactly 5656's 10-site exclusion — nothing else.Expected result
!=2.3.5under cmeel-boost's cap).[dep-manifest]log line showsnumpy 2.3.4+libscipy_openblas64_-8fb3d286.so.isaaclab_physxisaaclab_newtonisaaclab_rlIf any canary fails with the atfork-in-libomni.platforminfo backtrace, 2.3.4's OpenBLAS bundle is also broken and 5656 needs to be tightened (e.g. exclude 2.3.x entirely and force >=2.4 via the path documented in 5656 §8.1).
Lifecycle
Diagnostic-only. Do not merge. Closes once evidence is captured.
Related