
[Diag][DO NOT MERGE] Positive arm — numpy 2.3.4 safe with 5656 fix#5684

Draft
hujc7 wants to merge 2 commits into isaac-sim:develop from hujc7:jichuanh/validate-numpy-2-3-4-worstcase

Conversation

Collaborator

@hujc7 hujc7 commented May 19, 2026

Purpose

Positive arm of the single-variable A/B that validates #5656.

Companion negative arm: #5655.

Setup

  • Branch base: develop (a9b62101ca6).
  • Cherry-pick of 5656's fix commit (eae5a01c — the 10-site numpy!=2.3.5 exclusion).
  • One commit on top: source/conftest.py that prints the resolved numpy version and bundled OpenBLAS .so filename at pytest session start. Same patch contents as the negative arm's conftest commit.
  • No isaaclab.sh --install overrides, no force-reinstall, no action.yml edits.

git diff jichuanh/validate-numpy-pin-5642..jichuanh/validate-numpy-2-3-4-worstcase between the negative arm and this branch equals exactly 5656's 10-site exclusion — nothing else.
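A minimal sketch of what such a conftest.py could look like, assuming the Linux wheel layout in which vendored libraries sit in a `numpy.libs` directory next to the `numpy` package. The helper name `find_bundled_openblas` and the exact wording of the printed line are illustrative and are not taken from the actual commit; unlike the real patch, which imports numpy at module load, this sketch imports it lazily inside the hook so the helper can be exercised on its own.

```python
# conftest.py (sketch): report the resolved numpy and its bundled OpenBLAS.
from pathlib import Path


def find_bundled_openblas(numpy_pkg_dir):
    """Return sorted filenames of OpenBLAS shared objects vendored next to numpy.

    Assumes the layout <site-packages>/numpy.libs/libscipy_openblas*.so
    """
    libs_dir = Path(numpy_pkg_dir).parent / "numpy.libs"
    if not libs_dir.is_dir():
        return []
    return sorted(p.name for p in libs_dir.glob("libscipy_openblas*.so*"))


def pytest_sessionstart(session):
    """pytest hook: print one manifest line at session start."""
    try:
        import numpy  # lazy import keeps the helper above usable without numpy
    except ImportError:
        print("[dep-manifest] numpy not installed")
        return
    blas = find_bundled_openblas(Path(numpy.__file__).parent)
    found = ", ".join(blas) or "no bundled OpenBLAS found"
    print(f"[dep-manifest] numpy {numpy.__version__} + {found}")
```

Keeping the filesystem lookup separate from the hook means the lookup can be checked against a temporary directory tree, without a numpy install.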

Expected result

  • Install resolves to numpy 2.3.4 (highest 2.3.x that satisfies !=2.3.5 under cmeel-boost's cap).
  • [dep-manifest] log line shows numpy 2.3.4 + libscipy_openblas64_-8fb3d286.so.
  • Canary jobs pass:
    • isaaclab_physx
    • isaaclab_newton
    • isaaclab_rl

If any canary fails with the atfork-in-libomni.platforminfo backtrace, 2.3.4's OpenBLAS bundle is also broken and 5656 needs to be tightened (e.g. exclude 2.3.x entirely and force >=2.4 via the path documented in 5656 §8.1).
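The expected resolution can be illustrated with a small, dependency-free sketch. The release list and the `2.4` upper bound are assumptions standing in for the package index and for cmeel-boost's cap, whose exact value is not stated here. pip's real resolver is considerably more involved.

```python
def parse(version):
    """Turn '2.3.4' into (2, 3, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_highest(candidates, minimum, excluded, cap):
    """Return the highest candidate with minimum <= v < cap that is not excluded."""
    allowed = [
        v for v in candidates
        if parse(minimum) <= parse(v) < parse(cap) and v not in excluded
    ]
    return max(allowed, key=parse) if allowed else None


# Hypothetical release list; the cap of 2.4 is an assumed stand-in.
releases = ["2.2.6", "2.3.3", "2.3.4", "2.3.5", "2.4.0"]
print(pick_highest(releases, minimum="2", excluded={"2.3.5"}, cap="2.4"))  # 2.3.4
```

With the exclusion removed (`excluded=set()`), the same call returns `2.3.5`, which is the situation the exclusion is meant to prevent.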

Lifecycle

Diagnostic-only. Do not merge. Closes once evidence is captured.

Related

@github-actions github-actions bot added the infrastructure and isaac-mimic (Related to Isaac Mimic team) labels May 19, 2026

@isaaclab-review-bot isaaclab-review-bot Bot left a comment


Automated Code Review

Thank you for the diagnostic PR. The numpy 2.3.5 exclusion strategy looks comprehensive across the modified install paths, and the diagnostic block correctly forces the worst-case import order to validate numpy 2.3.4's safety.

Observations

Diagnostic Implementation
The pre-pytest diagnostic block in .github/actions/run-tests/action.yml correctly:

  • Imports numpy before pytest spawns Kit (forcing atfork handler registration)
  • Outputs version info and OpenBLAS library paths for log inspection
  • Uses || true to prevent diagnostic failures from blocking the actual test
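One way to list the OpenBLAS libraries actually mapped into a process on Linux is to scan `/proc/self/maps` after the import. The sketch below is an assumption about how such a diagnostic might be written, not the block from action.yml; `loaded_openblas_paths` is a hypothetical helper, and the broad `except` plays the role that `|| true` plays in shell.

```python
def loaded_openblas_paths(maps_text):
    """Extract unique OpenBLAS shared-object paths from /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Line format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and "openblas" in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


def main():
    """Fail-soft entry point: never raise, so the real test step still runs."""
    try:
        import numpy  # import first so its shared libraries are mapped

        with open("/proc/self/maps") as maps:
            libs = loaded_openblas_paths(maps.read())
        print(f"numpy {numpy.__version__}: {libs}")
    except Exception as exc:  # Python-side analogue of the shell `|| true`
        print(f"diagnostic skipped: {exc}")


if __name__ == "__main__":
    main()
```

Parsing the maps text in a separate function keeps the Linux-specific file access in one place and lets the parser be checked against a canned sample.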

Coverage of numpy 2.3.5 Exclusion
The exclusion is applied consistently across:

  • All 6 modified setup.py files (isaaclab, isaaclab_tasks, isaaclab_rl, isaaclab_visualizers, isaaclab_teleop, isaaclab_mimic)
  • Both Dockerfiles (base and curobo) for ARM nlopt prep
  • CLI install commands (ARM nlopt and Pink IK stack)

Minor Note (not blocking for a diagnostic PR)
docs/requirements.txt has a bare numpy dependency without the !=2.3.5 exclusion. This is likely fine since doc builds don't exercise the fork/atfork codepath, but worth noting for completeness when the actual fix PR (#5656) lands.
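A completeness check of this kind can be automated. The following sketch flags requirement strings that name numpy without the `!=2.3.5` exclusion. The regular expression and the helper name `numpy_specs_missing_exclusion` are illustrative assumptions, and the matching is deliberately simple rather than a full PEP 508 parser.

```python
import re

# Matches a requirement whose package name is exactly "numpy"
# (not e.g. "numpy-quaternion"); captures the specifier up to any
# comment or environment marker.
_NUMPY_REQ = re.compile(r"^\s*numpy(?![\w.-])\s*(?P<spec>[^#;]*)", re.IGNORECASE)


def numpy_specs_missing_exclusion(requirements, excluded="2.3.5"):
    """Return the numpy requirement lines that do not exclude `excluded`."""
    missing = []
    for req in requirements:
        match = _NUMPY_REQ.match(req)
        if match is None:
            continue  # not a numpy requirement
        # Drop all whitespace so "numpy >=2, !=2.3.5" compares cleanly.
        spec = "".join(match.group("spec").split())
        if f"!={excluded}" not in spec.split(","):
            missing.append(req.strip())
    return missing


print(numpy_specs_missing_exclusion(["numpy", "numpy>=2,!=2.3.5", "h5py"]))
```

Run over the lines of a requirements file or an `install_requires` list, a non-empty result points at entries such as the bare `numpy` noted above.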

Diagnostic Verdict

This PR correctly validates whether numpy 2.3.4's bundled OpenBLAS (-8fb3d286 hash) is safe under the worst-case import order. CI results will provide the definitive answer:

  • ✅ All green → #5656's numpy>=2,!=2.3.5 constraint is verified safe
  • ❌ SIGSEGV in test_surface_gripper → 2.3.x family shares the bug, need >=2.4 path

The PR is appropriately marked as "Do not merge" for diagnostic purposes only.


Update (b2a15e3): This commit appears to include a large rebase/merge with unrelated feature work. The changes span:

  • Newton actuator integration (isaaclab_physx): New test_newton_actuators_physx.py test suite, NewtonActuatorAdapter support in articulation.py, write_actuator_stiffness_to_sim/write_actuator_damping_to_sim methods
  • Contact sensor for OVPhysX: Added ovphysx preset to locomotion velocity config
  • Preset resolution overhaul (isaaclab_tasks): Active-tree breadth-first preset resolution in hydra.py, fixes for nested PresetCfg child scoping
  • IsaacTeleop MCAP replay: New mcap_record_path/mcap_replay_path params, removed legacy teleop_devices accessor and XCR replay automation
  • numpy 2.3.5 exclusions: Extended to isaaclab_rl/setup.py changelog (the main diagnostic content is intact)
  • Various fixture updates: Golden images, changelog entries, and test marker adjustments

The original numpy diagnostic logic remains unchanged. The incremental diff does not impact the diagnostic's validity - CI will still exercise the worst-case import order. However, this scope expansion means the PR no longer purely validates numpy 2.3.4 in isolation; any CI failures could be caused by the unrelated feature changes.

Collaborator Author

hujc7 commented May 19, 2026

Validation done: 35 pass / 2 fail. All 13 GPU test jobs that ran the diagnostic confirmed numpy 2.3.4 + openblas -8fb3d286.so under the worst-case import order, and isaaclab_physx (the original SIGSEGV canary) ran clean. The two failures are unrelated to numpy: rendering-correctness-kitless (MDL shader warmup flake) and isaaclab_tasks [3/3] (a test_newton_solver_preset_names legacy preset assertion that came in with a merged develop commit). #5656's 2.3.4 baseline is verified end-to-end. Closing.

@hujc7 hujc7 closed this May 19, 2026
Collaborator Author

hujc7 commented May 19, 2026

Reopening — this is the canonical worst-case CI evidence for the numpy 2.3.4 baseline in #5656. Keep open as a reference for reviewers; don't merge.

@hujc7 hujc7 reopened this May 19, 2026
hujc7 added 2 commits May 19, 2026 23:56
numpy 2.3.5 ships a vendored OpenBLAS
(libscipy_openblas64_-fdde5778.so) whose pthread_atfork handler crashes
Kit's libomni.platforminfo fork() during SimulationApp startup. The
release is excluded at every site that pulls numpy directly or
transitively, so no pip resolve during isaaclab.sh --install or any
Docker image build can land on it -- even transiently:

  source/isaaclab/setup.py
  source/isaaclab_tasks/setup.py
  source/isaaclab_rl/setup.py
  source/isaaclab_visualizers/setup.py
  source/isaaclab_teleop/setup.py        (transitive via dex-retargeting)
  source/isaaclab_mimic/setup.py         (transitive via h5py)
  isaaclab.cli.commands.install._ensure_pink_ik_dependencies_installed
  isaaclab.cli.commands.install._maybe_preinstall_arm_nlopt
  docker/Dockerfile.base                 (ARM nlopt prep)
  docker/Dockerfile.curobo               (ARM nlopt prep + nvidia-curobo install)

Each touchpoint adds only the ``!=2.3.5`` exclusion; no other version
constraints are introduced.

Validated:
- env_isaaclab_test smoke test (numpy 2.4.5 + cmeel pinocchio + pink + daqp
  + qpsolvers all import; toy IK solve OK).
- IsaacLab Pink IK unit tests: 54/54 pass against numpy 2.4.5.
- PR isaac-sim#5655 worst-case run (diagnostic imports numpy before pytest spawns
  Kit, the order that originally crashed): 36 pass / 0 fail. The
  isaaclab_physx surface gripper SIGSEGV is gone.

Related: numpy/numpy#30092, OpenMathLib/OpenBLAS#5520

Adds source/conftest.py so pytest dumps the resolved numpy version and the
bundled OpenBLAS .so filename at session start. Used by the negative/positive
arm validation PRs to capture which numpy bundle each CI test container ends
up with after isaaclab.sh --install completes.

The conftest.py imports numpy at module load. This is what pytest does
naturally via isaaclab module imports anyway -- making the import explicit
here only adds visibility, not crash conditions.
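Both commit messages above hinge on the same mechanism: a fork handler registered when one library is loaded runs inside every later `fork()` in the process, whichever code issues it. This can be illustrated from Python with `os.register_at_fork`. It is only an analogy under POSIX assumptions: the real handler is installed in C via `pthread_atfork`, and nothing below reproduces the crash.

```python
import os

events = []

# Registered once, as a library might do at import time. The handlers stay
# active for every later fork() in this process, whoever calls it.
os.register_at_fork(
    before=lambda: events.append("before"),
    after_in_parent=lambda: events.append("after_in_parent"),
)


def fork_once():
    """Fork a child that exits immediately; return the handlers seen by the parent."""
    events.clear()
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: leave at once without running cleanup handlers
    os.waitpid(pid, 0)  # reap the child
    return list(events)


if __name__ == "__main__":
    print(fork_once())
```

The caller of `fork_once` never mentions the handlers, yet they run. That is why import order matters here: once the handler is registered, any subsequent fork in the process passes through it.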
@hujc7 hujc7 force-pushed the jichuanh/validate-numpy-2-3-4-worstcase branch from ea15c6c to b2a15e3 May 19, 2026 23:57
@hujc7 hujc7 changed the title [Diag] Validate numpy 2.3.4 under worst-case import order (off #5656) [Diag] Positive arm — numpy 2.3.4 safe with 5656 fix May 19, 2026
@hujc7 hujc7 changed the title [Diag] Positive arm — numpy 2.3.4 safe with 5656 fix [Diag][DO NOT MERGE] Positive arm — numpy 2.3.4 safe with 5656 fix May 20, 2026

Labels

infrastructure isaac-mimic Related to Isaac Mimic team
