[core] Pin legacy OOM killing policy in test_memory_pressure fixtures by elliot-barn · Pull Request #62922 · ray-project/ray

elliot-barn · 2026-04-24T21:39:27Z

PR #62643 ([Core] (Resource Isolation 12/n) Switch group killing policy to by time killing policy) flipped the default OOM killing policy from the legacy owner-group, single-worker policy to the new time-based, multi-worker policy. Two tests in test_memory_pressure.py assert on the legacy semantics:

test_memory_pressure_kill_newest_worker asserts exactly one named actor survives after the OOM kill
test_newer_task_not_retriable_kill_older_retriable_task_first asserts the older retriable task is the one chosen to be killed

Under the new policy these assertions aren't guaranteed, and the mempress CI job on compile-req-compiled-py313 fails all 3 bazel attempts (the raylet log itself prints the "RAY_worker_killing_policy _by_group" escape hatch).

Set worker_killing_policy_by_group=True in the _system_config of both ray_with_memory_monitor fixtures to pin the legacy policy these tests were written against. Not specific to the py3.13 dep refresh — master inherits the same regression from PR #62643; fixing here to unblock the refresh branch's CI.

PR #62643 ([Core] (Resource Isolation 12/n) Switch group killing policy to by time killing policy) flipped the default OOM killing policy from the legacy owner-group, single-worker policy to the new time-based, multi-worker policy. Two tests in test_memory_pressure.py assert on the legacy semantics: - test_memory_pressure_kill_newest_worker asserts exactly one named actor survives after the OOM kill - test_newer_task_not_retriable_kill_older_retriable_task_first asserts the older retriable task is the one chosen to be killed Under the new policy these assertions aren't guaranteed, and the mempress CI job on compile-req-compiled-py313 fails all 3 bazel attempts (the raylet log itself prints the "RAY_worker_killing_policy _by_group" escape hatch). Set `worker_killing_policy_by_group=True` in the `_system_config` of both `ray_with_memory_monitor` fixtures to pin the legacy policy these tests were written against. Not specific to the py3.13 dep refresh — master inherits the same regression from PR #62643; fixing here to unblock the refresh branch's CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

gemini-code-assist

Code Review

This pull request updates python/ray/tests/test_memory_pressure.py to explicitly set worker_killing_policy_by_group to True in test fixtures, ensuring tests maintain their original semantics after a default policy change. A review comment suggests adding a descriptive comment to one of the fixtures for consistency and maintainability.

gemini-code-assist · 2026-04-24T21:41:35Z

            "task_oom_retries": 0,
            "min_memory_free_bytes": -1,
            "task_oom_retry_delay_base_ms": 0,
+            "worker_killing_policy_by_group": True,


For consistency and maintainability, it would be beneficial to add a comment here explaining why worker_killing_policy_by_group is set to True, similar to the one in the ray_with_memory_monitor fixture. This will help future developers understand the context for this configuration.

# PR #62643 flipped the default OOM killing policy to the # time-based multi-worker policy; these tests assert on # single-worker, owner-group semantics. "worker_killing_policy_by_group": True

…olicy

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 5049c59. Configure here.}

cursor · 2026-04-28T14:52:56Z

            "task_oom_retries": 0,
            "min_memory_free_bytes": -1,
            "task_oom_retry_delay_base_ms": 0,
+            "worker_killing_policy_by_group": True,


Missed pinning legacy kill policy in inline ray.init test

Medium Severity

test_one_actor_max_lifo_kill_next_actor also asserts on legacy LIFO killing semantics (newest actor is killed, oldest survives) but its inline ray.init() _system_config was not updated with "worker_killing_policy_by_group": True. Since the default flipped to the time-based policy in PR #62643, this test is susceptible to the same regression the PR intends to fix — the assertions at lines 572–574 and 584–588 may fail under the new default policy.

^{Reviewed by Cursor Bugbot for commit 5049c59. Configure here.}

Kunchd · 2026-04-28T16:41:47Z

Hmm, the properties the test are testing should be respected by the new policy as well. If the new policy is failing these properties, we will need to fix the new policy instead. I will take a look.

Kunchd · 2026-04-28T21:47:06Z

@elliot-barn The failing integration test tests for a policy specific behavior that's already covered in unit tests. I've removed similar policy specific tests in favor of unit test instead, as the memory pressure related integration tests can be very environment sensitive and flaky. Here's my PR that does this: #63004.

aslonnie

(for @ray-project/ray-core team to review)

github-actions · 2026-05-14T01:15:13Z

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

elliot-barn requested a review from a team as a code owner April 24, 2026 21:39

gemini-code-assist Bot reviewed Apr 24, 2026

View reviewed changes

ray-gardener Bot added the core Issues that should be addressed in Ray Core label Apr 25, 2026

elliot-barn requested a review from aslonnie April 28, 2026 14:50

Merge branch 'master' into elliot-barn-fix-test-memory-pressure-oom-p…

5049c59

…olicy

cursor Bot reviewed Apr 28, 2026

View reviewed changes

elliot-barn added the go add ONLY when ready to merge, run all tests label Apr 28, 2026

aslonnie changed the title ~~[tests] Pin legacy OOM killing policy in test_memory_pressure fixtures~~ [core] Pin legacy OOM killing policy in test_memory_pressure fixtures Apr 29, 2026

aslonnie reviewed Apr 29, 2026

View reviewed changes

github-actions Bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label May 14, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[core] Pin legacy OOM killing policy in test_memory_pressure fixtures#62922

[core] Pin legacy OOM killing policy in test_memory_pressure fixtures#62922
elliot-barn wants to merge 2 commits into
masterfrom
elliot-barn-fix-test-memory-pressure-oom-policy

elliot-barn commented Apr 24, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 24, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Apr 28, 2026

Uh oh!

Kunchd commented Apr 28, 2026

Uh oh!

Kunchd commented Apr 28, 2026

Uh oh!

aslonnie left a comment

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

elliot-barn commented Apr 24, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Apr 28, 2026

Choose a reason for hiding this comment

Missed pinning legacy kill policy in inline ray.init test

Uh oh!

Kunchd commented Apr 28, 2026

Uh oh!

Kunchd commented Apr 28, 2026

Uh oh!

aslonnie left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants