-
Notifications
You must be signed in to change notification settings - Fork 7.6k
[Core] (Resource Isolation 12/n) Switch group killing policy to by time killing policy #62643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 15 commits
a70ff17
b35865a
5c2599f
8245dbb
caf195e
b886146
a7282df
6609b06
7e7031f
c5a67e2
dc8a434
2c14f01
2884c72
e394d33
251a225
83f9f20
4db1afd
2ce6a98
a177a40
4091330
c2655d3
6a99c1c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -10,7 +10,7 @@ | |
| # __last_task_start__ | ||
| import ray | ||
|
|
||
| @ray.remote(max_retries=-1) | ||
| @ray.remote(max_retries=0) | ||
| def leaks_memory(): | ||
| chunks = [] | ||
| bits_to_allocate = 8 * 100 * 1024 * 1024 # ~100 MiB | ||
|
|
@@ -67,11 +67,11 @@ def allocate(self, bytes_to_allocate: float) -> None: | |
| max_restarts=0, max_task_retries=0, name="second_actor" | ||
| ).remote() | ||
|
|
||
| # each task requests 0.3 of the system memory when the memory threshold is 0.4. | ||
| allocate_bytes = get_additional_bytes_to_reach_memory_usage_pct(0.3) | ||
| # We want total bytes to exceed 40% threshold to trigger the memory monitor. | ||
| allocate_bytes = get_additional_bytes_to_reach_memory_usage_pct(0.5) | ||
|
|
||
| first_actor_task = first_actor.allocate.remote(allocate_bytes) | ||
| second_actor_task = second_actor.allocate.remote(allocate_bytes) | ||
| first_actor_task = first_actor.allocate.remote(0.9 * allocate_bytes) | ||
| second_actor_task = second_actor.allocate.remote(0.1* allocate_bytes) | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Modifying the amount we allocate here so that killing the first actor alone will be guaranteed to relief enough memory pressure without killing a second actor. |
||
|
|
||
| error_thrown = False | ||
| try: | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,6 +22,7 @@ | |
| #include "absl/strings/str_format.h" | ||
| #include "absl/strings/str_join.h" | ||
| #include "ray/common/memory_monitor_interface.h" | ||
| #include "ray/common/ray_config.h" | ||
| #include "ray/util/logging.h" | ||
|
|
||
| namespace ray { | ||
|
|
@@ -292,23 +293,59 @@ int64_t MemoryMonitorUtils::NullableMin(int64_t left, int64_t right) { | |
| } | ||
| } | ||
|
|
||
| int64_t MemoryMonitorUtils::GetMemoryThreshold(int64_t total_memory_bytes, | ||
| float usage_threshold, | ||
| int64_t min_memory_free_bytes) { | ||
| int64_t MemoryMonitorUtils::GetMemoryThreshold( | ||
| int64_t total_memory_bytes, | ||
| float usage_threshold, | ||
| int64_t min_memory_free_bytes, | ||
| bool resource_isolation_enabled, | ||
| const CgroupManagerInterface &cgroup_manager) { | ||
| RAY_CHECK_GE(total_memory_bytes, MemoryMonitorInterface::kNull); | ||
| RAY_CHECK_GE(min_memory_free_bytes, MemoryMonitorInterface::kNull); | ||
| RAY_CHECK_GE(usage_threshold, 0); | ||
| RAY_CHECK_LE(usage_threshold, 1); | ||
| RAY_CHECK_GE(usage_threshold, 0) | ||
| << "Invalid configuration: usage_threshold must be >= 0"; | ||
| RAY_CHECK_LE(usage_threshold, 1) | ||
| << "Invalid configuration: usage_threshold must be <= 1"; | ||
|
|
||
| int64_t threshold_fraction = (int64_t)(total_memory_bytes * usage_threshold); | ||
| int64_t resolved_memory_threshold_bytes; | ||
| int64_t threshold_fraction = static_cast<int64_t>(total_memory_bytes * usage_threshold); | ||
|
|
||
| if (min_memory_free_bytes > MemoryMonitorInterface::kNull) { | ||
| int64_t threshold_absolute = total_memory_bytes - min_memory_free_bytes; | ||
| RAY_CHECK_GE(threshold_absolute, 0); | ||
| return std::max(threshold_fraction, threshold_absolute); | ||
| resolved_memory_threshold_bytes = std::max(threshold_fraction, threshold_absolute); | ||
| } else { | ||
| return threshold_fraction; | ||
| resolved_memory_threshold_bytes = threshold_fraction; | ||
| } | ||
|
|
||
| if (resource_isolation_enabled) { | ||
| StatusOr<std::string> user_memory_max_bytes_or = | ||
| cgroup_manager.GetUserCgroupConstraintValue("memory.max"); | ||
| RAY_CHECK(user_memory_max_bytes_or.ok()) << absl::StrFormat( | ||
| "Failed to get user cgroup memory limit when setting up memory monitor: %s", | ||
| user_memory_max_bytes_or.ToString()); | ||
| std::string user_memory_max_bytes_str = user_memory_max_bytes_or.value(); | ||
|
|
||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When moving the logic from
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, the check for whether the string is empty is unnecessary as the underlying call to fetch the constraint value already asserts this. |
||
| if (!user_memory_max_bytes_str.empty() && | ||
| std::all_of(user_memory_max_bytes_str.begin(), | ||
| user_memory_max_bytes_str.end(), | ||
| ::isdigit)) { | ||
|
Kunchd marked this conversation as resolved.
|
||
| int64_t user_memory_max_bytes = std::stoll(user_memory_max_bytes_str); | ||
| int64_t reaction_buffer_bytes = | ||
| std::min(static_cast<int64_t>(total_memory_bytes * | ||
| kDefaultThresholdMonitorReactionBufferProportion), | ||
| RayConfig::instance().max_threshold_monitor_reaction_buffer_bytes()); | ||
| resolved_memory_threshold_bytes = user_memory_max_bytes - reaction_buffer_bytes; | ||
| RAY_CHECK_GE(resolved_memory_threshold_bytes, 0) << absl::StrFormat( | ||
| "Available user task memory is less than the kill memory buffer bytes: " | ||
| "%d < %d. This means the available memory for user proceses is likely " | ||
| "less than 5%% of total memory. Please consider decreasing the proportion " | ||
| "of reserved system memory if it was custom set.", | ||
| user_memory_max_bytes, | ||
| reaction_buffer_bytes); | ||
| } | ||
| } | ||
|
|
||
| return resolved_memory_threshold_bytes; | ||
| } | ||
|
|
||
| int64_t MemoryMonitorUtils::GetProcessUsedMemoryBytes( | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -72,9 +72,10 @@ RAY_CONFIG(uint64_t, raylet_check_gc_period_milliseconds, 100) | |
| /// it will start killing processes to free up the space. | ||
| /// Note: when resource isolation is enabled, the memory usage threshold is set to | ||
| /// total memory - system reserved memory (can be specified in ray start) - | ||
| /// kill_memory_buffer_bytes. Notice that the formula does not account for object store | ||
| /// memory in system reserved memory. To configure the usage threshold, please adjust the | ||
| /// system reserved memory in ray start command instead. Ranging from [0, 1] | ||
| /// max(5% of total memory, max_threshold_monitor_reaction_buffer_bytes). | ||
| /// Notice that the formula does not account for object store memory in system reserved | ||
| /// memory. To configure the usage threshold, please adjust the system reserved memory in | ||
| /// ray start command instead. Ranging from [0, 1] | ||
| RAY_CONFIG(float, memory_usage_threshold, 0.95) | ||
|
|
||
| /// The interval between runs of the memory usage monitor. | ||
|
|
@@ -90,9 +91,22 @@ RAY_CONFIG(uint64_t, memory_monitor_refresh_ms, 250) | |
| /// means 6.4 GB of the memory will not be usable. | ||
| RAY_CONFIG(int64_t, min_memory_free_bytes, (int64_t)-1) | ||
|
|
||
| /// The amount of memory to free under the memory usage threshold when | ||
| /// killing workers via the worker killing policy. | ||
| RAY_CONFIG(uint64_t, kill_memory_buffer_bytes, 3ULL * 1024 * 1024 * 1024) // 3GB | ||
| /// The maximum amount of memory to free under the memory usage threshold when | ||
| /// killing workers via the worker killing policy. Defaults to 5% of total memory. | ||
|
Kunchd marked this conversation as resolved.
Outdated
|
||
| RAY_CONFIG(int64_t, max_kill_memory_buffer_bytes, 3ULL * 1024 * 1024 * 1024) // 3GiB cap | ||
|
|
||
| /// The threshold monitor is poll based and may miss memory bursts occurring between | ||
| /// polls. This is the maximum buffer size that can be subtracted from memory.max to give | ||
| /// the threshold monitor time to react before memory max is reached under resource | ||
| /// isolation. Defaults to 5% of total memory. | ||
| RAY_CONFIG(int64_t, | ||
| max_threshold_monitor_reaction_buffer_bytes, | ||
| 2LL * 1024 * 1024 * 1024) // 2GiB | ||
|
|
||
| /// Whether to use the group-by-owner worker killing policy instead of the | ||
| /// default time-based worker killing policy. When true, workers killed by the | ||
| /// by group killing policy (legacy policy). | ||
|
Kunchd marked this conversation as resolved.
Outdated
|
||
| RAY_CONFIG(bool, WORKER_KILLING_POLICY_BY_GROUP, false) | ||
|
Kunchd marked this conversation as resolved.
Outdated
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like we are changing the default behavior of the killing policy for all the users. Wondering have we communicated the default behavior to the users and should we gradually roll out the change instead of the directly change the config?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Chatted offline. The new policy behaves very similarly to the existing policy (identical in the common case when the driver is the single source that's submitting the tasks). Additionally, it addresses the issue where the existing policy was not killing aggressively enough to relieve us of memory pressure. We expect the new policy to be a general improvement across the board, but we will modify this PR to make it explicit that there's a behavioral change, and include this in the next version release. We will also leave a comment in the oom log so that users will know how to revert if they experience oom. |
||
|
|
||
| /// The reserved memory bytes for system processes | ||
| /// enforced via cgroup memory.min constraint which guarantees | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.