diff --git a/docs/enterprise/autopilot/auto-repartition.md b/docs/enterprise/autopilot/auto-repartition.md
new file mode 100644
index 0000000000..f6eb9c5a66
--- /dev/null
+++ b/docs/enterprise/autopilot/auto-repartition.md
@@ -0,0 +1,92 @@
+---
+keywords: [Auto Repartition, Autopilot, Repartition, Region, large Region, object storage, GC]
+description: Introduces GreptimeDB Enterprise Auto Repartition and how to configure it to automatically split large Regions.
+---
+
+# Auto Repartition
+
+Auto Repartition is an Autopilot strategy that automatically splits large Regions into smaller Regions. When a table has a large Region that may become a performance bottleneck, Auto Repartition samples data, generates new partition boundaries, and submits a Repartition action.
+
+The split Regions can then be scheduled across multiple Datanodes to distribute potential bottlenecks. Auto Repartition reduces the operational cost of manually identifying large Regions and running Repartition. For manual Repartition, see [Repartition](/user-guide/deployments-administration/manage-data/repartition.md).
+
+## Prerequisites
+
+:::warning WARNING
+Auto Repartition depends on GreptimeDB Repartition. It is only available in distributed clusters and requires:
+
+- [Shared object storage](/user-guide/deployments-administration/configuration.md#storage-options), such as AWS S3.
+- [GC](/user-guide/deployments-administration/manage-data/gc.md) enabled on Metasrv and all Datanodes.
+
+Otherwise, Repartition cannot be executed.
+:::
+
+Object storage stores Region files. GC reclaims old files after their references are released, which prevents files still in use from being deleted during Repartition.
+
+## When to use Auto Repartition
+
+Auto Repartition is useful in the following scenarios:
+
+- Some large Regions may become performance bottlenecks.
+- The original partition rule no longer matches the current data distribution.
+- You want to split large Regions into smaller Regions and distribute potential bottlenecks through later scheduling.
+- You want to reduce the operational cost of manually identifying large Regions and running Repartition.
+
+## Limitations
+
+Auto Repartition only works for partitioned tables. It can only split tables that already have partition rules. If a table does not have partition rules, Auto Repartition does not generate new partition rules for it automatically.
+
+For more information about table partitioning and Repartition, see [Table Sharding](/user-guide/deployments-administration/manage-data/table-sharding.md) and [Repartition](/user-guide/deployments-administration/manage-data/repartition.md).
+
+## Configuration
+
+Auto Repartition depends on the Autopilot runtime and cluster statistics. The following example includes both shared configuration and Auto Repartition configuration:
+
+```toml
+[[plugins]]
+[plugins.autopilot]
+tick_interval = "45s"
+
+[[plugins]]
+[plugins.cluster_stat]
+sampling_window = "45s"
+max_history_windows = 5
+ewma_alpha = 0.2
+
+[[plugins]]
+[plugins.auto_repartition]
+split_trigger_ratio = 1.8
+max_split_parts = 3
+table_repartition_cooldown_period = "60s"
+max_actions_per_tick = 4
+max_actions_per_table_per_tick = 2
+```
+
+In this example:
+
+- `plugins.autopilot` controls the Autopilot scheduling interval.
+- `plugins.cluster_stat` controls sampling and smoothing for Datanode and Region write statistics.
+- `plugins.auto_repartition` controls large Region split trigger conditions, split size, and submission limits.
+
+For details about shared configuration, see [Autopilot configuration](./overview.md#configuration).
+
+## Core options
+
+| Option | Default | Description |
+| --- | --- | --- |
+| `split_trigger_ratio` | `1.8` | The load ratio required before a Region is considered for splitting. For example, the default value `1.8` means split planning starts only when a Region reaches more than 1.8 times the target per-Region write load. |
+| `max_split_parts` | `3` | The maximum number of child Regions a single Region can be split into. |
+| `table_repartition_cooldown_period` | `"60s"` | The table-level Repartition cooldown period. After a Repartition request is submitted successfully, the same table will not submit another Repartition request during this period. |
+| `max_actions_per_tick` | `4` | The maximum number of Repartition actions submitted in one scheduling cycle. |
+| `max_actions_per_table_per_tick` | `2` | The maximum number of Repartition actions submitted for one table in one scheduling cycle. |
+
+## Advanced options
+
+The following options usually do not need to be changed. Adjust them only when you understand the table distribution and split-point selection behavior.
+
+| Option | Default | Description |
+| --- | --- | --- |
+| `sampling_budget` | `"10MB"` | The maximum amount of data sampled when computing split points for one Region. A larger budget may improve split-point quality but increases planning cost. |
+| `split_segment_min_ratio` | `0.7` | The minimum allowed segment-size ratio when validating a split recommendation. |
+| `split_segment_max_ratio` | `1.3` | The maximum allowed segment-size ratio when validating a split recommendation. |
+| `min_samples` | `3` | The minimum number of historical samples required to evaluate Region write stability. |
+| `max_region_history_cv` | `0.2` | The maximum coefficient of variation allowed for Region write history. Regions above this value are considered unstable. |
diff --git a/docs/enterprise/autopilot/overview.md b/docs/enterprise/autopilot/overview.md
new file mode 100644
index 0000000000..cce14c103b
--- /dev/null
+++ b/docs/enterprise/autopilot/overview.md
@@ -0,0 +1,98 @@
+---
+keywords: [Autopilot, Region Balancer, Auto Repartition, Region, Datanode, load balancing, repartition]
+description: Overview of GreptimeDB Enterprise Autopilot, including Region Balancer, Auto Repartition, and shared configuration.
+---
+
+# Overview
+
+Autopilot is a GreptimeDB Enterprise capability that automatically optimizes cluster load and data distribution. It runs in Metasrv, continuously collects write statistics from Datanodes and Regions, and submits scheduling actions when the configured conditions are met. This reduces the operational cost of identifying hotspots and manually adjusting the cluster.
+
+Autopilot currently includes the following capabilities:
+
+- **Region Balancer**: Automatically migrates hot Regions to balance write load across Datanodes.
+- **Auto Repartition**: Automatically splits large Regions into smaller Regions to prevent a single large Region from becoming a performance bottleneck. The split Regions can then be scheduled across multiple Datanodes to distribute potential bottlenecks.
+
+## How it works
+
+Autopilot consists of a shared runtime, shared cluster statistics, and scheduling strategies:
+
+- **Runtime**: Triggers a scheduling cycle at a fixed interval.
+- **Cluster statistics**: Collects Region write statistics from Datanode heartbeats and smooths short-term fluctuations.
+- **Scheduling strategies**: Decide whether to move Regions or split large Regions based on the collected statistics.
+- **Executors**: Submit actions generated by the strategies, such as Region Migration or Repartition.
+
+When both Region Balancer and Auto Repartition are enabled, they share the same Autopilot runtime and cluster statistics.
+
+## When to use Autopilot
+
+Autopilot is useful in the following scenarios:
+
+- Some Datanodes have a write load that remains significantly higher than others.
+- Some large Regions may become performance bottlenecks.
+- You want to reduce the operational cost of manually identifying load bottlenecks and running Region Migration or Repartition.
+
+## Limitations
+
+Different Autopilot strategies have their own limitations:
+
+- Region Balancer requires the number of schedulable Regions to be greater than the number of active Datanodes. Otherwise, moving Regions cannot make the load evenly distributed across Datanodes.
+- Auto Repartition only works for partitioned tables. It can only split tables that already have partition rules. If a table does not have partition rules, Auto Repartition does not generate new partition rules for it automatically. For more information about table partitioning and Repartition, see [Table Sharding](/user-guide/deployments-administration/manage-data/table-sharding.md) and [Repartition](/user-guide/deployments-administration/manage-data/repartition.md).
+
+## Configuration
+
+Autopilot configuration includes shared configuration and strategy-specific configuration:
+
+- `plugins.autopilot`: Configures the Autopilot runtime.
+- `plugins.cluster_stat`: Configures sampling and smoothing for Datanode and Region write statistics.
+- `plugins.region_balancer`: Enables and configures Region Balancer.
+- `plugins.auto_repartition`: Enables and configures Auto Repartition.
+
+The following example enables both Region Balancer and Auto Repartition:
+
+```toml
+[[plugins]]
+[plugins.autopilot]
+tick_interval = "45s"
+
+[[plugins]]
+[plugins.cluster_stat]
+sampling_window = "45s"
+max_history_windows = 5
+ewma_alpha = 0.2
+
+[[plugins]]
+[plugins.region_balancer]
+acceptable_load_ratio = 0.12
+min_load_threshold = "4MB"
+region_migration_cooldown_period = "1h"
+window_stability_threshold = 2
+
+[[plugins]]
+[plugins.auto_repartition]
+split_trigger_ratio = 1.8
+max_split_parts = 3
+table_repartition_cooldown_period = "60s"
+max_actions_per_tick = 4
+max_actions_per_table_per_tick = 2
+```
+
+If you only need one strategy, configure only `plugins.region_balancer` or `plugins.auto_repartition`.
+
+## Autopilot runtime configuration
+
+| Option | Default | Description |
+| --- | --- | --- |
+| `tick_interval` | `"45s"` | The interval of Autopilot scheduling cycles. A shorter interval reacts faster to load changes but may increase scheduling overhead. |
+
+## Cluster statistics configuration
+
+| Option | Default | Description |
+| --- | --- | --- |
+| `sampling_window` | `"45s"` | The duration of each statistics window. A larger window smooths short-term fluctuations but reacts more slowly. |
+| `max_history_windows` | `5` | The number of historical statistics windows to keep. Region Balancer and Auto Repartition use historical windows to determine whether load is stable. |
+| `ewma_alpha` | `0.2` | The EWMA smoothing factor. A larger value gives more weight to recent observations. A smaller value makes the statistics smoother. |
+
+## Next steps
+
+- To automatically balance write load across Datanodes, see [Region Balancer](./region-balancer.md).
+- To automatically split large Regions, see [Auto Repartition](./auto-repartition.md).
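The smoothing option in the overview tables above (`ewma_alpha`) and the stability options documented for the strategies (`min_samples`, `max_region_history_cv`) are easier to reason about with numbers. The following is a minimal Python sketch, not GreptimeDB's implementation: the function names `ewma` and `is_stable`, the textbook EWMA recurrence, and the use of the population standard deviation for the coefficient of variation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def ewma(samples, alpha=0.2):
    """Textbook exponentially weighted moving average (assumed formula).

    A larger alpha gives more weight to recent samples; a smaller alpha
    produces a smoother series.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    smoothed = samples[0]
    for value in samples[1:]:
        smoothed = alpha * value + (1 - alpha) * smoothed
    return smoothed


def is_stable(history, min_samples=3, max_cv=0.2):
    """Hypothetical stability check based on the coefficient of variation.

    Returns False when there are too few samples or when the mean is zero
    (both treated here as "unknown", which is an assumption).
    """
    if len(history) < min_samples:
        return False
    avg = mean(history)
    if avg == 0:
        return False
    # Coefficient of variation = standard deviation / mean.
    return pstdev(history) / avg <= max_cv
```

With `alpha = 0.2`, a single jump from 0 to 10 moves the smoothed value only to 2, which matches the table's statement that a smaller `ewma_alpha` makes the statistics smoother.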
diff --git a/docs/enterprise/autopilot/region-balancer.md b/docs/enterprise/autopilot/region-balancer.md
index 8303ad1888..ea4bd08710 100644
--- a/docs/enterprise/autopilot/region-balancer.md
+++ b/docs/enterprise/autopilot/region-balancer.md
@@ -1,39 +1,91 @@
 ---
-keywords: [region balancer, load balancing, configuration, datanodes, migration]
-description: Configuration guide for the region balancer plugin in GreptimeDB Enterprise, which balances write loads across datanodes to prevent frequent region migrations.
+keywords: [Region Balancer, Autopilot, Datanode, Region, load balancing, Region Migration]
+description: Introduces GreptimeDB Enterprise Region Balancer and how to configure it to automatically balance Region write load across Datanodes.
 ---
 
 # Region Balancer
 
-This plugin balances the write load of regions across datanodes, using specified window sizes and load thresholds to prevent frequent region migrations. You can enable the Auto Rebalancer feature by adding the following configuration to Metasrv.
+Region Balancer is an Autopilot strategy that automatically balances Region write load across Datanodes. When some Datanodes remain under high load, Region Balancer selects suitable Regions and submits Region Migration actions to move them to lower-load Datanodes.
+
+Region Balancer runs in Metasrv and depends on the shared Autopilot runtime and cluster statistics. For an overview of Autopilot, see [Autopilot](./overview.md).
+
+## When to use Region Balancer
+
+Region Balancer is useful in the following scenarios:
+
+- Some Datanodes have a write load that remains higher than others.
+- You want to reduce the operational cost of manually identifying hot nodes and running Region Migration.
+
+## Limitations
+
+Region Balancer requires the number of schedulable Regions to be greater than the number of active Datanodes. If the number of Regions is not greater than the number of active Datanodes, moving Regions cannot make the load evenly distributed across Datanodes.
+
+## Configuration
+
+Region Balancer depends on the Autopilot runtime and cluster statistics. The following example includes both shared configuration and Region Balancer configuration:
 
 ```toml
 [[plugins]]
-[plugins.region_balancer]
+[plugins.autopilot]
+tick_interval = "45s"
 
-window_size = "45s"
+[[plugins]]
+[plugins.cluster_stat]
+sampling_window = "45s"
+max_history_windows = 5
+ewma_alpha = 0.2
 
+[[plugins]]
+[plugins.region_balancer]
+acceptable_load_ratio = 0.12
+min_load_threshold = "4MB"
+region_migration_cooldown_period = "1h"
 window_stability_threshold = 2
+```
 
-min_load_threshold = "10MB"
+In this example:
 
-tick_interval = "45s"
-```
+- `plugins.autopilot` controls the Autopilot scheduling interval.
+- `plugins.cluster_stat` controls sampling and smoothing for Datanode and Region write statistics.
+- `plugins.region_balancer` controls Region Balancer trigger conditions, cooldown, and migration limits.
+
+For details about shared configuration, see [Autopilot configuration](./overview.md#configuration).
+
+## Core options
+
+| Option | Default | Description |
+| --- | --- | --- |
+| `acceptable_load_ratio` | `0.12` | The load ratio threshold above the average Datanode write load. For example, the default value `0.12` means a Datanode may be considered high-load when its write load is more than 12% above the average. |
+| `min_load_threshold` | `"4MB"` | The minimum Datanode write load required to trigger balancing. This option represents a write rate in bytes/s. The configured value is written as a byte size. For example, `"4MB"` means 4MB/s. If the load is below this threshold, migration is not triggered even if the load is imbalanced. This avoids unnecessary scheduling under low traffic. |
+| `region_migration_cooldown_period` | `"1h"` | The cooldown period after a Region migration. During the cooldown period, the same Region will not be migrated again. |
+| `window_stability_threshold` | `2` | The number of historical statistics windows that must continuously satisfy the high-load condition before migration is triggered. A larger value reduces false positives caused by short-term fluctuations. |
+
+## Advanced options
+
+The following options usually do not need to be changed. Adjust them only when you understand the workload characteristics and scheduling behavior.
+
+| Option | Default | Description |
+| --- | --- | --- |
+| `region_min_load_threshold` | `"10KB"` | The minimum write load for a Region to be considered movable. This option represents a write rate in bytes/s. The configured value is written as a byte size. For example, `"10KB"` means 10KB/s. Regions below this threshold are not selected as migration candidates. |
+| `scorer_var_bound` | `0.5` | The load bound used by the scorer to evaluate migration candidates. This value must be greater than `acceptable_load_ratio`. |
+| `min_samples` | `3` | The minimum number of historical samples required to evaluate Region write stability. |
+| `max_region_history_cv` | `0.2` | The maximum coefficient of variation allowed for Region write history. Regions above this value are considered unstable. |
+| `datanode_max_unstable_or_unknown_count_ratio` | `0.5` | The maximum ratio of unstable or unknown Regions on a Datanode. Datanodes above this ratio are excluded from scheduling. |
+| `datanode_max_unstable_or_unknown_load_ratio` | `0.5` | The maximum ratio of unstable or unknown Region load on a Datanode. Datanodes above this ratio are excluded from scheduling. |
+| `max_actions_per_tick` | `2` | The maximum number of migration actions submitted in one scheduling cycle. |
+| `max_actions_per_source_datanode` | `2` | The maximum number of Regions moved out from one source Datanode in one scheduling cycle. |
+| `max_actions_per_target_datanode` | `1` | The maximum number of Regions moved into one target Datanode in one scheduling cycle. |
+
+## Legacy options
+
+Earlier versions supported configuring the following options directly under `plugins.region_balancer`:
+
+- `tick_interval`
+- `window_size`
+- `ewma_alpha`
+
+These options are still compatible, but they are not recommended for new configurations. Use the following shared options instead:
 
-## Configuration Parameters
-
-- `window_size`: string
-  - **Description**: Defines the time span for the sliding window used to calculate the short-term average load of a region. This window helps smooth out temporary spikes in load, reducing the chance of unnecessary rebalancing.
-  - **Units**: Time (e.g., `"45s"` represents 45 seconds).
-  - **Recommendation**: Adjust according to load volatility. Larger values smooth more effectively but may delay load balancing responses.
-- `window_stability_threshold`: integer
-  - **Description**: Specifies the number of consecutive windows that must meet the load-balancing criteria before a region migration is triggered. This threshold helps prevent frequent balancing actions, ensuring region migration only occurs when imbalance is sustained.
-  - **Recommendation**: Higher values delay rebalancing triggers and suit environments with volatile loads; a value of 2 means that at least two consecutive windows must meet the threshold before triggering.
-- `min_load_threshold`: string
-  - **Description**: Minimum write load threshold (in bytes per second) to trigger region migration. Nodes with load below this value will not trigger rebalancing.
-  - **Units**: Bytes (e.g., `"10MB"` represents 10 MiB).
-  - **Recommendation**: Set an appropriate minimum to avoid triggering region migration with low load. Adjust based on typical traffic.
-- `tick_interval`: string
-  - **Description**: Interval at which the balancer checks and potentially triggers a rebalancing task.
- - **Units**: Time (e.g., `"45s"` represents 45 seconds). - - **Recommendation**: Set based on desired responsiveness and load volatility. Shorter intervals allow faster responses but may increase overhead. \ No newline at end of file +- `plugins.autopilot.tick_interval` for the Autopilot scheduling interval. +- `plugins.cluster_stat.sampling_window` for the statistics window. +- `plugins.cluster_stat.ewma_alpha` for the EWMA smoothing factor. diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/enterprise/autopilot/auto-repartition.md b/i18n/zh/docusaurus-plugin-content-docs/current/enterprise/autopilot/auto-repartition.md new file mode 100644 index 0000000000..b9d92e8876 --- /dev/null +++ b/i18n/zh/docusaurus-plugin-content-docs/current/enterprise/autopilot/auto-repartition.md @@ -0,0 +1,92 @@ +--- +keywords: [Auto Repartition, Autopilot, Repartition, Region, 重分区, 大 Region, 对象存储, GC] +description: 介绍 GreptimeDB Enterprise 的 Auto Repartition 功能,以及如何配置 Auto Repartition 自动拆分大 Region。 +--- + +# Auto Repartition + +Auto Repartition 是 Autopilot 的一个调度策略,用于自动将大 Region 拆分为多个小 Region。当某个表中存在可能成为性能瓶颈的大 Region 时,Auto Repartition 会基于采样结果生成新的分区边界,并提交 Repartition 操作。 + +拆分后的 Region 可以被调度到不同 Datanode 上,从而打散潜在的负载瓶颈。Auto Repartition 可以减少手动发现大 Region 和手动执行重分区的运维成本。关于手动重分区的说明,请参考[重分区](/user-guide/deployments-administration/manage-data/repartition.md)。 + +## 前置条件 + +:::warning 警告 +Auto Repartition 依赖 GreptimeDB 的重分区能力,仅支持分布式集群,并且需要: + +- 使用[共享对象存储](/user-guide/deployments-administration/configuration.md#storage-options),例如 AWS S3; +- 在 Metasrv 和所有 Datanode 上启用 [GC](/user-guide/deployments-administration/manage-data/gc.md)。 + +否则无法执行重分区。 +::: + +对象存储用于保存 Region 文件,GC 负责在引用释放后再回收旧文件,避免重分区过程中误删仍在使用的数据。 + +## 什么时候使用 Auto Repartition + +Auto Repartition 适合以下场景: + +- 某些大 Region 可能成为性能瓶颈; +- 表的原有分区规则已经不能匹配当前数据分布; +- 希望将大 Region 拆分为多个小 Region,并通过后续调度打散潜在的负载瓶颈; +- 希望减少手动分析大 Region 和手动执行 Repartition 的运维成本。 + +## 限制 + +Auto Repartition 仅对多分区表有效,只能拆分已经带有分区规则的表。如果表没有分区规则,Auto Repartition 
不会为它自动生成新的分区规则。 + +关于表分区和重分区的说明,请参考[表分片](/user-guide/deployments-administration/manage-data/table-sharding.md)和[重分区](/user-guide/deployments-administration/manage-data/repartition.md)。 + +## 配置 + +Auto Repartition 依赖 Autopilot 运行时和集群统计信息。下面的示例同时包含共享配置和 Auto Repartition 配置: + +```toml +[[plugins]] +[plugins.autopilot] +tick_interval = "45s" + +[[plugins]] +[plugins.cluster_stat] +sampling_window = "45s" +max_history_windows = 5 +ewma_alpha = 0.2 + +[[plugins]] +[plugins.auto_repartition] +split_trigger_ratio = 1.8 +max_split_parts = 3 +table_repartition_cooldown_period = "60s" +max_actions_per_tick = 4 +max_actions_per_table_per_tick = 2 +``` + +其中: + +- `plugins.autopilot` 控制 Autopilot 的调度周期; +- `plugins.cluster_stat` 控制 Datanode 和 Region 写入统计信息的采样与平滑; +- `plugins.auto_repartition` 控制大 Region 拆分的触发条件、拆分规模和提交数量。 + +共享配置项的详细说明请参考 [Autopilot 配置](./overview.md#配置)。 + +## 核心配置项 + +| 配置项 | 默认值 | 说明 | +| --- | --- | --- | +| `split_trigger_ratio` | `1.8` | Region 写负载达到目标单 Region 写负载多少倍后,才会考虑拆分。例如默认值 `1.8` 表示当某个 Region 的写负载达到目标值的 1.8 倍以上时,才会进入拆分规划。 | +| `max_split_parts` | `3` | 单个 Region 最多拆分成多少个子 Region。 | +| `table_repartition_cooldown_period` | `"60s"` | 表级重分区冷却时间。一次重分区请求提交成功后,同一张表在冷却时间内不会再次提交重分区请求。 | +| `max_actions_per_tick` | `4` | 每个调度周期最多提交的重分区动作数。 | +| `max_actions_per_table_per_tick` | `2` | 每张表在每个调度周期内最多提交的重分区动作数。 | + +## 高级配置项 + +以下配置通常不需要调整,建议仅在明确了解表的数据分布和拆分点选择行为后修改。 + +| 配置项 | 默认值 | 说明 | +| --- | --- | --- | +| `sampling_budget` | `"10MB"` | 为单个 Region 计算拆分点时最多采样的数据量。较大的采样量可能提升拆分点质量,但也会增加规划成本。 | +| `split_segment_min_ratio` | `0.7` | 校验拆分建议时,允许的最小分段大小比例。 | +| `split_segment_max_ratio` | `1.3` | 校验拆分建议时,允许的最大分段大小比例。 | +| `min_samples` | `3` | 判断 Region 写入稳定性所需的最少历史样本数。 | +| `max_region_history_cv` | `0.2` | Region 写入历史的最大变异系数。超过该值的 Region 会被视为写入不稳定。 | diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/enterprise/autopilot/overview.md b/i18n/zh/docusaurus-plugin-content-docs/current/enterprise/autopilot/overview.md new file mode 100644 index 
0000000000..69a7f2d567 --- /dev/null +++ b/i18n/zh/docusaurus-plugin-content-docs/current/enterprise/autopilot/overview.md @@ -0,0 +1,98 @@ +--- +keywords: [Autopilot, Region Balancer, Auto Repartition, Region, Datanode, 负载均衡, 重分区] +description: 介绍 GreptimeDB Enterprise 的 Autopilot 功能,包括 Region Balancer 和 Auto Repartition 的使用场景及共享配置。 +--- + +# 概述 + +Autopilot 是 GreptimeDB Enterprise 中用于自动优化集群负载和数据分布的能力。它运行在 Metasrv 中,通过持续收集 Datanode 和 Region 的写入统计信息,在满足条件时自动提交调度动作,减少人工排查热点和手动调整集群的运维成本。 + +Autopilot 当前包括以下能力: + +- **Region Balancer**:自动迁移热点 Region,使 Datanode 之间的写入负载更均衡。 +- **Auto Repartition**:自动将大 Region 拆分为多个小 Region,避免单个大 Region 成为性能瓶颈。拆分后的 Region 可以被调度到不同 Datanode 上,从而打散潜在的负载瓶颈。 + +## 工作方式 + +Autopilot 由共享的运行时、集群统计信息和不同的调度策略组成: + +- **运行时**:按照固定间隔触发一次调度周期。 +- **集群统计信息**:通过 Datanode heartbeat 收集 Region 写入统计信息,并对短期波动进行平滑。 +- **调度策略**:根据统计信息判断是否需要迁移 Region 或拆分大 Region。 +- **执行器**:将调度策略生成的动作提交给对应的执行流程,例如 Region Migration 或 Repartition。 + +当同时启用 Region Balancer 和 Auto Repartition 时,它们共享同一套 Autopilot 运行时和集群统计信息。 + +## 什么时候使用 Autopilot + +Autopilot 适合以下场景: + +- 集群中部分 Datanode 的写入负载长期明显高于其他 Datanode; +- 某些大 Region 可能成为性能瓶颈,需要拆分成多个小 Region; +- 希望减少手动分析负载瓶颈、手动执行 Region Migration 或 Repartition 的运维成本。 + +## 限制 + +不同 Autopilot 策略有各自的适用限制: + +- Region Balancer 要求可调度的 Region 数量大于活跃 Datanode 数量。否则即使迁移 Region,也无法让 Datanode 之间的负载变得均衡。 +- Auto Repartition 仅对多分区表有效,只能拆分已经带有分区规则的表。如果表没有分区规则,Auto Repartition 不会为它自动生成新的分区规则。关于表分区和重分区的说明,请参考[表分片](/user-guide/deployments-administration/manage-data/table-sharding.md)和[重分区](/user-guide/deployments-administration/manage-data/repartition.md)。 + +## 配置 + +Autopilot 的配置分为共享配置和策略配置: + +- `plugins.autopilot`:配置 Autopilot 运行时。 +- `plugins.cluster_stat`:配置 Datanode 和 Region 写入统计信息的采样和平滑方式。 +- `plugins.region_balancer`:启用并配置 Region Balancer。 +- `plugins.auto_repartition`:启用并配置 Auto Repartition。 + +下面的示例展示了同时启用 Region Balancer 和 Auto Repartition 的推荐配置: + +```toml +[[plugins]] +[plugins.autopilot] +tick_interval = "45s" + +[[plugins]] 
+[plugins.cluster_stat] +sampling_window = "45s" +max_history_windows = 5 +ewma_alpha = 0.2 + +[[plugins]] +[plugins.region_balancer] +acceptable_load_ratio = 0.12 +min_load_threshold = "4MB" +region_migration_cooldown_period = "1h" +window_stability_threshold = 2 + +[[plugins]] +[plugins.auto_repartition] +split_trigger_ratio = 1.8 +max_split_parts = 3 +table_repartition_cooldown_period = "60s" +max_actions_per_tick = 4 +max_actions_per_table_per_tick = 2 +``` + +如果只需要其中一个策略,可以只配置对应的 `plugins.region_balancer` 或 `plugins.auto_repartition`。 + +## Autopilot 运行时配置 + +| 配置项 | 默认值 | 说明 | +| --- | --- | --- | +| `tick_interval` | `"45s"` | Autopilot 的调度周期。较短的周期可以更快响应负载变化,但可能增加调度开销。 | + +## 集群统计配置 + +| 配置项 | 默认值 | 说明 | +| --- | --- | --- | +| `sampling_window` | `"45s"` | 每个统计窗口的时间跨度。较大的窗口会平滑短期波动,但响应会更慢。 | +| `max_history_windows` | `5` | 保留的历史统计窗口数量。Region Balancer 和 Auto Repartition 会基于历史窗口判断负载是否稳定。 | +| `ewma_alpha` | `0.2` | EWMA 平滑系数。值越大,统计结果越偏向最新观测值;值越小,统计结果越平滑。 | + +## 下一步 + +- 如需自动平衡 Datanode 之间的写入负载,请参考 [Region Balancer](./region-balancer.md)。 +- 如需自动拆分大 Region,请参考 [Auto Repartition](./auto-repartition.md)。 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/enterprise/autopilot/region-balancer.md b/i18n/zh/docusaurus-plugin-content-docs/current/enterprise/autopilot/region-balancer.md index 195073e684..2a3c53f1b6 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/enterprise/autopilot/region-balancer.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/enterprise/autopilot/region-balancer.md @@ -1,39 +1,91 @@ --- -keywords: [Region Balancer, Datanode, 负载均衡, 窗口大小, 负载阈值, 迁移] -description: 介绍 Region Balancer 插件,通过配置窗口大小和负载阈值来均衡 Datanode 上的 Region 写入负载,避免频繁迁移。 +keywords: [Region Balancer, Autopilot, Datanode, Region, 负载均衡, Region Migration] +description: 介绍 GreptimeDB Enterprise 的 Region Balancer 功能,以及如何配置 Region Balancer 自动平衡 Datanode 之间的 Region 写入负载。 --- # Region Balancer -该插件用于均衡 Datanode 上的 Region 写入负载,通过指定的窗口大小和负载阈值来避免频繁的 Region 迁移。可通过添加以下配置至 
Metasrv 开启 Region Rebalancer 功能。 +Region Balancer 是 Autopilot 的一个调度策略,用于自动平衡 Datanode 之间的 Region 写入负载。当某些 Datanode 持续处于高负载状态时,Region Balancer 会选择合适的 Region,并提交 Region Migration 操作,将 Region 迁移到负载较低的 Datanode。 + +Region Balancer 运行在 Metasrv 中,并依赖 Autopilot 共享的运行时和集群统计信息。关于 Autopilot 的整体说明,请参考 [Autopilot](./overview.md)。 + +## 什么时候使用 Region Balancer + +Region Balancer 适合以下场景: + +- 部分 Datanode 的写入负载长期高于其他 Datanode; +- 希望减少手动分析热点节点和手动执行 Region Migration 的运维成本。 + +## 限制 + +Region Balancer 要求可调度的 Region 数量大于活跃 Datanode 数量。如果 Region 数量不多于活跃 Datanode 数量,即使迁移 Region,也无法让 Datanode 之间的负载变得均衡。 + +## 配置 + +Region Balancer 依赖 Autopilot 运行时和集群统计信息。下面的示例同时包含共享配置和 Region Balancer 配置: ```toml [[plugins]] -[plugins.region_balancer] +[plugins.autopilot] +tick_interval = "45s" -window_size = "45s" +[[plugins]] +[plugins.cluster_stat] +sampling_window = "45s" +max_history_windows = 5 +ewma_alpha = 0.2 +[[plugins]] +[plugins.region_balancer] +acceptable_load_ratio = 0.12 +min_load_threshold = "4MB" +region_migration_cooldown_period = "1h" window_stability_threshold = 2 +``` -min_load_threshold = "10MB" +其中: -tick_interval = "45s" -``` +- `plugins.autopilot` 控制 Autopilot 的调度周期; +- `plugins.cluster_stat` 控制 Datanode 和 Region 写入统计信息的采样与平滑; +- `plugins.region_balancer` 控制 Region Balancer 的触发条件、冷却时间和迁移数量。 + +共享配置项的详细说明请参考 [Autopilot 配置](./overview.md#配置)。 + +## 核心配置项 + +| 配置项 | 默认值 | 说明 | +| --- | --- | --- | +| `acceptable_load_ratio` | `0.12` | Datanode 写负载超过平均负载的比例阈值。例如默认值 `0.12` 表示当某个 Datanode 的写负载超过平均负载 12% 以上时,可能被视为高负载 Datanode。 | +| `min_load_threshold` | `"4MB"` | 触发平衡的最小 Datanode 写负载。该配置表示写入速率,单位为 bytes/s;配置值使用 bytes 表示,例如 `"4MB"` 表示 4MB/s。低于该阈值时,即使负载不均衡,也不会触发迁移,避免低流量场景下频繁调度。 | +| `region_migration_cooldown_period` | `"1h"` | Region 迁移后的冷却时间。在冷却时间内,同一个 Region 不会再次被迁移,避免频繁迁移。 | +| `window_stability_threshold` | `2` | 连续多少个历史统计窗口都满足高负载条件后,才会触发迁移。较大的值可以减少短期波动造成的误触发。 | + +## 高级配置项 + +以下配置通常不需要调整,建议仅在明确了解负载特征和调度行为后修改。 + +| 配置项 | 默认值 | 说明 | +| --- | --- | --- | +| 
`region_min_load_threshold` | `"10KB"` | 可迁移 Region 的最小写负载。该配置表示写入速率,单位为 bytes/s;配置值使用 bytes 表示,例如 `"10KB"` 表示 10KB/s。低于该阈值的 Region 不会作为迁移候选。 | +| `scorer_var_bound` | `0.5` | 评分器的负载边界,用于计算迁移候选的收益。该值必须大于 `acceptable_load_ratio`。 | +| `min_samples` | `3` | 判断 Region 写入稳定性所需的最少历史样本数。 | +| `max_region_history_cv` | `0.2` | Region 写入历史的最大变异系数。超过该值的 Region 会被视为写入不稳定。 | +| `datanode_max_unstable_or_unknown_count_ratio` | `0.5` | Datanode 上写入不稳定或未知的 Region 数量的最大比例。超过该比例的 Datanode 不会参与调度。 | +| `datanode_max_unstable_or_unknown_load_ratio` | `0.5` | Datanode 上写入不稳定或未知的 Region 负载的最大比例。超过该比例的 Datanode 不会参与调度。 | +| `max_actions_per_tick` | `2` | 每个调度周期最多提交的迁移动作数。 | +| `max_actions_per_source_datanode` | `2` | 每个源 Datanode 在一个调度周期内最多迁出的 Region 数。 | +| `max_actions_per_target_datanode` | `1` | 每个目标 Datanode 在一个调度周期内最多迁入的 Region 数。 | + +## 兼容旧配置 + +旧版本中,Region Balancer 支持直接在 `plugins.region_balancer` 中配置以下参数: + +- `tick_interval` +- `window_size` +- `ewma_alpha` + +这些配置仍然兼容,但不建议在新配置中继续使用。新配置建议使用: -## 配置项说明 - -- `window_size`: string - - **说明**: 滑动窗口的时间跨度,用于计算区域负载的短期平均值。窗口期内的负载变化会被平滑,减轻短期突增对负载均衡的影响。 - - **单位**: 时间(支持格式:`"45s"` 表示 45 秒)。 - - **建议**: 根据集群负载波动情况配置,较大的窗口会使负载均衡响应更平稳。 -- `window_stability_threshold`: integer - - **说明**: 连续多少个窗口必须满足触发条件后,才会进行迁移操作。该阈值用于防止频繁的平衡操作,只在持续不均衡的情况下进行 Region 迁移。 - - **建议**: 较大的值会延迟再平衡的触发,适用于负载波动较大的系统;值为 2 表示需要至少两个连续窗口符合条件。 -- `min_load_threshold`: string - - **说明**: 触发 Region 迁移的最小写负载阈值(每秒字节数)。当节点的负载低于该值时,将不会触发迁移。 - - **单位**: 字节(例如,`"10MB"` 表示 10 MiB)。 - - **建议**: 设置为合理的最小值,防止小负载情况触发迁移。值可以根据系统实际流量进行调整。 -- `tick_interval`: string - - **说明**: 平衡器的运行间隔时间,控制负载均衡任务的触发频率。 - - **单位**: 时间(例如,"45s" 表示 45 秒)。 - - **建议**: 根据系统的响应速度和负载变化频率设置。较短的间隔可以更快响应负载变化,但可能增加系统开销。 \ No newline at end of file +- `plugins.autopilot.tick_interval` 配置 Autopilot 调度周期; +- `plugins.cluster_stat.sampling_window` 配置统计窗口; +- `plugins.cluster_stat.ewma_alpha` 配置 EWMA 平滑系数。 diff --git a/sidebars.ts b/sidebars.ts index 5cbc8ab328..a9f11c7129 100644 --- a/sidebars.ts +++ b/sidebars.ts 
@@ -482,7 +482,11 @@ const sidebars: SidebarsConfig = {
   {
     type: 'category',
     label: 'Autopilot',
-    items: ['enterprise/autopilot/region-balancer'],
+    items: [
+      'enterprise/autopilot/overview',
+      'enterprise/autopilot/region-balancer',
+      'enterprise/autopilot/auto-repartition',
+    ],
   },
   {
     type: 'category',
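The trigger conditions documented in the option tables of this patch (`acceptable_load_ratio` and `min_load_threshold` for Region Balancer, `split_trigger_ratio` for Auto Repartition) can be checked by hand with a small Python sketch. It only mirrors the wording of the option descriptions and is not GreptimeDB's scheduling code: the function names are hypothetical, the average is assumed to be a plain arithmetic mean over Datanodes, and `"4MB"` is assumed to mean 4 × 1024 × 1024 bytes per second.

```python
MB = 1024 * 1024  # assumption: "MB" in the config is interpreted as MiB


def is_high_load_datanode(load, all_loads,
                          acceptable_load_ratio=0.12,
                          min_load_threshold=4 * MB):
    """Hypothetical check: is this Datanode a candidate source for migration?

    `load` and `all_loads` are write rates in bytes/s.
    """
    # Below the minimum load, balancing is skipped even if load is imbalanced.
    if load < min_load_threshold:
        return False
    average = sum(all_loads) / len(all_loads)
    # High-load means more than `acceptable_load_ratio` above the average.
    return load > average * (1 + acceptable_load_ratio)


def should_plan_split(region_load, target_region_load, split_trigger_ratio=1.8):
    """Hypothetical check: does this Region exceed the split trigger ratio?"""
    return region_load > split_trigger_ratio * target_region_load
```

For example, with Datanode loads of 10, 10, and 16 MB/s the average is 12 MB/s, so the 12% threshold sits at about 13.44 MB/s and only the 16 MB/s node qualifies. In the real strategies, these checks are further gated by the stability and cooldown options described in the tables.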