Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
92 changes: 92 additions & 0 deletions docs/enterprise/autopilot/auto-repartition.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
---
keywords: [Auto Repartition, Autopilot, Repartition, Region, large Region, object storage, GC]
description: Introduces GreptimeDB Enterprise Auto Repartition and how to configure it to automatically split large Regions.
---

# Auto Repartition

Auto Repartition is an Autopilot strategy that automatically splits large Regions into smaller Regions. When a table has a large Region that may become a performance bottleneck, Auto Repartition samples data, generates new partition boundaries, and submits a Repartition action.

The split Regions can then be scheduled across multiple Datanodes to distribute potential bottlenecks. Auto Repartition reduces the operational cost of manually identifying large Regions and running Repartition. For manual Repartition, see [Repartition](/user-guide/deployments-administration/manage-data/repartition.md).

## Prerequisites

:::warning WARNING
Auto Repartition depends on GreptimeDB Repartition. It is only available in distributed clusters and requires:

- [Shared object storage](/user-guide/deployments-administration/configuration.md#storage-options), such as AWS S3.
- [GC](/user-guide/deployments-administration/manage-data/gc.md) enabled on Metasrv and all Datanodes.

Otherwise, Repartition cannot be executed.
:::

Object storage stores Region files. GC reclaims old files after their references are released, which prevents files still in use from being deleted during Repartition.

## When to use Auto Repartition

Auto Repartition is useful in the following scenarios:

- Some large Regions may become performance bottlenecks.
- The original partition rule no longer matches the current data distribution.
- You want to split large Regions into smaller Regions and distribute potential bottlenecks through later scheduling.
- You want to reduce the operational cost of manually identifying large Regions and running Repartition.

## Limitations

Auto Repartition only works for partitioned tables. It can only split tables that already have partition rules. If a table does not have partition rules, Auto Repartition does not generate new partition rules for it automatically.

For more information about table partitioning and Repartition, see [Table Sharding](/user-guide/deployments-administration/manage-data/table-sharding.md) and [Repartition](/user-guide/deployments-administration/manage-data/repartition.md).

## Configuration

Auto Repartition depends on the Autopilot runtime and cluster statistics. The following example includes both shared configuration and Auto Repartition configuration:

```toml
[[plugins]]
[plugins.autopilot]
tick_interval = "45s"

[[plugins]]
[plugins.cluster_stat]
sampling_window = "45s"
max_history_windows = 5
ewma_alpha = 0.2

[[plugins]]
[plugins.auto_repartition]
split_trigger_ratio = 1.8
max_split_parts = 3
table_repartition_cooldown_period = "60s"
max_actions_per_tick = 4
max_actions_per_table_per_tick = 2
```

In this example:

- `plugins.autopilot` controls the Autopilot scheduling interval.
- `plugins.cluster_stat` controls sampling and smoothing for Datanode and Region write statistics.
- `plugins.auto_repartition` controls large Region split trigger conditions, split size, and submission limits.

For details about shared configuration, see [Autopilot configuration](./overview.md#configuration).

## Core options

| Option | Default | Description |
| --- | --- | --- |
| `split_trigger_ratio` | `1.8` | The load ratio required before a Region is considered for splitting. For example, the default value `1.8` means split planning starts only when a Region reaches more than 1.8 times the target per-Region write load. |
| `max_split_parts` | `3` | The maximum number of child Regions a single Region can be split into. |
| `table_repartition_cooldown_period` | `"60s"` | The table-level Repartition cooldown period. After a Repartition request is submitted successfully, the same table will not submit another Repartition request during this period. |
| `max_actions_per_tick` | `4` | The maximum number of Repartition actions submitted in one scheduling cycle. |
| `max_actions_per_table_per_tick` | `2` | The maximum number of Repartition actions submitted for one table in one scheduling cycle. |

## Advanced options

The following options usually do not need to be changed. Adjust them only when you understand the table distribution and split-point selection behavior.

| Option | Default | Description |
| --- | --- | --- |
| `sampling_budget` | `"10MB"` | The maximum amount of data sampled when computing split points for one Region. A larger budget may improve split-point quality but increases planning cost. |
| `split_segment_min_ratio` | `0.7` | The minimum allowed segment-size ratio when validating a split recommendation. |
| `split_segment_max_ratio` | `1.3` | The maximum allowed segment-size ratio when validating a split recommendation. |
| `min_samples` | `3` | The minimum number of historical samples required to evaluate Region write stability. |
| `max_region_history_cv` | `0.2` | The maximum coefficient of variation allowed for Region write history. Regions above this value are considered unstable. |
98 changes: 98 additions & 0 deletions docs/enterprise/autopilot/overview.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
---
keywords: [Autopilot, Region Balancer, Auto Repartition, Region, Datanode, load balancing, repartition]
description: Overview of GreptimeDB Enterprise Autopilot, including Region Balancer, Auto Repartition, and shared configuration.
---

# Overview

Autopilot is a GreptimeDB Enterprise capability that automatically optimizes cluster load and data distribution. It runs in Metasrv, continuously collects write statistics from Datanodes and Regions, and submits scheduling actions when the configured conditions are met. This reduces the operational cost of identifying hotspots and manually adjusting the cluster.

Autopilot currently includes the following capabilities:

- **Region Balancer**: Automatically migrates hot Regions to balance write load across Datanodes.
- **Auto Repartition**: Automatically splits large Regions into smaller Regions to prevent a single large Region from becoming a performance bottleneck. The split Regions can then be scheduled across multiple Datanodes to distribute potential bottlenecks.

## How it works

Autopilot consists of a shared runtime, shared cluster statistics, and scheduling strategies:

- **Runtime**: Triggers a scheduling cycle at a fixed interval.
- **Cluster statistics**: Collects Region write statistics from Datanode heartbeats and smooths short-term fluctuations.
- **Scheduling strategies**: Decide whether to move Regions or split large Regions based on the collected statistics.
- **Executors**: Submit actions generated by the strategies, such as Region Migration or Repartition.

When both Region Balancer and Auto Repartition are enabled, they share the same Autopilot runtime and cluster statistics.

## When to use Autopilot

Autopilot is useful in the following scenarios:

- Some Datanodes have a write load that remains significantly higher than others.
- Some large Regions may become performance bottlenecks.
- You want to reduce the operational cost of manually identifying load bottlenecks and running Region Migration or Repartition.

## Limitations

Different Autopilot strategies have their own limitations:

- Region Balancer requires the number of schedulable Regions to be greater than the number of active Datanodes. Otherwise, moving Regions cannot make the load evenly distributed across Datanodes.
- Auto Repartition only works for partitioned tables. It can only split tables that already have partition rules. If a table does not have partition rules, Auto Repartition does not generate new partition rules for it automatically. For more information about table partitioning and Repartition, see [Table Sharding](/user-guide/deployments-administration/manage-data/table-sharding.md) and [Repartition](/user-guide/deployments-administration/manage-data/repartition.md).

## Configuration

Autopilot configuration includes shared configuration and strategy-specific configuration:

- `plugins.autopilot`: Configures the Autopilot runtime.
- `plugins.cluster_stat`: Configures sampling and smoothing for Datanode and Region write statistics.
- `plugins.region_balancer`: Enables and configures Region Balancer.
- `plugins.auto_repartition`: Enables and configures Auto Repartition.

The following example enables both Region Balancer and Auto Repartition:

```toml
[[plugins]]
[plugins.autopilot]
tick_interval = "45s"

[[plugins]]
[plugins.cluster_stat]
sampling_window = "45s"
max_history_windows = 5
ewma_alpha = 0.2

[[plugins]]
[plugins.region_balancer]
acceptable_load_ratio = 0.12
min_load_threshold = "4MB"
region_migration_cooldown_period = "1h"
window_stability_threshold = 2

[[plugins]]
[plugins.auto_repartition]
split_trigger_ratio = 1.8
max_split_parts = 3
table_repartition_cooldown_period = "60s"
max_actions_per_tick = 4
max_actions_per_table_per_tick = 2
```

If you only need one strategy, configure only `plugins.region_balancer` or `plugins.auto_repartition`.

## Autopilot runtime configuration

| Option | Default | Description |
| --- | --- | --- |
| `tick_interval` | `"45s"` | The interval of Autopilot scheduling cycles. A shorter interval reacts faster to load changes but may increase scheduling overhead. |

## Cluster statistics configuration

| Option | Default | Description |
| --- | --- | --- |
| `sampling_window` | `"45s"` | The duration of each statistics window. A larger window smooths short-term fluctuations but reacts more slowly. |
| `max_history_windows` | `5` | The number of historical statistics windows to keep. Region Balancer and Auto Repartition use historical windows to determine whether load is stable. |
| `ewma_alpha` | `0.2` | The EWMA smoothing factor. A larger value gives more weight to recent observations. A smaller value makes the statistics smoother. |

## Next steps

- To automatically balance write load across Datanodes, see [Region Balancer](./region-balancer.md).
- To automatically split large Regions, see [Auto Repartition](./auto-repartition.md).
102 changes: 77 additions & 25 deletions docs/enterprise/autopilot/region-balancer.md
Original file line number Diff line number Diff line change
@@ -1,39 +1,91 @@
---
keywords: [region balancer, load balancing, configuration, datanodes, migration]
description: Configuration guide for the region balancer plugin in GreptimeDB Enterprise, which balances write loads across datanodes to prevent frequent region migrations.
keywords: [Region Balancer, Autopilot, Datanode, Region, load balancing, Region Migration]
description: Introduces GreptimeDB Enterprise Region Balancer and how to configure it to automatically balance Region write load across Datanodes.
---

# Region Balancer

This plugin balances the write load of regions across datanodes, using specified window sizes and load thresholds to prevent frequent region migrations. You can enable the Auto Rebalancer feature by adding the following configuration to Metasrv.
Region Balancer is an Autopilot strategy that automatically balances Region write load across Datanodes. When some Datanodes remain under high load, Region Balancer selects suitable Regions and submits Region Migration actions to move them to lower-load Datanodes.

Region Balancer runs in Metasrv and depends on the shared Autopilot runtime and cluster statistics. For an overview of Autopilot, see [Autopilot](./overview.md).

## When to use Region Balancer

Region Balancer is useful in the following scenarios:

- Some Datanodes have a write load that remains higher than others.
- You want to reduce the operational cost of manually identifying hot nodes and running Region Migration.

## Limitations

Region Balancer requires the number of schedulable Regions to be greater than the number of active Datanodes. If the number of Regions is not greater than the number of active Datanodes, moving Regions cannot make the load evenly distributed across Datanodes.

## Configuration

Region Balancer depends on the Autopilot runtime and cluster statistics. The following example includes both shared configuration and Region Balancer configuration:

```toml
[[plugins]]
[plugins.region_balancer]
[plugins.autopilot]
tick_interval = "45s"

window_size = "45s"
[[plugins]]
[plugins.cluster_stat]
sampling_window = "45s"
max_history_windows = 5
ewma_alpha = 0.2

[[plugins]]
[plugins.region_balancer]
acceptable_load_ratio = 0.12
min_load_threshold = "4MB"
region_migration_cooldown_period = "1h"
window_stability_threshold = 2
```

min_load_threshold = "10MB"
In this example:

tick_interval = "45s"
```
- `plugins.autopilot` controls the Autopilot scheduling interval.
- `plugins.cluster_stat` controls sampling and smoothing for Datanode and Region write statistics.
- `plugins.region_balancer` controls Region Balancer trigger conditions, cooldown, and migration limits.

For details about shared configuration, see [Autopilot configuration](./overview.md#configuration).

## Core options

| Option | Default | Description |
| --- | --- | --- |
| `acceptable_load_ratio` | `0.12` | The load ratio threshold above the average Datanode write load. For example, the default value `0.12` means a Datanode may be considered high-load when its write load is more than 12% above the average. |
| `min_load_threshold` | `"4MB"` | The minimum Datanode write load required to trigger balancing. This option represents a write rate in bytes/s. The configured value is written as a byte size. For example, `"4MB"` means 4MB/s. If the load is below this threshold, migration is not triggered even if the load is imbalanced. This avoids unnecessary scheduling under low traffic. |
| `region_migration_cooldown_period` | `"1h"` | The cooldown period after a Region migration. During the cooldown period, the same Region will not be migrated again. |
| `window_stability_threshold` | `2` | The number of historical statistics windows that must continuously satisfy the high-load condition before migration is triggered. A larger value reduces false positives caused by short-term fluctuations. |

## Advanced options

The following options usually do not need to be changed. Adjust them only when you understand the workload characteristics and scheduling behavior.

| Option | Default | Description |
| --- | --- | --- |
| `region_min_load_threshold` | `"10KB"` | The minimum write load for a Region to be considered movable. This option represents a write rate in bytes/s. The configured value is written as a byte size. For example, `"10KB"` means 10KB/s. Regions below this threshold are not selected as migration candidates. |
| `scorer_var_bound` | `0.5` | The load bound used by the scorer to evaluate migration candidates. This value must be greater than `acceptable_load_ratio`. |
| `min_samples` | `3` | The minimum number of historical samples required to evaluate Region write stability. |
| `max_region_history_cv` | `0.2` | The maximum coefficient of variation allowed for Region write history. Regions above this value are considered unstable. |
| `datanode_max_unstable_or_unknown_count_ratio` | `0.5` | The maximum ratio of unstable or unknown Regions on a Datanode. Datanodes above this ratio are excluded from scheduling. |
| `datanode_max_unstable_or_unknown_load_ratio` | `0.5` | The maximum ratio of unstable or unknown Region load on a Datanode. Datanodes above this ratio are excluded from scheduling. |
| `max_actions_per_tick` | `2` | The maximum number of migration actions submitted in one scheduling cycle. |
| `max_actions_per_source_datanode` | `2` | The maximum number of Regions moved out from one source Datanode in one scheduling cycle. |
| `max_actions_per_target_datanode` | `1` | The maximum number of Regions moved into one target Datanode in one scheduling cycle. |

## Legacy options

Earlier versions supported configuring the following options directly under `plugins.region_balancer`:

- `tick_interval`
- `window_size`
- `ewma_alpha`

These options are still compatible, but they are not recommended for new configurations. Use the following shared options instead:

## Configuration Parameters

- `window_size`: string
- **Description**: Defines the time span for the sliding window used to calculate the short-term average load of a region. This window helps smooth out temporary spikes in load, reducing the chance of unnecessary rebalancing.
- **Units**: Time (e.g., `"45s"` represents 45 seconds).
- **Recommendation**: Adjust according to load volatility. Larger values smooth more effectively but may delay load balancing responses.
- `window_stability_threshold`: integer
- **Description**: Specifies the number of consecutive windows that must meet the load-balancing criteria before a region migration is triggered. This threshold helps prevent frequent balancing actions, ensuring region migration only occurs when imbalance is sustained.
- **Recommendation**: Higher values delay rebalancing triggers and suit environments with volatile loads; a value of 2 means that at least two consecutive windows must meet the threshold before triggering.
- `min_load_threshold`: string
- **Description**: Minimum write load threshold (in bytes per second) to trigger region migration. Nodes with load below this value will not trigger rebalancing.
- **Units**: Bytes (e.g., `"10MB"` represents 10 MiB).
- **Recommendation**: Set an appropriate minimum to avoid triggering region migration with low load. Adjust based on typical traffic.
- `tick_interval`: string
- **Description**: Interval at which the balancer checks and potentially triggers a rebalancing task.
- **Units**: Time (e.g., `"45s"` represents 45 seconds).
- **Recommendation**: Set based on desired responsiveness and load volatility. Shorter intervals allow faster responses but may increase overhead.
- `plugins.autopilot.tick_interval` for the Autopilot scheduling interval.
- `plugins.cluster_stat.sampling_window` for the statistics window.
- `plugins.cluster_stat.ewma_alpha` for the EWMA smoothing factor.
Loading
Loading