Commit 3994462

mfernest, claude, JakeSCahill
authored
docs(DOC-732): document partition_autobalancing_node_autodecommission_time (#1607)
Co-authored-by: Claude Sonnet 4.6 <[email protected]>
Co-authored-by: JakeSCahill <[email protected]>
Co-authored-by: Jake Cahill <[email protected]>
1 parent 56fa568 commit 3994462

3 files changed

Lines changed: 62 additions & 12 deletions

modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc

Lines changed: 42 additions & 12 deletions
@@ -2,49 +2,73 @@
 :description: Continuous Data Balancing simplifies operations with self-healing clusters that dynamically balance partitions.
 :page-aliases: cluster-administration:continuous-data-balancing.adoc
 :page-categories: Management
+:page-topic-type: how-to
+:personas: infrastructure_operator
+:learning-objective-1: Enable Continuous Data Balancing on a Redpanda cluster
+:learning-objective-2: Check data balancing status using rpk
+:learning-objective-3: Cancel partition balancing moves for a specific node
 
 [NOTE]
 ====
 include::shared:partial$enterprise-license.adoc[]
 ====
 
-Continuous Data Balancing continuously monitors your node and rack availability and disk usage. This enables self-healing clusters that dynamically balance partitions, ensuring smooth operations and optimal cluster performance.
+Continuous Data Balancing continuously monitors your node and rack availability and disk usage, dynamically balancing partitions to maintain smooth operations and optimal cluster performance.
 
-It also maintains the configured replication level, even after infrastructure failure. Node availability has the highest priority in data balancing. After a rack (with all nodes belonging to it) becomes unavailable, Redpanda moves partition replicas to the remaining nodes. This violates the rack awareness constraint. But after this rack (or a new one) becomes available, Redpanda repairs the rack awareness constraint by moving excess replicas from racks that have more than one replica to the newly-available rack.
+Continuous Data Balancing also maintains the configured replication level, even after infrastructure failure. Node availability has the highest priority in data balancing. After a rack (with all nodes belonging to it) becomes unavailable, Redpanda moves partition replicas to the remaining nodes. This violates the rack awareness constraint. After the rack (or a replacement rack) becomes available, Redpanda repairs the constraint by moving excess replicas from racks that have more than one replica to the newly-available rack.
+
+After reading this page, you will be able to:
+
+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}
+* [ ] {learning-objective-3}
 
 == Set Continuous Data Balancing properties
 
-To enable Continuous Data Balancing, set the `partition_autobalancing_mode` property to `continuous`. You can then customize properties for monitoring your node availability and disk usage.
+To enable Continuous Data Balancing, set the `partition_autobalancing_mode` property to `continuous`. Customize the following properties to monitor node availability and disk usage.
 
 |===
 | Property | Description
 
 | `partition_autobalancing_node_availability_timeout_sec`
 | When a node is unreachable for the specified amount of time, Redpanda acts as if the node had been decommissioned: rebalancing begins, re-creating all of its replicas on other nodes in the cluster. +
 +
-*Note:* The node remains part of the cluster, and it can rejoin when it comes back online. A node that was actually decommissioned is removed from the cluster. +
+The node remains part of the cluster and can rejoin when it comes back online. A node that was actually decommissioned is removed from the cluster. +
 +
 Default is 900 seconds (15 minutes).
 
+[[partition_autobalancing_node_autodecommission_timeout_sec]]
+| `partition_autobalancing_node_autodecommission_timeout_sec`
+| When a node is unavailable for this timeout duration, Redpanda automatically and permanently decommissions the node. This property only applies when `partition_autobalancing_mode` is set to `continuous`. Unlike `partition_autobalancing_node_availability_timeout_sec`, which moves partitions while keeping the node in the cluster, this property removes the node from the cluster entirely. A decommissioned node cannot rejoin the cluster. +
++
+Only one node is decommissioned at a time. If a decommission is already in progress, automatic decommission does not trigger until it completes. If the decommission stalls (for example, because the node holds the only replica of a partition), manual intervention is required. See xref:manage:cluster-maintenance/nodewise-partition-recovery.adoc[]. +
++
+By default, this property is null and automatic decommission is disabled.
+
 | `partition_autobalancing_max_disk_usage_percent`
 | When a node fills up to this disk usage percentage, Redpanda starts moving replicas off the node to other nodes with disk utilization below the percentage. +
 +
 Default is 80%.
 |===
 
-For information about other modes with `partition_autobalancing_mode`, see xref:./cluster-balancing.adoc[Cluster Balancing].
+For the other `partition_autobalancing_mode` options, see xref:manage:cluster-maintenance/cluster-balancing.adoc[Cluster balancing].
+
+== Use data balancing commands
 
-== Use Data Balancing commands
+Use the following `rpk` commands to monitor and control data balancing.
 
 === Check data balancing status
 
 To see the status, run:
 
-`rpk cluster partitions balancer-status`
+[,bash]
+----
+rpk cluster partitions balancer-status
+----
 
 This shows the time since the last data balancing, the number of replica movements in progress, the nodes that are unavailable, and the nodes that are over the disk space threshold (default = 80%).
 
-It also returns a data balancing status: `off`, `ready`, `starting`, `in-progress`, or `stalled`. If the command reports a `stalled` status, check the following:
+It also returns a data balancing status: `off`, `ready`, `starting`, `in-progress`, or `stalled`. If the command reports a `stalled` status, verify:
 
 * Are there enough healthy nodes? For example, in a three node cluster, no movements are possible for partitions with three replicas.
 * Does the cluster have sufficient space? Partitions are not moved if all nodes in the cluster are utilizing more than their disk space threshold.
@@ -55,10 +79,16 @@ It also returns a data balancing status: `off`, `ready`, `starting`, `in-progres
 
 To cancel the current partition balancing moves, run:
 
-`rpk cluster partitions movement-cancel`
+[,bash]
+----
+rpk cluster partitions movement-cancel
+----
 
-To cancel the partition moves in a specific node, add `--node`. For example:
+To cancel partition moves on a specific node, use the `--node` flag. For example:
 
-`rpk cluster partitions movement-cancel --node 1`
+[,bash]
+----
+rpk cluster partitions movement-cancel --node 1
+----
 
-NOTE: If continuous balancing hasn't been turned off, and if the system is still unbalanced, then it schedules another partition balancing. To stop all balancing, first set `partition_autobalancing_mode` to `off`. Then cancel current data balancing moves.
+NOTE: If continuous balancing is still enabled and the cluster remains unbalanced, Redpanda schedules another partition balancing round. To stop all balancing, first set `partition_autobalancing_mode` to `off`, then cancel the current data balancing moves.
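The configuration flow this diff documents (enable continuous mode, optionally turn on automatic decommissioning, then check the balancer) can be sketched with `rpk cluster config set`. This is an illustrative sketch, not part of the commit: the 1800-second timeout is an arbitrary example value, and the commands assume `rpk` is configured to reach the cluster.

```shell
# Enable Continuous Data Balancing (requires an Enterprise license).
rpk cluster config set partition_autobalancing_mode continuous

# Optionally auto-decommission nodes that stay unavailable for 30 minutes.
# The 1800-second value is illustrative; the property is null (disabled) by default.
rpk cluster config set partition_autobalancing_node_autodecommission_timeout_sec 1800

# Inspect the balancer: status, in-progress movements, unavailable nodes.
rpk cluster partitions balancer-status
```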

modules/manage/pages/cluster-maintenance/decommission-brokers.adoc

Lines changed: 10 additions & 0 deletions
@@ -10,6 +10,16 @@ When you decommission a broker, its partition replicas are reallocated across th
 
 CAUTION: When a broker is decommissioned, it cannot rejoin the cluster. If a broker with the same ID tries to rejoin the cluster, it is rejected.
 
+== Decommissioning methods
+
+There are two ways to decommission brokers in Redpanda:
+
+* Manual decommissioning (described in this guide): Use `rpk` commands to explicitly decommission a broker when you need full control over the timing and selection of brokers to remove.
+
+* Automatic decommissioning: When xref:manage:cluster-maintenance/continuous-data-balancing.adoc[Continuous Data Balancing] is enabled, you can configure the xref:manage:cluster-maintenance/continuous-data-balancing.adoc#partition_autobalancing_node_autodecommission_timeout_sec[partition_autobalancing_node_autodecommission_timeout_sec] property to automatically decommission brokers that remain unavailable for a specified duration.
+
+Both methods permanently remove the broker from the cluster. Decommissioned brokers cannot rejoin.
+
 == What happens when a broker is decommissioned?
 
 When a broker is decommissioned, the controller leader creates a reallocation plan for all partition replicas that are allocated to that broker. By default, this reallocation is done in batches of 50 to avoid overwhelming the remaining brokers with Raft recovery. See xref:reference:tunable-properties.adoc#partition_autobalancing_concurrent_moves[`partition_autobalancing_concurrent_moves`].
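The manual path that this diff contrasts with auto-decommissioning can be sketched as follows. This is an illustrative sketch, not part of the commit: broker ID `1` is a placeholder, and the commands assume `rpk` can reach the cluster's Admin API.

```shell
# List brokers to find the ID of the one to remove.
rpk redpanda admin brokers list

# Permanently decommission broker 1; its replicas are reallocated in batches.
rpk redpanda admin brokers decommission 1

# Track the reallocation progress until the decommission completes.
rpk redpanda admin brokers decommission-status 1
```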

modules/manage/pages/kubernetes/k-decommission-brokers.adoc

Lines changed: 10 additions & 0 deletions
@@ -15,6 +15,16 @@ You may want to decommission a broker in the following situations:
 
 NOTE: When a broker is decommissioned, it cannot rejoin the cluster. If a broker with the same ID tries to rejoin the cluster, it is rejected.
 
+== Decommissioning methods
+
+There are two ways to decommission brokers in Redpanda:
+
+* Manual decommissioning (described in this guide): Use `rpk` commands or Kubernetes automation to explicitly decommission a broker when you need full control over the timing and selection of brokers to remove.
+
+* Automatic decommissioning: When xref:manage:cluster-maintenance/continuous-data-balancing.adoc[Continuous Data Balancing] is enabled, you can configure the xref:manage:cluster-maintenance/continuous-data-balancing.adoc#partition_autobalancing_node_autodecommission_timeout_sec[partition_autobalancing_node_autodecommission_timeout_sec] property to automatically decommission brokers that remain unavailable for a specified duration.
+
+Both methods permanently remove the broker from the cluster. Decommissioned brokers cannot rejoin.
+
 == Prerequisites
 
 You must have the following:
