HDDS-14870. Allow balancing of over replicated and quasi closed containers #9964

Draft
sarvekshayr wants to merge 3 commits into apache:master from sarvekshayr:HDDS-14870
sarvekshayr (Contributor) commented Mar 21, 2026

What changes were proposed in this pull request?

Allow the container balancer to include a container in either of these cases:

  • The container state is CLOSED and its ContainerHealthState is OVER_REPLICATED, but it has the minimum number of CLOSED replicas required by its replication config, along with additional QUASI_CLOSED replicas.
  • The container state is QUASI_CLOSED and all of its replicas are in the QUASI_CLOSED state.

These containers are included only if the new config hdds.container.balancer.include.non.standard.containers is set to true.
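
To make the two cases concrete, here is a minimal, self-contained sketch of the eligibility rule as described above. It is not the code from this PR: the class `SelectionSketch`, its enums, and the method `isBalanceable` are hypothetical stand-ins, and the "standard" path (CLOSED and HEALTHY) is an assumption about the existing behavior. The reading of "minimum CLOSED replicas" as "at least the replica count required by the replication config" is also an assumption.

```java
import java.util.List;

// Hypothetical, simplified model. These are NOT the Ozone classes.
final class SelectionSketch {
  enum State { CLOSED, QUASI_CLOSED, OPEN }
  enum Health { HEALTHY, OVER_REPLICATED, UNDER_REPLICATED }

  /**
   * Returns true if the container may be picked for balancing.
   * requiredReplicas stands in for the count implied by the replication config.
   * includeNonStandard mirrors
   * hdds.container.balancer.include.non.standard.containers.
   */
  static boolean isBalanceable(State container, Health health,
      List<State> replicas, int requiredReplicas, boolean includeNonStandard) {
    // Assumed standard path: a CLOSED, HEALTHY container is always eligible.
    if (container == State.CLOSED && health == Health.HEALTHY) {
      return true;
    }
    // Everything below is gated by the new flag.
    if (!includeNonStandard) {
      return false;
    }
    long closed = replicas.stream().filter(r -> r == State.CLOSED).count();
    long quasi = replicas.stream().filter(r -> r == State.QUASI_CLOSED).count();

    // Case 1: CLOSED + OVER_REPLICATED with enough CLOSED replicas
    // and at least one additional QUASI_CLOSED replica.
    if (container == State.CLOSED && health == Health.OVER_REPLICATED) {
      return closed >= requiredReplicas && quasi > 0;
    }
    // Case 2: QUASI_CLOSED container whose replicas are all QUASI_CLOSED.
    if (container == State.QUASI_CLOSED) {
      return !replicas.isEmpty() && quasi == replicas.size();
    }
    return false;
  }
}
```

With a replication factor of 3, a CLOSED container reported as OVER_REPLICATED with replicas `[CLOSED, CLOSED, CLOSED, QUASI_CLOSED]` would be eligible in this sketch only when the flag is true.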

What is the link to the Apache JIRA?

HDDS-14870

How was this patch tested?

Added tests in TestContainerBalancerSelectionCriteria and TestMoveManager.

sarvekshayr requested a review from sadanand48 March 21, 2026 14:55
sreejasahithi (Contributor) left a comment

Thanks @sarvekshayr

I wanted to double-check my understanding:
When includeNonStandardContainers is true, ContainerBalancerSelectionCriteria correctly allows:

  • Case A: CLOSED + OVER_REPLICATED containers (with min CLOSED replicas + QUASI_CLOSED replicas)
  • Case B: QUASI_CLOSED containers with all QUASI_CLOSED replicas

However, MoveManager.move() still enforces:

  • Health must be HEALTHY – so OVER_REPLICATED is rejected with REPLICATION_NOT_HEALTHY_BEFORE_MOVE
  • Container state must be CLOSED – so QUASI_CLOSED is rejected with REPLICATION_FAIL_CONTAINER_NOT_CLOSED

Since MoveManager does not read the config, it never sees includeNonStandardContainers. That would mean these containers get selected but then fail when we actually try to move them.
Did you intend to update MoveManager as well to honor this config, or is there another path I am missing? I want to make sure I am not misreading the flow.
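
The mismatch described in this comment can be illustrated with a small, self-contained sketch. It is not the real `MoveManager`: the class `MovePrecheckSketch`, its enums, and the `precheck` method are hypothetical, and only the two result names are taken from the comment above. The order of the two checks and the `includeNonStandard` parameter are assumptions; the parameter shows one way a move-time check could honor the flag, if that is the intent. Replica-level checks are omitted.

```java
import java.util.Optional;

// Hypothetical, simplified model. This is NOT the Ozone MoveManager.
final class MovePrecheckSketch {
  enum State { CLOSED, QUASI_CLOSED }
  enum Health { HEALTHY, OVER_REPLICATED }
  // Result names as quoted in the review comment.
  enum Rejection {
    REPLICATION_FAIL_CONTAINER_NOT_CLOSED,
    REPLICATION_NOT_HEALTHY_BEFORE_MOVE
  }

  /**
   * Returns the rejection reason, or an empty Optional if the move may proceed.
   * With includeNonStandard == false this mirrors the strict checks described
   * in the comment; with true it relaxes them (an assumed design, not the PR's).
   */
  static Optional<Rejection> precheck(State state, Health health,
      boolean includeNonStandard) {
    boolean stateOk = state == State.CLOSED
        || (includeNonStandard && state == State.QUASI_CLOSED);
    if (!stateOk) {
      return Optional.of(Rejection.REPLICATION_FAIL_CONTAINER_NOT_CLOSED);
    }
    boolean healthOk = health == Health.HEALTHY
        || (includeNonStandard && health == Health.OVER_REPLICATED);
    if (!healthOk) {
      return Optional.of(Rejection.REPLICATION_NOT_HEALTHY_BEFORE_MOVE);
    }
    return Optional.empty();
  }
}
```

In this sketch, a container that the selection criteria accept under the flag (for example CLOSED with OVER_REPLICATED health) is still rejected by `precheck(..., false)`, which is the situation the comment asks about.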

sarvekshayr marked this pull request as draft March 23, 2026 03:08
2 participants