
ClusterNetworkCIDRConflict false positive on KubeVirt NodePools after CAPK dual-stack address change #8229

@amasolov

Description


After cluster-api-provider-kubevirt began exposing all VMI interface IPs for dual-stack support (kubernetes-sigs/cluster-api-provider-kubevirt#366, synced via openshift/cluster-api-provider-kubevirt#347), all KubeVirt-based HyperShift NodePools report a false-positive ClusterNetworkCIDRConflict condition.

Root Cause

The CAPK Addresses() method now collects IPs from all vmiInstance.Status.Interfaces, including OVN-Kubernetes internal interfaces (e.g. ovn-k8s-mp0). These management port IPs are, by design, within the hosted cluster's own clusterNetwork CIDR.
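
For illustration, here is a simplified Go sketch of the kind of collection the new Addresses() behavior implies. It is not the actual CAPK source, and the helper name addressesFromVMI is made up, but the VMI and machine-address types are the upstream kubevirt and cluster-api ones.

package sketch

import (
	kubevirtv1 "kubevirt.io/api/core/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// addressesFromVMI roughly mirrors the post-dual-stack behavior: every IP on
// every VMI interface becomes a MachineInternalIP address, including the IPs
// of OVN-Kubernetes internal ports such as ovn-k8s-mp0.
func addressesFromVMI(vmi *kubevirtv1.VirtualMachineInstance) []clusterv1.MachineAddress {
	var addrs []clusterv1.MachineAddress
	for _, iface := range vmi.Status.Interfaces {
		for _, ip := range iface.IPs {
			addrs = append(addrs, clusterv1.MachineAddress{
				Type:    clusterv1.MachineInternalIP,
				Address: ip,
			})
		}
	}
	return addrs
}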

The setCIDRConflictCondition function in the NodePool controller (introduced in PR #3880) iterates over every MachineInternalIP and MachineExternalIP address on each machine and flags any address that falls within the cluster network (see the sketch after the list below). It does not distinguish between:

  1. A machine whose infrastructure IP collides with the pod CIDR (a real problem)
  2. A machine that has a CNI-internal IP within the pod CIDR (expected, not a problem)
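
A rough sketch of the current check, assuming the controller walks the CAPI machine's status addresses (function and variable names are illustrative, not the exact HyperShift code):

package sketch

import (
	"net"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// machineCollides reproduces the current "any address inside the cluster
// network" logic, which also flags the OVN management port IP.
func machineCollides(machine *clusterv1.Machine, clusterNet *net.IPNet) bool {
	for _, addr := range machine.Status.Addresses {
		if addr.Type != clusterv1.MachineInternalIP && addr.Type != clusterv1.MachineExternalIP {
			continue
		}
		if ip := net.ParseIP(addr.Address); ip != nil && clusterNet.Contains(ip) {
			return true
		}
	}
	return false
}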

Impact

  • Affects all KubeVirt NodePools when using MCE 2.11.0+ (which includes the CAPK dual-stack change)
  • Does NOT occur on MCE 2.9.2 (which uses the older CAPK that only reported the primary IP)
  • The condition is informational only; no functional impact on cluster operations

How to Reproduce

  1. Deploy a hub cluster with OCP 4.19 and MCE 2.11.0
  2. Create a KubeVirt-based hosted cluster using OVN-Kubernetes with a secondary network via Multus/bridge
  3. Scale a NodePool to 1+ replicas
  4. Observe the ClusterNetworkCIDRConflict condition on the NodePool

Example

type: ClusterNetworkCIDRConflict
status: "True"
reason: InvalidConfiguration
message: "machine [example-node-pool-abcde-x1y2z] with ip [10.128.0.2]
  collides with cluster-network cidr [10.128.0.0/14], too many similar errors..."

Here 10.128.0.2 is the OVN management port IP, which is expected to fall within the pod CIDR 10.128.0.0/14, while the machine's actual infrastructure address (e.g. 192.168.1.10 from DHCP) does not overlap with the cluster network at all.

Proposed Fix

Modify setCIDRConflictCondition so it reports a collision only when ALL of a machine's non-link-local addresses fall within the cluster network. When a machine has addresses both inside and outside the cluster network, treat the in-network addresses as expected CNI-internal IPs and do not raise the condition.
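
A minimal sketch of that proposal, using the same illustrative helper shape as above (again, not the actual HyperShift code):

package sketch

import (
	"net"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// machineConflictsWithClusterNetwork reports a conflict only when every
// non-link-local InternalIP/ExternalIP on the machine falls inside the
// cluster network; a mix of in-network and out-of-network addresses is
// treated as expected CNI-internal IPs.
func machineConflictsWithClusterNetwork(machine *clusterv1.Machine, clusterNet *net.IPNet) bool {
	sawRoutable := false
	for _, addr := range machine.Status.Addresses {
		if addr.Type != clusterv1.MachineInternalIP && addr.Type != clusterv1.MachineExternalIP {
			continue
		}
		ip := net.ParseIP(addr.Address)
		if ip == nil || ip.IsLinkLocalUnicast() {
			continue
		}
		sawRoutable = true
		if !clusterNet.Contains(ip) {
			// At least one routable address lives outside the cluster
			// network, so the in-network ones are assumed CNI-internal.
			return false
		}
	}
	return sawRoutable
}

With this behavior, the example machine above (192.168.1.10 plus 10.128.0.2) would no longer trigger the condition, while a machine whose only addresses sit inside 10.128.0.0/14 still would.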
