[ci_gen_kustomize_values] Co-locate provisionserver with metal3 to prevent DHCP failures #3738
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files.
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Build failed (check pipeline): https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/3217375613864e0d83f7f88f394dcfaa
❌ openstack-k8s-operators-content-provider FAILURE in 7m 16s
Build failed (check pipeline): https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/0b04bcb1f4f54d518d017da862888f74
✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 03m 30s
This PR is stale because it has been open for over 15 days with no activity.
michburk left a comment
Has this been tested? If so, could you link a Jira ticket, with any relevant downstream links hidden behind comments or descriptions marked as 'Red Hat Employee'?
Thanks!
    {% for key, value in _original_baremetal_template.items() %}
    {{ key }}: {{ value }}
    {% endfor %}
Would it be possible to use something like | to_nice_yaml for this, instead of manually deconstructing and reconstructing the YAML key: value pairs?
Yes, I improved the code.
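To illustrate the reviewer's point, here is a small Python sketch contrasting the per-key loop with serializing the whole mapping in one call. Ansible's to_nice_yaml filter is not in the Python standard library, so json.dumps stands in for it here (JSON output is generally also valid YAML 1.2). The function names and keys are hypothetical and chosen only to show how a string value can lose its type when quoting is dropped.

```python
import json


def render_per_key(template: dict) -> str:
    """Mimic the reviewed Jinja loop: emit one "key: value" line per top-level key.

    Values are converted with str(), so string quoting is lost, much as it is
    when a string is rendered through {{ value }}.
    """
    return "\n".join(f"{key}: {value}" for key, value in template.items())


def render_whole(template: dict) -> str:
    """Serialize the whole mapping in a single call.

    json.dumps stands in for to_nice_yaml only because it ships with Python;
    its output is also valid YAML 1.2.
    """
    return json.dumps(template, indent=2)


# Hypothetical keys; exampleFlag is meant to stay the *string* "true".
template = {"exampleName": "node-0", "exampleFlag": "true"}

print(render_per_key(template))
# exampleName: node-0
# exampleFlag: true    <- a YAML parser loads this as a boolean, not a string
print(render_whole(template))
```

Serializing the whole mapping lets the serializer decide when quoting is needed, which is the main advantage of a filter like to_nice_yaml over rebuilding key: value lines by hand.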
      ansible.builtin.include_role:
        name: run_hook

    - name: Detect metal3 pod node for baremetal nodeset provisioning
Would it make more sense to put these tasks somewhere other than execute_step.yml? Would they be better suited to a dedicated hook, rather than being part of the generic execute_step.yml file?
Yes, you are right. I moved the code to a different place.
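The detection step named in the task "Detect metal3 pod node for baremetal nodeset provisioning" can be sketched in Python as follows. This is an illustration, not the PR's Ansible code: it assumes the input is a parsed Kubernetes PodList, such as the JSON printed by "oc get pods -A -l k8s-app=metal3 -o json", and the function name find_metal3_node is hypothetical.

```python
import json
from typing import Optional


def find_metal3_node(pod_list: dict) -> Optional[str]:
    """Return the node hosting a running metal3 pod, or None if there is none.

    `pod_list` is assumed to be a parsed Kubernetes PodList, for example the
    output of: oc get pods -A -l k8s-app=metal3 -o json
    """
    for pod in pod_list.get("items", []):
        labels = pod.get("metadata", {}).get("labels", {})
        # Re-check the label in case the list was not filtered server-side.
        if labels.get("k8s-app") != "metal3":
            continue
        # Skip pods that are not running (e.g. Pending while being rescheduled).
        if pod.get("status", {}).get("phase") != "Running":
            continue
        node = pod.get("spec", {}).get("nodeName")
        if node:
            return node
    return None


# Trimmed, hypothetical PodList for illustration.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "metal3-abc", "labels": {"k8s-app": "metal3"}},
   "spec": {"nodeName": "master-1"},
   "status": {"phase": "Running"}}
]}
""")
print(find_metal3_node(sample))  # master-1
```

Considering only Running pods avoids pinning the provision server to a node whose metal3 pod is being moved; when nothing qualifies, the caller can fail or retry rather than guess a node.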
…event DHCP failures

When the metal3-dnsmasq pod restarts during a node's DHCP lease renewal on the provisioning network (172.23.0.0/24), NetworkManager fails to renew the lease and sets ipv4.method=disabled. The NMState operator then preserves this disabled state, causing a permanent loss of provisioning network connectivity on that node.

The issue occurs when the OpenStackProvisionServer and metal3 pods run on different nodes. If metal3 restarts while a node is attempting DHCP renewal, the temporary unavailability of metal3-dnsmasq causes the renewal to fail.

Solution:
Automatically detect the node running the metal3 pod (via the k8s-app=metal3 label) and configure provisionServerNodeSelector in baremetalSetTemplate to schedule OpenStackProvisionServer on the same node. This preserves provisioning network connectivity, because metal3-static-ip-manager keeps a static IP (172.23.0.3) on the metal3 node regardless of dnsmasq restarts.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Miguel Angel Nieto Jimenez <mnietoji@redhat.com>
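A minimal Python sketch of the template change described in this commit message, assuming the metal3 node name is already known and that the node's kubernetes.io/hostname label equals its node name (common, but not guaranteed). Only provisionServerNodeSelector and baremetalSetTemplate come from the change itself; the function name with_provision_server_selector and the exampleField key are hypothetical, and the actual change is implemented in Ansible, not Python.

```python
import copy


def with_provision_server_selector(baremetal_set_template: dict, metal3_node: str) -> dict:
    """Return a copy of the template that pins the provision server to metal3_node.

    Assumption: the node's kubernetes.io/hostname label equals its node name;
    verify with: oc get node <name> --show-labels
    """
    updated = copy.deepcopy(baremetal_set_template)  # leave the caller's dict untouched
    # Keep any existing selector keys and set (or override) the hostname key.
    selector = dict(updated.get("provisionServerNodeSelector") or {})
    selector["kubernetes.io/hostname"] = metal3_node
    updated["provisionServerNodeSelector"] = selector
    return updated


# Hypothetical template fragment and node name.
original = {"exampleField": "value"}
patched = with_provision_server_selector(original, "master-1")
print(patched)
# {'exampleField': 'value', 'provisionServerNodeSelector': {'kubernetes.io/hostname': 'master-1'}}
```

Because a node selector matches only nodes that carry every listed label, keeping pre-existing selector keys could leave the provision server unschedulable if the metal3 node lacks one of them; replacing the selector outright is a reasonable alternative.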