allow the status-section to be updated on a crm_shadow commit with an explicit switch#4106
allow the status-section to be updated on a crm_shadow commit with an explicit switch#4106wenningerk wants to merge 2 commits into
Conversation
62b194c to
a09fa0c
Compare
nrwahl2
left a comment
There was a problem hiding this comment.
Here are a few comments. Two of them are requests to improve maintainability. The third comment questions this approach more broadly.
I'll try to give this issue some further thought tonight as well.
| "crm_shadow --diff", | ||
| expected_rc=ExitStatus.ERROR), | ||
| ], cib_gen=partial(copy_existing_cib, f"{cts_cli_data}/crm_mon.xml")), | ||
| TestGroup([ |
There was a problem hiding this comment.
Can we add these tests before the fix commit, with expected_rc set to an error? Then update the expected_rc and the output in the fix commit
There was a problem hiding this comment.
Test first - Ok. Just wasn't sure we were pushing for that. Will do.
There was a problem hiding this comment.
As this (at least technically) isn't a change that fixes a behavior but rather a feature added I chose to add a test prior to the commit that checks if the - in this case - undesired behavior (status not comited) is there. There is probably not much sense in checking that the new feature is not there by checking if trying to use it gives us a failure.
Together with the feature-commit I'm then checking if the new feature can be used to achieve the desired behavior. Was hesitant if I should have left the test checking the behavior without the new switch.
| "crm_attribute", | ||
| "crm_node", | ||
| "crm_resource", | ||
| "crm_shadow_status", |
There was a problem hiding this comment.
We need the commit message to reference RHEL-70283, and to explain why we're doing this (since future contributors may not be able to access this RHEL Jira issue).
We've encountered numerous issues when commit messages don't explain the reasoning for the changes in the commit. We often end up, much later, removing or changing behavior that we depend on. At best, we spend a long time trying to figure out why things are the way they are, and we're often not confident in the conclusions we reach.
There was a problem hiding this comment.
As it doesn't fix the issue right away I can add that being able to alter the status section is a pre-requisite for RHEL-70283 and that this commit gives us back this possibility.
There was a problem hiding this comment.
Of course I would have loved to get some pcs feedback regarding the approach.
Which is why I put it here for now as something generically useful. But the reference to what triggered the implementation is probably still a good idea.
There was a problem hiding this comment.
Added a bit of data that should help understand why the feature was added including a reference to the Jira Issue.
| return; | ||
| } | ||
|
|
||
| if (options.update_status) { |
There was a problem hiding this comment.
I don't understand how this would fix the issue. Based on RHEL-70283, it appears that the issue is that adding devices to a fence_scsi or fence_mpath resource using pcs stonith update causes resource restarts. However, pcs doesn't use crm_shadow at all. So updating the crm_shadow CLI tool doesn't seem like it would affect the behavior of pcs.
There was a problem hiding this comment.
It doesn't fix the issue right away. There is already RHELHA-1011 to take care of this from the pcs side. Just after your commit there was no low-level-tool left you could use to update the status section. Both for getting an atomic update and for the interface crm_shadow seems like the reasonable answer.
As since ec65417 there is no tool available anymore to modify the status section and crm_shadow would be the way to do that in an atomic way adding the switch --update-status. In favor to reverting to behavior before the commit above this approach doesn't introduce potentially unintentional behavioral changes. This was triggered by pcs feature that allows altering fence_scsi devices without resource restart altering the digests in already recorded actions in the status section. (Jira Issue RHEL-70283)
a09fa0c to
73e0e5c
Compare
|
Would it be possible to implement similar option for the cibadmin tool? So that Pcs uses diff-based push:
It would be more easier for pcs to just add another option to the cibadmin call |
Basically there were 2 reasons to go for crm_shadow:
From an architectural pov it would be definitely cleaner to have a little bit more code for this very - anyway hacky - |
Investigating why with Pacemaker 2.1.7 the high-level-tooling approach to prevent a resource restart as a result of a parameter change in the config by in parallel changing the digests in the status section wouldn't work anymore ...
Was barking at the wrong tree investigating the commit that changes how digests are calculated.
In parallel there was as well a commit in Pacemaker 2.1.7 ( ec65417) that removes the special handling of replacing the full cib and now triggers an election + join refresh whenever an “unsafe” client modifies the status section, overwriting the manually pushed digest before the scheduler can see it.
Added a ‘--commit-status’ option to crm_shadow that prevents this behavior (client will claim to be crm_shadow_status and this is added to the safe clients list).