allow the status-section to be updated on a crm_shadow commit with an explicit switch by wenningerk · Pull Request #4106 · ClusterLabs/pacemaker

wenningerk · 2026-05-07T12:48:08Z

I was investigating why, with Pacemaker 2.1.7, the high-level-tooling approach of preventing a resource restart after a parameter change in the configuration, by changing the digests in the status section in parallel, no longer works.
At first I was barking up the wrong tree, investigating the commit that changes how digests are calculated.
Pacemaker 2.1.7 also contains a commit (ec65417) that removes the special handling of replacing the full CIB and now triggers an election + join refresh whenever an “unsafe” client modifies the status section. That overwrites the manually pushed digest before the scheduler can see it.
This PR adds a ‘--commit-status’ option to crm_shadow that prevents this behavior: the client identifies itself as crm_shadow_status, and that name is added to the safe-clients list.
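
The mechanism described above can be sketched as a toy model. This is an illustration only, not Pacemaker's C code: the function names `needs_join_refresh` and `client_name_for_commit` are hypothetical, and the safe-clients set contains only the entries visible in the diff excerpt of this PR.

```python
# Toy model of the "safe clients" check described in the PR text.
# All names here are hypothetical; the real logic lives in Pacemaker's C code.

# Only the entries visible in this PR's diff excerpt; the real list may be longer.
SAFE_CLIENTS = {"crm_attribute", "crm_node", "crm_resource", "crm_shadow_status"}


def needs_join_refresh(client_name: str, touches_status: bool) -> bool:
    """Return True if a CIB change should trigger an election + join refresh.

    Per the PR description, this happens when an "unsafe" client
    modifies the status section.
    """
    return touches_status and client_name not in SAFE_CLIENTS


def client_name_for_commit(commit_status: bool) -> str:
    """Pick the client name crm_shadow would claim when committing."""
    return "crm_shadow_status" if commit_status else "crm_shadow"


if __name__ == "__main__":
    for flag in (False, True):
        name = client_name_for_commit(flag)
        print(name, needs_join_refresh(name, touches_status=True))
```

In this model, a commit without the switch still counts as coming from an unsafe client, so a manually pushed digest would be overwritten by the refresh; with the switch, the status change is left alone.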

nrwahl2

Here are a few comments. Two of them are requests to improve maintainability. The third comment questions this approach more broadly.

I'll try to give this issue some further thought tonight as well.

nrwahl2 · 2026-05-31T14:20:47Z

                     "crm_shadow --diff",
                     expected_rc=ExitStatus.ERROR),
            ], cib_gen=partial(copy_existing_cib, f"{cts_cli_data}/crm_mon.xml")),
+            TestGroup([


Can we add these tests before the fix commit, with expected_rc set to an error? Then update the expected_rc and the output in the fix commit.

Test first: OK. I just wasn't sure we were pushing for that. Will do.

Since this is (at least technically) a new feature rather than a fix for existing behavior, I chose to add a test before the feature commit that checks that the (in this case undesired) behavior is present, i.e. that status changes are not committed. There is probably not much sense in checking that the new feature is absent by verifying that trying to use it fails.
Together with the feature commit, I then check that the new feature can be used to achieve the desired behavior. I was unsure whether I should have kept the test that checks the behavior without the new switch.

nrwahl2 · 2026-05-31T14:31:39Z

                                "crm_attribute",
                                "crm_node",
                                "crm_resource",
+                                "crm_shadow_status",


We need the commit message to reference RHEL-70283, and to explain why we're doing this (since future contributors may not be able to access this RHEL Jira issue).

We've encountered numerous issues when commit messages don't explain the reasoning for the changes in the commit. We often end up, much later, removing or changing behavior that we depend on. At best, we spend a long time trying to figure out why things are the way they are, and we're often not confident in the conclusions we reach.

As it doesn't fix the issue right away, I can add that being able to alter the status section is a prerequisite for RHEL-70283 and that this commit gives us back this possibility.

Of course, I would have loved to get some pcs feedback regarding the approach, which is why I put it here for now as something generically useful. But a reference to what triggered the implementation is probably still a good idea.

Added some background that should help explain why the feature was added, including a reference to the Jira issue.

nrwahl2 · 2026-05-31T18:53:06Z

        return;
    }

+    if (options.update_status) {


I don't understand how this would fix the issue. Based on RHEL-70283, it appears that the issue is that adding devices to a fence_scsi or fence_mpath resource using pcs stonith update causes resource restarts. However, pcs doesn't use crm_shadow at all. So updating the crm_shadow CLI tool doesn't seem like it would affect the behavior of pcs.

It doesn't fix the issue right away. There is already RHELHA-1011 to take care of this from the pcs side. It is just that after your commit there was no low-level tool left that could be used to update the status section. Both for getting an atomic update and for the interface, crm_shadow seems like the reasonable answer.

Since ec65417 there is no tool available anymore to modify the status section. crm_shadow would be the way to do that atomically, by adding the switch --update-status. Compared to reverting to the behavior before that commit, this approach doesn't introduce potentially unintended behavioral changes. This was triggered by a pcs feature that allows altering fence_scsi devices without a resource restart by altering the digests of already recorded actions in the status section (Jira issue RHEL-70283).

mirecheck · 2026-06-10T08:25:58Z

Would it be possible to implement a similar option for the cibadmin tool, so that
the resource refresh does not occur after updating the status section?

Pcs uses diff-based push:

crm_diff --original <old_cib_file> --new <new_cib_file> --no-version
cibadmin --patch --verbose --xml-pipe (with diff XML on stdin)

It would be easier for pcs to just add another option to the cibadmin call
than to integrate crm_shadow.
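
For illustration, the diff-based push described above might be wrapped roughly as follows. This is a sketch, not pcs code: the helper names (`crm_diff_argv`, `cibadmin_patch_argv`, `push_cib_diff`) are hypothetical, only the command-line flags quoted in this comment are used, and exit-status handling for crm_diff is deliberately left out.

```python
import subprocess
from typing import Callable, List


def crm_diff_argv(old_cib: str, new_cib: str) -> List[str]:
    """Build the crm_diff call quoted in the comment above."""
    return ["crm_diff", "--original", old_cib, "--new", new_cib, "--no-version"]


def cibadmin_patch_argv() -> List[str]:
    """Build the cibadmin call that reads the diff XML from stdin."""
    return ["cibadmin", "--patch", "--verbose", "--xml-pipe"]


def push_cib_diff(old_cib: str, new_cib: str,
                  run: Callable = subprocess.run) -> int:
    """Compute a diff between two CIB files and feed it to cibadmin.

    `run` is injectable so the flow can be exercised without a cluster.
    Returns the exit status of the cibadmin call.
    """
    # Step 1: produce the diff XML on stdout.
    diff = run(crm_diff_argv(old_cib, new_cib), capture_output=True, text=True)
    # Step 2: apply the diff in a single cibadmin call, passing it on stdin.
    result = run(cibadmin_patch_argv(), input=diff.stdout,
                 capture_output=True, text=True)
    return result.returncode
```

A new cibadmin option, as requested here, would then amount to one extra element in the list returned by `cibadmin_patch_argv`; no such option is assumed in the sketch.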

wenningerk · 2026-06-10T10:32:08Z

Basically there were two reasons to go for crm_shadow:

- It is an atomic update (which might be the case for your diff as well if it is applied with a single cibadmin call; I have to check).
- Adding the switch to crm_shadow gives you the possibility to use any tooling on the status section (of a shadow CIB, then) with a minimal change to the Pacemaker code base.

From an architectural point of view, it would definitely be cleaner to have a little more code for this (anyway hacky) feature in the high-level tooling instead of bloating the Pacemaker code base.
If using cibadmin reduces the risk of a race, where a concurrent status update gets reverted by a full status push, it might be worth thinking it over once again, provided of course that cibadmin --patch works atomically.
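
The race mentioned above can be pictured with a toy model in which the status section is a flat dictionary. This is only a sketch of the general lost-update problem: the keys and helper names are made up, real CIB versioning and XML structure are not modeled, and it says nothing about whether cibadmin --patch is actually atomic.

```python
import copy


def full_replace(live: dict, edited_snapshot: dict) -> dict:
    """Replace the whole live section with an edited copy of an older snapshot."""
    return copy.deepcopy(edited_snapshot)


def make_patch(original: dict, edited: dict) -> dict:
    """Collect only the keys whose values were changed in the edited copy."""
    return {k: v for k, v in edited.items() if original.get(k) != v}


def apply_patch(live: dict, patch: dict) -> dict:
    """Apply only the changed keys on top of the current live section."""
    result = copy.deepcopy(live)
    result.update(patch)
    return result


def demo():
    # Hypothetical status entries, not real CIB attribute names.
    live = {"fence_scsi_start_digest": "old", "rsc_a_last_rc": "0"}
    snapshot = copy.deepcopy(live)
    edited = dict(snapshot, fence_scsi_start_digest="new")

    # A concurrent status update lands after the snapshot was taken.
    live["rsc_a_last_rc"] = "7"

    replaced = full_replace(live, edited)
    patched = apply_patch(live, make_patch(snapshot, edited))
    return replaced, patched


if __name__ == "__main__":
    replaced, patched = demo()
    print("full push:", replaced)
    print("patch:    ", patched)
```

In this model the full push silently reverts the concurrent update, while the patch keeps it and changes only the digest.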

wenningerk force-pushed the modify_fence_scsi_devices branch from 62b194c to a09fa0c Compare May 20, 2026 14:37

wenningerk changed the title ~~[WIP] don't trigger restart of fence_scsi when devices are modified and start digest is adapted accordingly~~ allow the status-section to be updated on a crm_shadow commit with an explicit switch May 20, 2026

nrwahl2 reviewed May 31, 2026

wenningerk added 2 commits June 1, 2026 14:26

Test: crm_shadow: check that status-changes are not commited

f615b30

wenningerk force-pushed the modify_fence_scsi_devices branch from a09fa0c to 73e0e5c Compare June 1, 2026 12:49

wenningerk wants to merge 2 commits into ClusterLabs:main from wenningerk:modify_fence_scsi_devices
