Multiple CMD probes share stale configuration when source image or run properties differ #5462

@pcandido

What happened

When a ChaosEngine is configured with multiple cmdProbe entries, the probes do not stay independent: configuration from one entry leaks into the others. This affects:

  • source.image — all probe pods are created using the last probe's image
  • runProperties fields with omitempty (retry, attempt, stopOnFailure, probePollingInterval, initialDelay, evaluationTimeout) — values defined in an earlier probe but omitted in a later one carry over to the later probe

What you expected to happen

Each CMD probe should use its own source.image and its own runProperties values, independently of every other probe in the list.

How to reproduce it

Configure two cmdProbe entries whose images each provide a command that the other image lacks:

  • curlimages/curl:latest — has curl, does not have python3
  • python:3.12-alpine — has python3, does not have curl

If the wrong image is used, the command fails with exit 127 (command not found) — which is exactly the observable symptom.

probe:
  - name: probe-curl
    type: cmdProbe
    mode: SOT
    cmdProbe/inputs:
      command: "curl --version"
      comparator:
        type: string
        criteria: contains
        value: "curl"
      source:
        image: curlimages/curl:latest
    runProperties:
      probeTimeout: 60s
      interval: 10s
      retry: 3
      stopOnFailure: true

  - name: probe-python
    type: cmdProbe
    mode: EOT
    cmdProbe/inputs:
      command: "python3 --version"
      comparator:
        type: string
        criteria: contains
        value: "Python"
      source:
        image: python:3.12-alpine
    runProperties:
      probeTimeout: 30s
      interval: 5s
      # retry and stopOnFailure intentionally omitted

Expected:
Both probes pass successfully.

Actual:

{"errorCode":"GENERIC_ERROR","phase":"PreChaos","reason":"failed to create a stderr and stdout stream, command terminated with exit code 127"}

Root cause

File: chaoscenter/graphql/server/pkg/probe/handler/handler.go
Functions: GenerateExperimentManifestWithProbes, GenerateCronExperimentManifestWithProbes

A single variable, cmdProbe of type CMDProbeAttributes, is declared outside the inner probe loop and reused across all iterations:

var cmdProbe CMDProbeAttributes // declared once

for _, annotationKey := range manifestAnnotation {
    json.Unmarshal([]byte(probeManifestString), &cmdProbe) // reuses same variable

    probes = append(probes, v1alpha1.ProbeAttributes{
        CmdProbeInputs: &v1alpha1.CmdProbeInputs{
            Source: cmdProbe.CmdProbeInputs.Source, // pointer copy — not a new allocation
        },
        RunProperties: cmdProbe.RunProperties,
    })
}

This causes two distinct issues:

Issue 1 — Pointer aliasing (source.image)

CmdProbeInputs.Source is *SourceDetails (a pointer). When json.Unmarshal encounters a non-nil pointer it reuses the existing allocation and updates its fields in-place. Every entry appended to probes shares the same *SourceDetails address, so after the last iteration all probes reflect the last probe's image.

Issue 2 — Stale values from omitempty fields (runProperties)

RunProperties fields such as retry, attempt, stopOnFailure, probePollingInterval, initialDelay, and evaluationTimeout are tagged omitempty, so a probe that leaves one of them unset is serialized without a key for it. json.Unmarshal only assigns fields whose keys are present in the input and leaves every other field untouched, so the value from the previous iteration survives in the shared variable. That stale value is then copied into the new probe's RunProperties.

Minimal reproduction (no Litmus dependencies)

package main

import (
    "encoding/json"
    "fmt"
)

type Source struct {
    Image string `json:"image,omitempty"`
}
type RunProps struct {
    Timeout string `json:"timeout"`
    // omitempty drops a zero value on Marshal; on Unmarshal, a missing key
    // leaves the field untouched rather than resetting it.
    Retry int `json:"retry,omitempty"`
}
type Inputs struct {
    Source *Source `json:"source,omitempty"`
}
type Probe struct {
    Name     string   `json:"name"`
    Inputs   Inputs   `json:"inputs"`
    RunProps RunProps `json:"runProps"`
}

func main() {
    probeJSONs := []string{
        `{"name":"probe-A","inputs":{"source":{"image":"image-A:latest"}},"runProps":{"timeout":"60s","retry":3}}`,
        `{"name":"probe-B","inputs":{"source":{"image":"image-B:latest"}},"runProps":{"timeout":"30s"}}`,
    }

    var shared Probe // reused across iterations on purpose, to reproduce the bug
    type result struct {
        name   string
        srcPtr *Source
        retry  int
    }
    var results []result

    for _, raw := range probeJSONs {
        if err := json.Unmarshal([]byte(raw), &shared); err != nil {
            panic(err)
        }
        results = append(results, result{shared.Name, shared.Inputs.Source, shared.RunProps.Retry})
    }

    for _, r := range results {
        fmt.Printf("probe=%-9s image=%-20s retry=%d  ptr=%p\n",
            r.name, r.srcPtr.Image, r.retry, r.srcPtr)
    }
}

Output:

probe=probe-A   image=image-B:latest       retry=3  ptr=0x140001220a0
probe=probe-B   image=image-B:latest       retry=3  ptr=0x140001220a0

Both issues are visible: same pointer address (Issue 1) and retry=3 leaked into probe-B (Issue 2).

Proposed fix

Declare each probe variable inside its own if block instead of sharing a single instance across loop iterations. Go zero-initializes a variable every time its declaration executes, so on each iteration the Source pointer starts as nil (forcing json.Unmarshal to allocate a new SourceDetails) and every omitted field starts at its zero value. No stale data can leak between probes.

Before (buggy):

var cmdProbe CMDProbeAttributes // shared across all iterations

for _, annotationKey := range manifestAnnotation {
    // ...
    } else if model.ProbeType(probe.Type) == model.ProbeTypeCmdProbe {
        json.Unmarshal([]byte(probeManifestString), &cmdProbe) // reuses allocation
        // ...
    }
}

After (fixed):

for _, annotationKey := range manifestAnnotation {
    // ...
    } else if model.ProbeType(probe.Type) == model.ProbeTypeCmdProbe {
        var cmdProbe CMDProbeAttributes // fresh zero value every iteration
        json.Unmarshal([]byte(probeManifestString), &cmdProbe)
        // ...
    }
}

The same change applies to httpProbe, promProbe, and k8sProbe in both GenerateExperimentManifestWithProbes and GenerateCronExperimentManifestWithProbes, and the shared var (...) block at the top of each function should be trimmed accordingly.
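Since the change repeats for four probe types in two functions, one possible way to make the fresh allocation hard to forget is a small generic helper. This is only a suggestion: decodeFresh, cmdAttrs, and httpAttrs are hypothetical names that do not exist in the Litmus codebase, and the helper assumes Go 1.18 or newer for generics.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeFresh is a hypothetical helper. It always unmarshals into a new zero
// value of T, so no state can survive from a previous call.
func decodeFresh[T any](raw string) (T, error) {
	var v T
	err := json.Unmarshal([]byte(raw), &v)
	return v, err
}

// Stand-in types; the real probe attribute structs have more fields.
type cmdAttrs struct {
	Image *string `json:"image,omitempty"`
	Retry int     `json:"retry,omitempty"`
}
type httpAttrs struct {
	URL   string `json:"url"`
	Retry int    `json:"retry,omitempty"`
}

func main() {
	a, err := decodeFresh[cmdAttrs](`{"image":"image-A:latest","retry":3}`)
	if err != nil {
		panic(err)
	}
	b, err := decodeFresh[cmdAttrs](`{"image":"image-B:latest"}`)
	if err != nil {
		panic(err)
	}
	h, err := decodeFresh[httpAttrs](`{"url":"http://example.invalid/health"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(*a.Image, a.Retry) // image-A:latest 3
	fmt.Println(*b.Image, b.Retry) // image-B:latest 0
	fmt.Println(h.URL, h.Retry)    // http://example.invalid/health 0
}
```

Because the destination value lives only inside the helper, a caller cannot accidentally hoist it out of the loop again.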
