Description
What happened
When a ChaosEngine is configured with multiple cmdProbe entries, all probes end up with the same configuration from the last probe in the list. This affects:
- source.image: all probe pods are created using the last probe's image
- runProperties fields with omitempty (retry, attempt, stopOnFailure, probePollingInterval, initialDelay, evaluationTimeout): values defined in an earlier probe but omitted in a later one carry over to the later probe
What you expected to happen
Each CMD probe should use its own source.image and its own runProperties values, independently of every other probe in the list.
How to reproduce it
Configure two cmdProbe entries using images that each provide an exclusive command absent in the other:
- curlimages/curl:latest: has curl, does not have python3
- python:3.12-alpine: has python3, does not have curl
If the wrong image is used, the command fails with exit 127 (command not found) — which is exactly the observable symptom.
probe:
  - name: probe-curl
    type: cmdProbe
    mode: SOT
    cmdProbe/inputs:
      command: "curl --version"
      comparator:
        type: string
        criteria: contains
        value: "curl"
      source:
        image: curlimages/curl:latest
    runProperties:
      probeTimeout: 60s
      interval: 10s
      retry: 3
      stopOnFailure: true
  - name: probe-python
    type: cmdProbe
    mode: EOT
    cmdProbe/inputs:
      command: "python3 --version"
      comparator:
        type: string
        criteria: contains
        value: "Python"
      source:
        image: python:3.12-alpine
    runProperties:
      probeTimeout: 30s
      interval: 5s
      # retry and stopOnFailure intentionally omitted
Expected:
Both probes pass successfully.
Actual:
{"errorCode":"GENERIC_ERROR","phase":"PreChaos","reason":"failed to create a stderr and stdout stream, command terminated with exit code 127"}
Root cause
File: chaoscenter/graphql/server/pkg/probe/handler/handler.go
Functions: GenerateExperimentManifestWithProbes, GenerateCronExperimentManifestWithProbes
A single cmdProbe CMDProbeAttributes variable is declared outside the inner probe loop and reused across all iterations:
var cmdProbe CMDProbeAttributes // declared once
for _, annotationKey := range manifestAnnotation {
	json.Unmarshal([]byte(probeManifestString), &cmdProbe) // reuses same variable
	probes = append(probes, v1alpha1.ProbeAttributes{
		CmdProbeInputs: &v1alpha1.CmdProbeInputs{
			Source: cmdProbe.CmdProbeInputs.Source, // pointer copy, not a new allocation
		},
		RunProperties: cmdProbe.RunProperties,
	})
}
This causes two distinct issues:
Issue 1 — Pointer aliasing (source.image)
CmdProbeInputs.Source is *SourceDetails (a pointer). When json.Unmarshal encounters a non-nil pointer it reuses the existing allocation and updates its fields in-place. Every entry appended to probes shares the same *SourceDetails address, so after the last iteration all probes reflect the last probe's image.
Issue 2 — Stale values from omitempty fields (runProperties)
RunProperties fields such as retry, attempt, stopOnFailure, probePollingInterval, initialDelay, and evaluationTimeout are tagged omitempty, so a probe that leaves one of them unset has no key for it in the serialized manifest. json.Unmarshal leaves a struct field untouched when its key is absent from the input, so the value from the previous iteration stays in the shared variable. The stale value is then copied into the new probe's RunProperties.
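A minimal sketch of the round trip behind Issue 2. It uses a hypothetical runProps type, not the real Litmus struct, and the helper names encode and decodeInto are made up for illustration. It shows that omitempty drops a zero-valued field when marshaling, and that decoding the resulting JSON into a reused variable leaves the old value in place.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runProps is a hypothetical stand-in for the probe's run properties.
type runProps struct {
	Timeout string `json:"timeout"`
	Retry   int    `json:"retry,omitempty"`
}

// encode marshals p; with omitempty, a zero Retry is left out of the output.
func encode(p runProps) string {
	b, err := json.Marshal(p)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// decodeInto unmarshals raw into dst without resetting dst first.
func decodeInto(dst *runProps, raw string) {
	if err := json.Unmarshal([]byte(raw), dst); err != nil {
		panic(err)
	}
}

func main() {
	first := encode(runProps{Timeout: "60s", Retry: 3})
	second := encode(runProps{Timeout: "30s"}) // Retry is zero, so the key is omitted

	fmt.Println(first)  // {"timeout":"60s","retry":3}
	fmt.Println(second) // {"timeout":"30s"}

	var shared runProps
	decodeInto(&shared, first)
	decodeInto(&shared, second) // no "retry" key, so Retry keeps its old value
	fmt.Printf("%+v\n", shared) // {Timeout:30s Retry:3}
}
```

The tag only decides whether the key is written. On the decoding side, any field whose key is missing keeps its old value, with or without omitempty.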
Minimal reproduction (no Litmus dependencies)
package main

import (
	"encoding/json"
	"fmt"
)

type Source struct {
	Image string `json:"image,omitempty"`
}

type RunProps struct {
	Timeout string `json:"timeout"`
	Retry   int    `json:"retry,omitempty"` // omitempty: absent in JSON = not zeroed
}

type Inputs struct {
	Source *Source `json:"source,omitempty"`
}

type Probe struct {
	Name     string   `json:"name"`
	Inputs   Inputs   `json:"inputs"`
	RunProps RunProps `json:"runProps"`
}

func main() {
	probeJSONs := []string{
		`{"name":"probe-A","inputs":{"source":{"image":"image-A:latest"}},"runProps":{"timeout":"60s","retry":3}}`,
		`{"name":"probe-B","inputs":{"source":{"image":"image-B:latest"}},"runProps":{"timeout":"30s"}}`,
	}

	var shared Probe

	type result struct {
		name   string
		srcPtr *Source
		retry  int
	}
	var results []result

	for _, raw := range probeJSONs {
		json.Unmarshal([]byte(raw), &shared)
		results = append(results, result{shared.Name, shared.Inputs.Source, shared.RunProps.Retry})
	}

	for _, r := range results {
		fmt.Printf("probe=%-9s image=%-20s retry=%d ptr=%p\n",
			r.name, r.srcPtr.Image, r.retry, r.srcPtr)
	}
}
Output:
probe=probe-A image=image-B:latest retry=3 ptr=0x140001220a0
probe=probe-B image=image-B:latest retry=3 ptr=0x140001220a0
Both issues are visible: same pointer address (Issue 1) and retry=3 leaked into probe-B (Issue 2).
Proposed fix
Declare each probe variable inside its own if block instead of sharing a single instance across loop iterations. Go guarantees a zero value on every declaration, which nullifies the Source pointer and clears all omitempty fields — no stale data can leak between probes.
Before (buggy):
var cmdProbe CMDProbeAttributes // shared across all iterations
for _, annotationKey := range manifestAnnotation {
	// ...
	} else if model.ProbeType(probe.Type) == model.ProbeTypeCmdProbe {
		json.Unmarshal([]byte(probeManifestString), &cmdProbe) // reuses allocation
		// ...
	}
}
After (fixed):
for _, annotationKey := range manifestAnnotation {
	// ...
	} else if model.ProbeType(probe.Type) == model.ProbeTypeCmdProbe {
		var cmdProbe CMDProbeAttributes // fresh zero value every iteration
		json.Unmarshal([]byte(probeManifestString), &cmdProbe)
		// ...
	}
}
The same change applies to httpProbe, promProbe, and k8sProbe in both GenerateExperimentManifestWithProbes and GenerateCronExperimentManifestWithProbes, and the shared var (...) block at the top of each function should be trimmed accordingly.
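To check the idea outside Litmus, the minimal reproduction can be rewritten with the variable declared inside the loop. This is a sketch on the same simplified types, not the actual handler code; the decodeFresh helper and the package-level probeJSONs slice are hypothetical names introduced for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Source struct {
	Image string `json:"image,omitempty"`
}

type Probe struct {
	Name   string `json:"name"`
	Inputs struct {
		Source *Source `json:"source,omitempty"`
	} `json:"inputs"`
	RunProps struct {
		Timeout string `json:"timeout"`
		Retry   int    `json:"retry,omitempty"`
	} `json:"runProps"`
}

// Same two documents as in the minimal reproduction.
var probeJSONs = []string{
	`{"name":"probe-A","inputs":{"source":{"image":"image-A:latest"}},"runProps":{"timeout":"60s","retry":3}}`,
	`{"name":"probe-B","inputs":{"source":{"image":"image-B:latest"}},"runProps":{"timeout":"30s"}}`,
}

// decodeFresh decodes each document into its own zero-valued Probe.
func decodeFresh(docs []string) ([]Probe, error) {
	probes := make([]Probe, 0, len(docs))
	for _, raw := range docs {
		var p Probe // declared inside the loop: a fresh zero value each iteration
		if err := json.Unmarshal([]byte(raw), &p); err != nil {
			return nil, err
		}
		probes = append(probes, p)
	}
	return probes, nil
}

func main() {
	probes, err := decodeFresh(probeJSONs)
	if err != nil {
		panic(err)
	}
	for _, p := range probes {
		fmt.Printf("probe=%-9s image=%-20s retry=%d\n",
			p.Name, p.Inputs.Source.Image, p.RunProps.Retry)
	}
	// Each Probe started with a nil Source, so Unmarshal allocated a new one per probe.
	fmt.Println("distinct source pointers:", probes[0].Inputs.Source != probes[1].Inputs.Source)
}
```

With this version probe-A reports image-A:latest with retry=3, probe-B reports image-B:latest with retry=0, and the two Source pointers differ. Unlike the reproduction, the sketch also checks the error returned by json.Unmarshal.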