tools: codegen: increase sippy request timeout for feature promotion verification#2782
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Hello @everettraven! Some important instructions when contributing to openshift/api: |
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Review skipped — only excluded labels are configured. (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThe PR increases the Sippy API per-request timeout from 30s to 2m. It moves the ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Want to cross-test this. /hold |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tools/codegen/cmd/featuregate-test-analyzer.go (1)
836-870:⚠️ Potential issue | 🟠 MajorBind Sippy requests to command context to prevent hanging under timeout conditions
The command-level timeout (10 seconds) is not propagated to HTTP requests in
listTestResultForVariant. Requests are created withhttp.NewRequest()instead ofhttp.NewRequestWithContext(), so they ignore the command's cancellation signal. Combined with the 2-minute HTTP client timeout, this allows stalled requests to block the analyzer far longer than intended.Required changes
-func listTestResultFor(featureGate string, clusterProfiles sets.Set[string]) (map[JobVariant]*TestingResults, error) { +func listTestResultFor(ctx context.Context, featureGate string, clusterProfiles sets.Set[string]) (map[JobVariant]*TestingResults, error) { @@ - jobVariantResults, err := listTestResultForVariant(featureGate, jobVariant) + jobVariantResults, err := listTestResultForVariant(ctx, featureGate, jobVariant) @@ -func listTestResultForVariant(featureGate string, jobVariant JobVariant) (*TestingResults, error) { +func listTestResultForVariant(ctx context.Context, featureGate string, jobVariant JobVariant) (*TestingResults, error) { @@ - req, err := http.NewRequest(http.MethodGet, currURL.String(), nil) + req, err := http.NewRequestWithContext(ctx, http.MethodGet, currURL.String(), nil)- testingResults, err := listTestResultFor(enabledFeatureGate, clusterProfiles) + testingResults, err := listTestResultFor(ctx, enabledFeatureGate, clusterProfiles)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/codegen/cmd/featuregate-test-analyzer.go` around lines 836 - 870, In listTestResultForVariant, HTTP requests are created with http.NewRequest so they don't observe the command's cancellation; replace http.NewRequest(...) with http.NewRequestWithContext(ctx, ...) (or the command's context variable name used in this function) and ensure the function signature or surrounding scope provides that context. Use this context when creating each request before calling sippyClient.Do(req) so timeouts/cancellations at the command level are honored for the sippyClient calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tools/codegen/cmd/featuregate-test-analyzer.go`:
- Around line 836-870: In listTestResultForVariant, HTTP requests are created
with http.NewRequest so they don't observe the command's cancellation; replace
http.NewRequest(...) with http.NewRequestWithContext(ctx, ...) (or the command's
context variable name used in this function) and ensure the function signature
or surrounding scope provides that context. Use this context when creating each
request before calling sippyClient.Do(req) so timeouts/cancellations at the
command level are honored for the sippyClient calls.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 6bf8c284-ee0a-40bc-87c8-e02345a875ba
📒 Files selected for processing (1)
tools/codegen/cmd/featuregate-test-analyzer.go
…verification Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
612919a to
1aa1ed4
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@payload-manifests/crds/0000_80_machine-config_01_machineconfignodes-OKD.crd.yaml`:
- Around line 123-125: The CEL regex in the rule string (the pattern starting
with '^([a-zA-Z0-9-]+\\.)+...') allows an empty repository namespace and empty
image name (it uses {0,61} and *), letting invalid refs like
"registry.example.com/@sha256:..." slip through; update the pattern used in the
rule entries so the namespace segment is {1,61} (not {0,61}) and require a
non-empty image name (replace the trailing '*$' segment with a '+'-style
requirement, e.g. change [a-zA-Z0-9-_.]*?$ to [a-zA-Z0-9-_.]+?$), and apply this
exact replacement to all three occurrences of that rule in the file (the rule
lines containing self.split('@')[0].matches('^([a-zA-Z0-9-]+\\.)+[...')).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 81671c20-9cab-4023-959e-7e87fcc0d967
⛔ Files ignored due to path filters (2)
machineconfiguration/v1/zz_generated.crd-manifests/0000_80_machine-config_01_machineconfignodes-Default.crd.yamlis excluded by!**/zz_generated.crd-manifests/*machineconfiguration/v1/zz_generated.crd-manifests/0000_80_machine-config_01_machineconfignodes-OKD.crd.yamlis excluded by!**/zz_generated.crd-manifests/*
📒 Files selected for processing (9)
features.mdfeatures/features.gopayload-manifests/crds/0000_80_machine-config_01_machineconfignodes-Default.crd.yamlpayload-manifests/crds/0000_80_machine-config_01_machineconfignodes-OKD.crd.yamlpayload-manifests/featuregates/featureGate-4-10-Hypershift-Default.yamlpayload-manifests/featuregates/featureGate-4-10-Hypershift-OKD.yamlpayload-manifests/featuregates/featureGate-4-10-SelfManagedHA-Default.yamlpayload-manifests/featuregates/featureGate-4-10-SelfManagedHA-OKD.yamltools/codegen/cmd/featuregate-test-analyzer.go
✅ Files skipped from review due to trivial changes (2)
- tools/codegen/cmd/featuregate-test-analyzer.go
- features.md
🚧 Files skipped from review as they are similar to previous changes (4)
- payload-manifests/featuregates/featureGate-4-10-Hypershift-OKD.yaml
- payload-manifests/featuregates/featureGate-4-10-SelfManagedHA-Default.yaml
- payload-manifests/featuregates/featureGate-4-10-SelfManagedHA-OKD.yaml
- payload-manifests/featuregates/featureGate-4-10-Hypershift-Default.yaml
| - message: the OCI Image name should follow the host[:port][/namespace]/name | ||
| format, resembling a valid URL without the scheme | ||
| rule: (self.split('@')[0].matches('^([a-zA-Z0-9-]+\\.)+[a-zA-Z0-9-]+(:[0-9]{2,5})?/([a-zA-Z0-9-_]{0,61}/)?[a-zA-Z0-9-_.]*?$')) |
There was a problem hiding this comment.
Tighten image-name CEL regex; it currently accepts invalid empty repo paths.
At Line 125, Line 300, and Line 318, the pattern allows empty repository names ({0,61} namespace + * name). Invalid refs like registry.example.com/@sha256:<digest> can pass schema validation and fail later during reconciliation.
Proposed regex tightening
- rule: (self.split('@')[0].matches('^([a-zA-Z0-9-]+\\.)+[a-zA-Z0-9-]+(:[0-9]{2,5})?/([a-zA-Z0-9-_]{0,61}/)?[a-zA-Z0-9-_.]*?$'))
+ rule: (self.split('@')[0].matches('^([a-zA-Z0-9-]+\\.)+[a-zA-Z0-9-]+(:[0-9]{2,5})?/([a-zA-Z0-9-_]{1,61}/)?[a-zA-Z0-9-_.]+$'))Apply the same replacement to all three occurrences in this file.
Also applies to: 298-300, 316-318
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@payload-manifests/crds/0000_80_machine-config_01_machineconfignodes-OKD.crd.yaml`
around lines 123 - 125, The CEL regex in the rule string (the pattern starting
with '^([a-zA-Z0-9-]+\\.)+...') allows an empty repository namespace and empty
image name (it uses {0,61} and *), letting invalid refs like
"registry.example.com/@sha256:..." slip through; update the pattern used in the
rule entries so the namespace segment is {1,61} (not {0,61}) and require a
non-empty image name (replace the trailing '*$' segment with a '+'-style
requirement, e.g. change [a-zA-Z0-9-_.]*?$ to [a-zA-Z0-9-_.]+?$), and apply this
exact replacement to all three occurrences of that rule in the file (the rule
lines containing self.split('@')[0].matches('^([a-zA-Z0-9-]+\\.)+[...')).
|
Changing the timeout seems ok to me, push without the proof and I'll tag up for merge |
1aa1ed4 to
99aafa5
Compare
|
/lgtm |
|
Pipeline controller notification No second-stage tests were triggered for this PR. This can happen when:
Use |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JoelSpeed The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
tooling only change, no product impact. /verified bypass |
|
@everettraven: The DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/hold cancel |
|
@everettraven: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
We've seen a few instances of Sippy serving slow responses when performing feature promotion verification checks causing the verification check to fail due to exceeding the 30 second timeout.
A recent example of this being https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_api/2738/pull-ci-openshift-api-master-verify-feature-promotion/2037154787757133824
While we don't want to wait for a response from Sippy for forever, it seems like it is becoming a bit more frequent that some sippy queries take more time to receive a response. To be a bit more resilient to slow Sippy responses while trying to balance runtime, I've bumped it from 30 seconds -> 2 minutes for a client timeout. This 4X increase should help us be more resilient to these rarer cases of slow Sippy responses without making worst case runtime (which should not occur often) too bad.