MON-4036: Add NodeExporterConfig to ClusterMonitoring API#2744
MON-4036: Add NodeExporterConfig to ClusterMonitoring API#2744danielmellado wants to merge 1 commit intoopenshift:masterfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@danielmellado: This pull request references MON-4036 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Hello @danielmellado! Some important instructions when contributing to openshift/api: |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new optional 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Review Summary by QodoAdd NodeExporterConfig to ClusterMonitoring API with collector configuration support
WalkthroughsDescription• Added NodeExporterConfig to ClusterMonitoringSpec with comprehensive configuration support for the node-exporter agent • Implemented 10 individual metric collectors (buddyInfo, cpuFreq, ethtool, ksmd, mountStats, netClass, netDev, processes, systemd, tcpStat) with per-collector enable/disable toggles and optional configuration • Added support for nodeSelector, resources, tolerations, maxProcs, and ignoredNetworkDevices configuration options • Generated OpenAPI schemas, Swagger documentation, and deepcopy methods for all new node-exporter configuration types • Updated CRD manifests (main, feature-gated, and payload) with complete schema definitions and validation rules • Added 13 comprehensive test cases validating both valid configurations and constraint violations Diagramflowchart LR
A["ClusterMonitoringSpec"] -->|"adds field"| B["NodeExporterConfig"]
B -->|"configures"| C["Node-Exporter Agent"]
B -->|"supports"| D["10 Collectors"]
B -->|"supports"| E["Pod Configuration"]
D -->|"examples"| F["buddyInfo, cpuFreq, ethtool, ksmd, mountStats"]
E -->|"includes"| G["nodeSelector, resources, tolerations, maxProcs"]
File Changes1. config/v1alpha1/types_cluster_monitoring.go
|
Code Review by Qodo
1. nodeExporterConfig not feature-gated
|
...g/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings.crd.yaml
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)
661-789: Add a contract test forignoredNetworkDevices: [].The new suite does not assert the documented empty-list behavior for
ignoredNetworkDevices. A dedicated case will prevent schema/docs drift from slipping through.🧪 Suggested test case
+ - name: Should be able to create NodeExporterConfig with empty ignoredNetworkDevices + initial: | + apiVersion: config.openshift.io/v1alpha1 + kind: ClusterMonitoring + spec: + nodeExporterConfig: + ignoredNetworkDevices: [] + expected: | + apiVersion: config.openshift.io/v1alpha1 + kind: ClusterMonitoring + spec: + nodeExporterConfig: + ignoredNetworkDevices: []🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml` around lines 661 - 789, Add a new contract test case for NodeExporterConfig to assert the documented behavior when ignoredNetworkDevices is an empty array: create a test entry (similar to the existing cases) named something like "Should accept NodeExporterConfig with empty ignoredNetworkDevices" where spec.nodeExporterConfig.ignoredNetworkDevices: [] in the initial YAML and the expected output mirrors that (or no expectedError), ensuring the test references NodeExporterConfig and ignoredNetworkDevices so the suite validates empty-list handling and prevents schema/docs drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/v1alpha1/types_cluster_monitoring.go`:
- Around line 655-668: The comment and validation for ignoredNetworkDevices
conflict: the docs state an empty list is valid (meaning "no devices are
excluded") but the kubebuilder tag +kubebuilder:validation:MinItems=1 prevents
[] from being used. Fix by removing or relaxing the MinItems constraint on
ignoredNetworkDevices (i.e., delete the +kubebuilder:validation:MinItems=1 tag
or change it to allow 0) so an empty slice is accepted; keep the MaxItems
constraint and listType as-is and update any unit tests or schema consumers if
they assume MinItems=1.
In
`@payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml`:
- Around line 1522-1541: The schema currently only enforces length/count for
systemd.units[] (NodeExporterSystemdUnit) and ignoredNetworkDevices[]; add
OpenAPI pattern validation to each items schema to ensure values are valid
regular expressions (e.g., add a "pattern" field under the items for
systemd.units and for ignoredNetworkDevices that encodes the allowed regex
syntax you want to accept—preferably aligned to the RE2 flavor used downstream),
keep maxLength/minLength and descriptions, and update the
NodeExporterSystemdUnit/ignoredNetworkDevices items to reference the new pattern
so invalid regex strings are rejected by the CRD validation.
- Around line 1568-1590: The description for ignoredNetworkDevices says an empty
list is allowed but the schema sets minItems: 1; update the schema to match the
description by changing the minItems constraint for ignoredNetworkDevices to 0
(or remove the minItems line) so an empty array is accepted, and keep maxItems:
50 and the items block (NodeExporterIgnoredNetworkDevice,
maxLength/minLength/type) unchanged; ensure any references or validation that
rely on minItems in code or docs are adjusted accordingly.
---
Nitpick comments:
In
`@config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml`:
- Around line 661-789: Add a new contract test case for NodeExporterConfig to
assert the documented behavior when ignoredNetworkDevices is an empty array:
create a test entry (similar to the existing cases) named something like "Should
accept NodeExporterConfig with empty ignoredNetworkDevices" where
spec.nodeExporterConfig.ignoredNetworkDevices: [] in the initial YAML and the
expected output mirrors that (or no expectedError), ensuring the test references
NodeExporterConfig and ignoredNetworkDevices so the suite validates empty-list
handling and prevents schema/docs drift.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f41b0979-fba6-4d38-bfe5-c83eb2d1eb31
⛔ Files ignored due to path filters (3)
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings.crd.yamlis excluded by!**/zz_generated.crd-manifests/*config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**openapi/generated_openapi/zz_generated.openapi.gois excluded by!openapi/**
📒 Files selected for processing (5)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlconfig/v1alpha1/types_cluster_monitoring.goconfig/v1alpha1/zz_generated.deepcopy.goconfig/v1alpha1/zz_generated.swagger_doc_generated.gopayload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml
Show resolved
Hide resolved
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml
Show resolved
Hide resolved
There was a problem hiding this comment.
- qodo's finding about a missing FeatureGate is a false positive. The entire ClusterMonitoring CRD is already gated behind
ClusterMonitoringConfig(line 39), so no per-field gate is needed. - both bots suggest adding OpenAPI pattern validation to enforce regex syntax on
ignoredNetworkDevicesandsystemd.unitsentries. This is not feasible; there is no OpenAPI pattern or CEL function that can validate whether a string is a valid regex. The realistic fix is to soften the documentation language (see inline comment).
a54edbe to
98a2cb7
Compare
|
Thanks for the review! Good catch on the MinItems=1 contradiction! I initially tried just removing the marker, but make lint isn't happy without it: So I went with It explicitly allows empty lists, keeps the linter happy, and the documented "When set as an empty list, no devices are excluded" behavior now works as expected. Also added a test for |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
config/v1alpha1/types_cluster_monitoring.go (1)
652-655:⚠️ Potential issue | 🟡 Minor
maxProcs: 0still serializes as “omitted” for typed clients.With
json:"maxProcs,omitempty", a generated Go client cannot send an explicit0; it gets dropped before theMinimum=1validation runs. If the API contract intends to reject0instead of silently taking the default, this field needs presence tracking (for example*int32), and the new NodeExporter coverage should include amaxProcs: 0case once that is fixed.In Go's JSON encoding, does a field tagged `json:"maxProcs,omitempty"` omit a zero-valued `int32`, making an explicit `0` indistinguishable from an omitted field?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/v1alpha1/types_cluster_monitoring.go` around lines 652 - 655, Summary: a plain int32 with `json:"...,omitempty"` hides explicit 0 values, so change MaxProcs to a pointer type to track presence. Replace the MaxProcs field type from int32 to *int32 (i.e., MaxProcs *int32) in the ClusterMonitoring spec so callers can send explicit 0 vs omitted; keep or adjust the kubebuilder validation tags to apply to the pointer as needed and ensure validation still enforces Minimum=1 when present. Update any related code paths that dereference MaxProcs (e.g., validators, defaults) to handle nil vs non-nil, and add a NodeExporter test case that sends maxProcs: 0 to verify the API rejects or accepts per the contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/v1alpha1/types_cluster_monitoring.go`:
- Around line 656-670: Replace the json tag on the IgnoredNetworkDevices field
in types_cluster_monitoring.go so a non-nil empty slice is preserved (use the
omitzero tag supported in Go 1.24+); i.e., change the struct tag for
IgnoredNetworkDevices to include ",omitzero" (so nil slices are omitted but
non-nil empty slices marshal to []), and add a typed round-trip regression test
(e.g., TestRoundTripIgnoredNetworkDevices) that constructs a CR with
IgnoredNetworkDevices set to a non-nil empty slice, kube-type
serializes/deserializes it, and asserts the field remains an empty slice (not
nil) to verify the semantic difference is preserved.
---
Duplicate comments:
In `@config/v1alpha1/types_cluster_monitoring.go`:
- Around line 652-655: Summary: a plain int32 with `json:"...,omitempty"` hides
explicit 0 values, so change MaxProcs to a pointer type to track presence.
Replace the MaxProcs field type from int32 to *int32 (i.e., MaxProcs *int32) in
the ClusterMonitoring spec so callers can send explicit 0 vs omitted; keep or
adjust the kubebuilder validation tags to apply to the pointer as needed and
ensure validation still enforces Minimum=1 when present. Update any related code
paths that dereference MaxProcs (e.g., validators, defaults) to handle nil vs
non-nil, and add a NodeExporter test case that sends maxProcs: 0 to verify the
API rejects or accepts per the contract.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 5ab267a4-337e-406f-b418-aad8042b31a1
⛔ Files ignored due to path filters (6)
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings.crd.yamlis excluded by!**/zz_generated.crd-manifests/*config/v1alpha1/zz_generated.deepcopy.gois excluded by!**/zz_generated*config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**config/v1alpha1/zz_generated.swagger_doc_generated.gois excluded by!**/zz_generated*openapi/generated_openapi/zz_generated.openapi.gois excluded by!openapi/**,!**/zz_generated*openapi/openapi.jsonis excluded by!openapi/**
📒 Files selected for processing (3)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlconfig/v1alpha1/types_cluster_monitoring.gopayload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml
98a2cb7 to
4a67cf2
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/v1alpha1/types_cluster_monitoring.go`:
- Around line 667-671: The JSON struct tag for the IgnoredNetworkDevices field
currently uses `json:"ignoredNetworkDevices,omitempty"`, which causes non-nil
empty slices to be omitted and prevents Go clients from expressing an explicit
empty list; update the tag on the IgnoredNetworkDevices field (type
NodeExporterIgnoredNetworkDevice) to
`json:"ignoredNetworkDevices,omitempty,omitzero"` so non-nil empty slices
marshal as [] while preserving backward compatibility with omitted nil slices.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 8a1585c7-2ece-47bc-9d61-8ed9d50910c3
⛔ Files ignored due to path filters (4)
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings.crd.yamlis excluded by!**/zz_generated.crd-manifests/*config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**openapi/generated_openapi/zz_generated.openapi.gois excluded by!openapi/**openapi/openapi.jsonis excluded by!openapi/**
📒 Files selected for processing (5)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlconfig/v1alpha1/types_cluster_monitoring.goconfig/v1alpha1/zz_generated.deepcopy.goconfig/v1alpha1/zz_generated.swagger_doc_generated.gopayload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml
4a67cf2 to
d58b16d
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml (1)
1520-1541:⚠️ Potential issue | 🟠 MajorRegex inputs are still accepted without schema-level validation
systemd.units[]andignoredNetworkDevices[]are documented as regex patterns, but the CRD schema only validates type/length. Invalid regex values can still be admitted and only fail later in controller reconciliation. Please add admission-time validation for regex compilability to fail fast.Also applies to: 1580-1588
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml` around lines 1520 - 1541, Add admission-time validation so regex entries are rejected if they don't compile: for the systemd.units[] items (NodeExporterSystemdUnit) and ignoredNetworkDevices[] items, add x-kubernetes-validations CEL rules on the string item schema that exercise regex compilation (e.g. a CEL rule that calls matches with a constant like self.matches("") or equivalent) so invalid regex patterns fail admission instead of being accepted by type/length-only checks; update both the items blocks for NodeExporterSystemdUnit and the corresponding ignoredNetworkDevices item schema accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In
`@payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml`:
- Around line 1520-1541: Add admission-time validation so regex entries are
rejected if they don't compile: for the systemd.units[] items
(NodeExporterSystemdUnit) and ignoredNetworkDevices[] items, add
x-kubernetes-validations CEL rules on the string item schema that exercise regex
compilation (e.g. a CEL rule that calls matches with a constant like
self.matches("") or equivalent) so invalid regex patterns fail admission instead
of being accepted by type/length-only checks; update both the items blocks for
NodeExporterSystemdUnit and the corresponding ignoredNetworkDevices item schema
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 4f705a8f-5f6c-42c4-8faf-87b5822c3ff7
⛔ Files ignored due to path filters (3)
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings.crd.yamlis excluded by!**/zz_generated.crd-manifests/*config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**openapi/generated_openapi/zz_generated.openapi.gois excluded by!openapi/**
📒 Files selected for processing (5)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlconfig/v1alpha1/types_cluster_monitoring.goconfig/v1alpha1/zz_generated.deepcopy.goconfig/v1alpha1/zz_generated.swagger_doc_generated.gopayload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml
saschagrunert
left a comment
There was a problem hiding this comment.
Two new comments on undocumented cross-field interactions. The previously acknowledged items (omitzero on ignoredNetworkDevices, min length doc) are still pending a push.
|
Yes, was about to push ;) Let me add these comments and regenerate. Thanks! |
d58b16d to
dc35dac
Compare
|
@danielmellado: This pull request references MON-4036 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.22." or "openshift-4.22.", but it targets "openshift-5.0" instead. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
♻️ Duplicate comments (2)
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml (1)
1535-1543:⚠️ Potential issue | 🟠 MajorReject invalid regex values at admission time
systemd.units[](Line 1539) andignoredNetworkDevices[](Line 1594) still only validate length/type, so malformed regex strings can be admitted and then fail later at controller reconcile time.Proposed schema-level validation
items: description: |- NodeExporterSystemdUnit is a string that is interpreted as a Go regular expression pattern by the controller to match systemd unit names. Invalid regular expressions will cause a controller-level error at runtime. Must be at least 1 character and at most 1024 characters. maxLength: 1024 minLength: 1 type: string + x-kubernetes-validations: + - message: must be a valid regular expression + rule: '"".matches(self) || !"".matches(self)' @@ items: description: |- NodeExporterIgnoredNetworkDevice is a string that is interpreted as a Go regular expression pattern by the controller to match network device names to exclude from node-exporter metric collection for collectors such as netdev, netclass, and ethtool. Invalid regular expressions will cause a controller-level error at runtime. Must be at least 1 character and at most 1024 characters. maxLength: 1024 minLength: 1 type: string + x-kubernetes-validations: + - message: must be a valid regular expression + rule: '"".matches(self) || !"".matches(self)'Please verify this CEL pattern behavior against your target kube-apiserver version; if not supported, enforce regex parsing in admission/controller before persisting invalid specs. As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
Also applies to: 1589-1598
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml` around lines 1535 - 1543, The CRD currently only enforces min/max length for NodeExporterSystemdUnit strings (used by systemd.units[] and ignoredNetworkDevices[]) so invalid regexes can be persisted and later fail; update validation to reject malformed regex at admission by adding a schema-level CEL (or OpenAPI x-kubernetes-validations/CEL) expression that attempts to compile the regex (or uses a safe pattern check supported by your target kube-apiserver) for both fields (systemd.units and ignoredNetworkDevices) so only valid regex strings are admitted, or if CEL is unsupported for your kube-apiserver, add admission/validation logic in the controller/admission webhook that parses/compiles NodeExporterSystemdUnit values and returns a rejection with a clear error when compilation fails.config/v1alpha1/types_cluster_monitoring.go (1)
292-306:⚠️ Potential issue | 🟠 MajorDrop
omitemptyfromignoredNetworkDevicesto preserve explicit[].Line 298 gives
[]different semantics from omission, but Line 306 still usesomitempty, which drops non-nil empty slices during marshaling. That makes the documented override unreachable from typed Go clients. Usejson:"ignoredNetworkDevices,omitzero"here;omitempty,omitzerowould still strip[]. The YAML fixture mentioned in the PR objectives would not catch this typed-client path.Proposed fix
- IgnoredNetworkDevices []NodeExporterIgnoredNetworkDevice `json:"ignoredNetworkDevices,omitempty"` + IgnoredNetworkDevices []NodeExporterIgnoredNetworkDevice `json:"ignoredNetworkDevices,omitzero"`Read-only verification
Expected: the field tag resolves to
json:"ignoredNetworkDevices,omitzero", and there is Go-side round-trip coverage in addition to the YAML fixture coverage.#!/bin/bash set -euo pipefail echo "Go version:" rg -n '^go ' go.mod echo echo "Current field tag:" rg -n 'IgnoredNetworkDevices.*json:"ignoredNetworkDevices' config/v1alpha1/types_cluster_monitoring.go echo echo "Nearby omitzero usage in this API:" rg -n 'omitzero' config/v1alpha1/types_cluster_monitoring.go | head -n 20 echo echo "Coverage related to ignoredNetworkDevices:" rg -n -C2 'ignoredNetworkDevices|IgnoredNetworkDevices' config/v1alpha1 echo echo "Go-side typed coverage hints:" rg -n -C2 'IgnoredNetworkDevices|ignoredNetworkDevices|Marshal|Unmarshal' --type go config/v1alpha1🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/v1alpha1/types_cluster_monitoring.go` around lines 292 - 306, The JSON tag on the IgnoredNetworkDevices field uses `omitempty` which drops non-nil empty slices and prevents preserving an explicit [] override; change the struct tag on IgnoredNetworkDevices to `json:"ignoredNetworkDevices,omitzero"` (remove `omitempty`) in types_cluster_monitoring.go so typed Go clients will round-trip an explicit empty slice, and add a small Go unit test that marshals and unmarshals a struct with IgnoredNetworkDevices set to an empty slice to verify the field is preserved through JSON round-trip.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@config/v1alpha1/types_cluster_monitoring.go`:
- Around line 292-306: The JSON tag on the IgnoredNetworkDevices field uses
`omitempty` which drops non-nil empty slices and prevents preserving an explicit
[] override; change the struct tag on IgnoredNetworkDevices to
`json:"ignoredNetworkDevices,omitzero"` (remove `omitempty`) in
types_cluster_monitoring.go so typed Go clients will round-trip an explicit
empty slice, and add a small Go unit test that marshals and unmarshals a struct
with IgnoredNetworkDevices set to an empty slice to verify the field is
preserved through JSON round-trip.
In
`@payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml`:
- Around line 1535-1543: The CRD currently only enforces min/max length for
NodeExporterSystemdUnit strings (used by systemd.units[] and
ignoredNetworkDevices[]) so invalid regexes can be persisted and later fail;
update validation to reject malformed regex at admission by adding a
schema-level CEL (or OpenAPI x-kubernetes-validations/CEL) expression that
attempts to compile the regex (or uses a safe pattern check supported by your
target kube-apiserver) for both fields (systemd.units and ignoredNetworkDevices)
so only valid regex strings are admitted, or if CEL is unsupported for your
kube-apiserver, add admission/validation logic in the controller/admission
webhook that parses/compiles NodeExporterSystemdUnit values and returns a
rejection with a clear error when compilation fails.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 1a7543b4-c0b1-4731-9c5d-cd9ac2c226fe
⛔ Files ignored due to path filters (3)
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings.crd.yamlis excluded by!**/zz_generated.crd-manifests/*config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**openapi/generated_openapi/zz_generated.openapi.gois excluded by!openapi/**
📒 Files selected for processing (5)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yamlconfig/v1alpha1/types_cluster_monitoring.goconfig/v1alpha1/zz_generated.deepcopy.goconfig/v1alpha1/zz_generated.swagger_doc_generated.gopayload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
dc35dac to
4c675db
Compare
saschagrunert
left a comment
There was a problem hiding this comment.
Struct shape, markers, XValidation rules, and documentation are in good shape. Three items remain.
4c675db to
d06a174
Compare
|
I think this item has not been resolved yet: #2744 (comment) |
|
Scheduling tests matching the |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: saschagrunert The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/retest-required |
|
/verified by ci |
|
@danielmellado: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
| // +kubebuilder:validation:MinItems=0 | ||
| // +listType=set | ||
| // +optional | ||
| IgnoredNetworkDevices []NodeExporterIgnoredNetworkDevice `json:"ignoredNetworkDevices,omitempty,omitzero"` |
There was a problem hiding this comment.
In order to properly distinguish between omitted and intentionally setting this to an empty list value ([]), this will need to be a pointer. Otherwise, ignoredNetworkDevices: [] will never be something that can be serialized by a structured client and you will have to interpret [] as omitted. This means that omitted and [] signal to the system that it would need to default.
Converting to a pointer, you should be able to remove the minLength=0 tag but I'm OK with leaving that around if you'd like to be explicit with the markers.
There was a problem hiding this comment.
Good catch! switched IgnoredNetworkDevicesto *[]NodeExporterIgnoredNetworkDeviceso nil keeps the platform's default exclusion list and a non-nil empty slice means "don't exclude anything." Deepcopy updated accordingly, and there's a test for ignoredNetworkDevices: [] round-tripping. Left the MinItems=0 in place to keep the linter happy.
| // NodeExporterCollectorState defines whether a node-exporter collector is enabled or disabled. | ||
| // Valid values are "Enabled" and "Disabled". | ||
| // +kubebuilder:validation:Enum=Enabled;Disabled | ||
| // +enum | ||
| type NodeExporterCollectorState string | ||
|
|
||
| const ( | ||
| // NodeExporterCollectorEnabled means the collector is active and will produce metrics. | ||
| NodeExporterCollectorEnabled NodeExporterCollectorState = "Enabled" | ||
| // NodeExporterCollectorDisabled means the collector is inactive and will not produce metrics. | ||
| NodeExporterCollectorDisabled NodeExporterCollectorState = "Disabled" | ||
| ) |
There was a problem hiding this comment.
We generally try to avoid the usage of Enabled/Disabled terminology because it is often overloaded and not as clear to end users what each value means.
This applies to the NodeExporterNetlinkState enum as well. I'll leave some more thoughts as an alternative on the fields where these are used.
There was a problem hiding this comment.
Agreed. Renamed the collector toggle enum to Collect / DoNotCollect (type NodeExporterCollectorCollectionPolicy). Also replaced the netlink Enabled/Disabled with a separate enum per your suggestion in the netclass thread. No more Enabled/Disabled anywhere in the NodeExporter types.
| // At least one collector must be specified. | ||
| // +kubebuilder:validation:MinProperties=1 | ||
| type NodeExporterCollectorConfig struct { | ||
| // cpuFreq configures the cpufreq collector, which collects CPU frequency statistics. |
There was a problem hiding this comment.
Applicable to all of these collectors, but as someone not that familiar with node exporter collectors is there any guidance we could include in the field descriptions as to when I might want to explicitly configure these collectors?
The what is clear. The why is not as clear to me.
There was a problem hiding this comment.
Added a "when to use" line to every collector field comment, e.g. cpuFreq mentions CPU-frequency-scaling observability and the many-core CPU overhead trade-off, mountStats mentions NFS client stats and cardinality risk, buddyInfo mentions kernel memory fragmentation troubleshooting, etc. Tried to keep them short and practical without duplicating the node_exporter docs.
| // enabled enables or disables the cpufreq collector. | ||
| // This field is required. | ||
| // Valid values are "Enabled" and "Disabled". | ||
| // When set to "Enabled", the cpufreq collector is active and CPU frequency statistics are collected. | ||
| // When set to "Disabled", the cpufreq collector is inactive. | ||
| // +required | ||
| Enabled NodeExporterCollectorState `json:"enabled,omitempty"` |
There was a problem hiding this comment.
Going back to my enabled/disabled comment for this, I think something like:
cpuFreq:
collectionPolicy: Collect | DoNotCollectwould be more descriptive than enabled: Enabled | Disabled. What do you think?
There was a problem hiding this comment.
Done! every collector struct now uses collectionPolicy as the field name with the Collect/DoNotCollect enum. CRDs, tests, and XValidation rules all updated to match.
| // useNetlink activates the netlink implementation of the netclass collector. | ||
| // useNetlink is optional. | ||
| // This implementation improves the performance of the netclass collector. | ||
| // Valid values are "Enabled" and "Disabled". | ||
| // When set to "Enabled", the netlink implementation is used for improved performance. | ||
| // When set to "Disabled", the default sysfs implementation is used. | ||
| // When the netclass collector is disabled, this field must not be set. | ||
| // When omitted, this means no opinion and the platform is left to choose a reasonable default, | ||
| // which is subject to change over time. The current default is "Enabled". | ||
| // +optional | ||
| UseNetlink NodeExporterNetlinkState `json:"useNetlink,omitempty"` |
There was a problem hiding this comment.
Going back to my enabled/disabled comment, this seems like there is a choice between using netlink vs sysfs. It would be more descriptive for a user if they could specify which implementation to use directly and would future proof this if a new implementation is added down the road (i.e you could add a new enum value).
Maybe something like:
netClass:
...
statsGatherer: SysFS | NetLink?
I'm not married to statsGatherer, but reading https://github.com/prometheus/node_exporter/blob/0f5c15834bcfe027be499abe34673e869d4d5386/collector/netclass_linux.go#L35 it seemed like a reasonable name.
There was a problem hiding this comment.
Went with statsGatherer exactly as you suggested. Values are Sysfs and Netlink. XValidation rule ensures statsGatherer can only be set when collectionPolicy is Collect. Future implementations can just add a new enum value.
| // with many cores. If you enable this collector and have machines with many cores, monitor your | ||
| // systems closely for excessive CPU usage. | ||
| // +optional | ||
| CpuFreq NodeExporterCollectorCpufreqConfig `json:"cpuFreq,omitempty,omitzero"` |
There was a problem hiding this comment.
Just a note from looking for more context on these collectors, it looks like the lowercase naming is pretty common and would be something prometheus users are likely the most familiar with.
I'll leave it up to you, but I'd be OK with making the serialized names not necessarily follow the camelCase format if it would be more natural for end users despite that being our preferred format.
There was a problem hiding this comment.
Kept camelCase (cpuFreq, netClass, tcpStat, etc.) to stay consistent with the rest of the ClusterMonitoring API surface. Happy to revisit if there's a broader push to align with node_exporter naming, but didn't want to introduce mixed conventions in one CRD.
| // enabled enables or disables the systemd collector. | ||
| // This field is required. | ||
| // Valid values are "Enabled" and "Disabled". | ||
| // When set to "Enabled", the systemd collector is active and systemd unit statistics are collected. | ||
| // When set to "Disabled", the systemd collector is inactive. | ||
| // +required | ||
| Enabled NodeExporterCollectorState `json:"enabled,omitempty"` | ||
| // units is a list of regular expression patterns that match systemd units to be included | ||
| // by the systemd collector. | ||
| // units is optional. | ||
| // By default, the list is empty, so the collector exposes no metrics for systemd units. | ||
| // When the systemd collector is disabled, this field must not be set. | ||
| // Each entry is a regular expression pattern and must be at least 1 character and at most 1024 characters. | ||
| // Maximum length for this list is 50. | ||
| // Minimum length for this list is 1. | ||
| // Entries in this list must be unique. | ||
| // +kubebuilder:validation:MaxItems=50 | ||
| // +kubebuilder:validation:MinItems=1 | ||
| // +listType=set | ||
| // +optional | ||
| Units []NodeExporterSystemdUnit `json:"units,omitempty"` |
There was a problem hiding this comment.
For the types that have configuration options, maybe it makes sense to have these be a discriminated union with the discriminant being optional?
Something like:
systemd:
collectionPolicy: Collect
collect: # optional, but if specified must set one subfield. forbidden when collectionPolicy is not 'Collect'
units: ...?
There was a problem hiding this comment.
For this PR I kept the flat layout. i.e. collectionPolicy + units with a CEL rule that enforces units can only be set when collectionPolicy is Collect. Same runtime behavior, less structural churn for CMO to consume right now. If you feel strongly about the union shape I can do it as a follow-up. Open to either way! Thanks!
Add configuration for the node-exporter agent that runs as a DaemonSet in openshift-monitoring, collecting hardware and OS-level metrics from every node in the cluster. Signed-off-by: Daniel Mellado <[email protected]>
d06a174 to
b9d61de
Compare
|
New changes are detected. LGTM label has been removed. |
|
@danielmellado: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Add configuration for the node-exporter agent that runs as a
DaemonSet in openshift-monitoring, collecting hardware and
OS-level metrics from every node in the cluster.
Signed-off-by: Daniel Mellado [email protected]