
CA: optimize simulator performance by embedding scheduler structures#9388

Merged
k8s-ci-robot merged 3 commits into kubernetes:master from Choraden:nodeinfo_wrapper_v4
Mar 23, 2026
Conversation

@Choraden
Contributor

What type of PR is this?

/kind cleanup
/kind regression

What this PR does / why we need it:

This PR redesigns simulator framework.NodeInfo and framework.PodInfo to embed their
upstream scheduler implementation counterparts. This eliminates the need for frequent WrapSchedulerNodeInfo calls, significantly reducing memory allocations and GC pressure during simulations.

This fixes the performance regression introduced in the initial DRA implementation.

Which issue(s) this PR fixes:

Fixes #8965

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Update drasnapshot.findPodClaims to return nil instead of an empty slice
when no claims are present. This ensures consistency with
framework.NewPodInfo and prevents test failures due to mismatch between
nil and empty slices in deep comparisons, without needing to relax
strictness in tests.
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/regression Categorizes issue or PR as related to a regression from a prior release. do-not-merge/needs-area area/cluster-autoscaler labels Mar 19, 2026
@k8s-ci-robot k8s-ci-robot added area/provider/alicloud Issues or PRs related to the AliCloud cloud provider implementation area/provider/aws Issues or PRs related to aws provider area/provider/azure Issues or PRs related to azure provider area/provider/cluster-api Issues or PRs related to Cluster API provider area/provider/equinixmetal Issues or PRs related to the Equinix Metal cloud provider for Cluster Autoscaler area/provider/gce area/provider/hetzner Issues or PRs related to Hetzner provider size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. area/provider/huaweicloud area/provider/kwok Issues or PRs related to the kwok cloud provider for Cluster Autoscaler area/provider/magnum Issues or PRs related to the Magnum cloud provider for Cluster Autoscaler area/provider/oci Issues or PRs related to oci provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/provider/rancher area/provider/utho Issues or PRs related to Utho provider and removed do-not-merge/needs-area labels Mar 19, 2026
@Choraden Choraden force-pushed the nodeinfo_wrapper_v4 branch from dc2b30c to 4adbf46 Compare March 19, 2026 13:19
@Choraden
Contributor Author

I'm posting the results from the benchmark job https://github.com/kubernetes/autoscaler/actions/runs/23296851893?pr=9388

goos: linux
goarch: amd64
pkg: k8s.io/autoscaler/cluster-autoscaler/core/bench
cpu: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
                   │   base.txt   │               pr.txt                │
                   │    sec/op    │   sec/op     vs base                │
RunOnceScaleUp-4      734.7m ± 8%   663.0m ± 9%   -9.77% (p=0.002 n=10)
RunOnceScaleDown-4   1125.0m ± 2%   813.7m ± 1%  -27.68% (p=0.000 n=10)
geomean               909.2m        734.5m       -19.22%

                   │   base.txt   │                pr.txt                │
                   │     B/op     │     B/op      vs base                │
RunOnceScaleUp-4     534.3Mi ± 0%   425.6Mi ± 0%  -20.35% (p=0.000 n=10)
RunOnceScaleDown-4   999.6Mi ± 1%   537.6Mi ± 5%  -46.21% (p=0.000 n=10)
geomean              730.8Mi        478.4Mi       -34.54%

                   │   base.txt   │               pr.txt                │
                   │  allocs/op   │  allocs/op   vs base                │
RunOnceScaleUp-4      5.891M ± 0%   3.863M ± 0%  -34.42% (p=0.000 n=10)
RunOnceScaleDown-4   10.268M ± 0%   2.251M ± 6%  -78.08% (p=0.000 n=10)
geomean               7.778M        2.949M       -62.09%

@Choraden
Contributor Author

/assign @mtrqq @BigDarkClown

@mtrqq
Contributor

mtrqq commented Mar 19, 2026

I'm posting the results from the benchmark job https://github.com/kubernetes/autoscaler/actions/runs/23296851893?pr=9388 (benchmark tables quoted in full above)

It would also be ideal to compare against benchmarks from real-hardware scalability tests. Did you manage to obtain those? This is not a blocking comment, as this PR is a straightforward improvement; I just want to understand the improvements made in real clusters.

Contributor

@mtrqq mtrqq left a comment


First of all, thanks for the change. I left one major comment about data consistency and two nits so far.

One thing I want to call out about this change is that it shifts DRA and CSI computations from on-demand to on-startup when creating a snapshot. I haven't traced the autoscaler code paths specifically to prove this, but it may lead to more DRA & CSI computations than would be needed when lazy-loading those objects (an unconfirmed scenario illustrating the problem: binpacking doesn't need to consider all the nodes from the snapshot to make a scale-up decision).

I think this change is a step in the right direction, but we should probably also explore reconciling the cluster snapshot on each loop instead of completely throwing it away and reconstructing it every iteration; this may yield large performance improvements, especially in huge clusters.

Just to clarify: this discussion isn't blocking this PR by any means.

@x13n @BigDarkClown WDYT?

@x13n
Member

x13n commented Mar 20, 2026

(quoting @mtrqq's review comment above in full)

I agree constant snapshot recreation is costly, but it is orthogonal, and from what I've seen the problem addressed by this PR is much more visible in large clusters. NodeInfo wrapping was causing major CPU burn, so I'm happy to see this addressed. We can check whether there are other problems worth tackling once this one is out of the way. Let's launch and iterate.

@x13n
Member

x13n commented Mar 20, 2026

Is #9022 obsoleted by this one?

@Choraden
Contributor Author

(quoting @mtrqq's question above about comparing against real-hardware scalability tests)

Thanks to @tetianakh we managed to obtain a profile diff:
(screenshot: CPU profile diff, captured 2026-03-23)
cluster spec:

  • ~4k nodes
  • ~17k pods
  • idle state i.e. no operation running in the cluster

@Choraden
Contributor Author

Is #9022 obsoleted by this one?

Yes, #9022 is superseded by this PR and will be closed. I've narrowed the scope here to essential performance fixes to streamline the review while @towca is away. Now that the Cluster Autoscaler Benchmark job is available, I'll follow up with the remaining minor improvements as separate, data-justified PRs.

@Choraden Choraden force-pushed the nodeinfo_wrapper_v4 branch from 4adbf46 to e87265e Compare March 23, 2026 10:34
// Manual initialization may result in errors.
type PodInfo struct {
	// This type embeds *apiv1.Pod to make the accesses easier - most of the code just needs to access the Pod.
	*apiv1.Pod
Contributor


Do we still need to store pod if we already have PodInfo?

Contributor Author


Technically, we don't, but I decided to keep it to save this PR from all the boilerplate required to remove the embedded pod:
PodInfo.<embedded field> -> PodInfo.Pod().<embedded field>

I can follow up with that change once this is merged.

Contributor


Sounds good to me.

ni := framework.NewNodeInfo(node, slices)

if s.enableCSINodeAwareScheduling && csiSnapshot != nil {
	if csiNode, err := csiSnapshot.Get(node.Name); err == nil {
Contributor


Should we return the error if one is returned here?

Contributor Author


csiSnapshot.Get returns an error if a snapshot with the given name is not found, so if we push that error down there, SetClusterState will fail every time CSINodeAwareScheduling is enabled and a node does not have a csiSnapshot. Is that expected?

Contributor Author


I decided to push the error down to keep the logic as it was before the change. I'll open an issue to discuss that uncertainty.

Redesign simulator framework.NodeInfo and framework.PodInfo to embed their
upstream scheduler implementation counterparts. This eliminates the need for
frequent WrapSchedulerNodeInfo calls, significantly reducing memory
allocations and GC pressure during simulations.

This change shifts the ClusterSnapshot from a just-in-time wrapping model
to a persistent, pre-populated approach. Internal objects now eagerly
integrate DRA and CSI data, ensuring that all necessary state is readily
available without redundant lookups or repeated object creation.

Main implementation paths:
- Embed schedulerimpl.NodeInfo and schedulerimpl.PodInfo into CA's internal
  framework types to natively satisfy scheduler interfaces.
- Update ClusterSnapshotStore to manage internal NodeInfo/PodInfo directly,
  removing the need for ToScheduler() conversions.
- Enforce internal type consistency in NodeInfo operations (e.g., AddPod)
  to preserve critical DRA/CSI metadata throughout the simulation.
- Optimize snapshotting by implementing efficient Snapshot() methods on the
  refined NodeInfo structures.
@Choraden Choraden force-pushed the nodeinfo_wrapper_v4 branch from e87265e to a4d5e3c Compare March 23, 2026 16:00
@BigDarkClown
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 23, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BigDarkClown, Choraden

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 23, 2026
@k8s-ci-robot k8s-ci-robot merged commit eb987bf into kubernetes:master Mar 23, 2026
10 of 11 checks passed
@Choraden Choraden deleted the nodeinfo_wrapper_v4 branch March 23, 2026 18:22

Development

Successfully merging this pull request may close these issues.

CA DRA: remove NodeInfo wrapping
