CA: optimize simulator performance by embedding scheduler structures #9388
k8s-ci-robot merged 3 commits into kubernetes:master
Conversation
Update drasnapshot.findPodClaims to return nil instead of an empty slice when no claims are present. This ensures consistency with framework.NewPodInfo and prevents test failures due to mismatch between nil and empty slices in deep comparisons, without needing to relax strictness in tests.
Force-pushed from dc2b30c to 4adbf46
I'm posting the results from the benchmark job https://github.com/kubernetes/autoscaler/actions/runs/23296851893?pr=9388
/assign @mtrqq @BigDarkClown
It would also be ideal to compare with benchmarks from real-hardware scalability tests; did you manage to obtain those? This is not a blocking comment, as this PR is a straightforward improvement. I just want to understand the improvements made in real clusters.
mtrqq left a comment
First of all, thanks for the change, left 1 major comment about data consistency and two nits so far.
One thing I want to call out about this change is that it shifts DRA and CSI computations from on-demand to on-startup when creating a snapshot. I haven't traced the autoscaler code paths specifically to prove this, but it could lead to more DRA & CSI computations than lazy loading would have needed (an unconfirmed scenario that would show the problem: binpacking doesn't need to consider all the nodes in the snapshot to make a scale-up decision).
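The on-demand versus on-startup trade-off described above can be sketched generically. None of these types are CA code; they only illustrate that eager construction pays the auxiliary-computation cost for every node, while lazy loading pays only for nodes that are actually accessed:

```go
package main

import (
	"fmt"
	"sync"
)

// eagerNode computes its DRA/CSI-like auxiliary state at snapshot
// construction time: every node pays the cost, even if a simulation
// (e.g. binpacking) never looks at it.
type eagerNode struct {
	aux string
}

func newEagerNode(name string, compute func(string) string) *eagerNode {
	return &eagerNode{aux: compute(name)}
}

// lazyNode defers the computation until first access.
type lazyNode struct {
	name    string
	compute func(string) string
	once    sync.Once
	aux     string
}

func (n *lazyNode) Aux() string {
	n.once.Do(func() { n.aux = n.compute(n.name) })
	return n.aux
}

func main() {
	computed := 0
	compute := func(name string) string {
		computed++
		return "aux-for-" + name
	}

	// Eager: cost paid for all 3 nodes up front.
	for _, name := range []string{"a", "b", "c"} {
		_ = newEagerNode(name, compute)
	}
	fmt.Println(computed) // 3

	// Lazy: only the accessed node pays.
	computed = 0
	nodes := []*lazyNode{{name: "a", compute: compute}, {name: "b", compute: compute}}
	_ = nodes[0].Aux()
	fmt.Println(computed) // 1
}
```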
I think this change is a step in the right direction, but we should probably also explore reconciling the cluster snapshot on each loop instead of completely throwing it away and reconstructing it every iteration; that may yield large performance improvements, especially in huge clusters.
Just to clarify - this discussion isn't something blocking this PR by any means
@x13n @BigDarkClown WDYT?
cluster-autoscaler/simulator/clustersnapshot/predicate/predicate_snapshot.go
I agree constant snapshot recreation is costly, but it is orthogonal, and from what I've seen the problem addressed by this PR is much more visible in large clusters. NodeInfo wrapping was causing major CPU burn, so I'm happy to see this addressed. We can check whether there are other problems worth tackling once this one is out of the way. Let's launch and iterate.
Is #9022 obsoleted by this one?
Thanks to @tetianakh, we managed to obtain a profile diff.
Yes, #9022 is superseded by this PR and will be closed. I've narrowed the scope here to essential performance fixes to streamline the review while @towca is away. Now that the Cluster Autoscaler Benchmark job is available, I'll follow up with the remaining minor improvements as separate, data-justified PRs.
Force-pushed from 4adbf46 to e87265e
```go
// Manual initialization may result in errors.
type PodInfo struct {
	// This type embeds *apiv1.Pod to make the accesses easier - most of the code just needs to access the Pod.
	*apiv1.Pod
```
Do we still need to store pod if we already have PodInfo?
Technically, we don't, but I decided to keep it to save this PR from all the boilerplate required to remove the embedded pod:
PodInfo.<embedded field> -> PodInfo.Pod().<embedded field>
I can follow up with that change once this is merged.
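The boilerplate in question comes from Go's field promotion: an embedded field's members are accessible directly on the outer type, so dropping the embedding forces every call site through an accessor. A minimal sketch with stand-in types (not the actual framework.PodInfo or apiv1.Pod):

```go
package main

import "fmt"

// Pod is a stand-in for *apiv1.Pod.
type Pod struct{ Name string }

// PodInfoEmbedded promotes Pod's fields: callers write info.Name directly.
type PodInfoEmbedded struct {
	*Pod
}

// PodInfoAccessor hides the pod behind a getter: every call site must be
// rewritten from info.<embedded field> to info.Pod().<embedded field>,
// which is the boilerplate the comment refers to.
type PodInfoAccessor struct {
	pod *Pod
}

func (p *PodInfoAccessor) Pod() *Pod { return p.pod }

func main() {
	e := PodInfoEmbedded{Pod: &Pod{Name: "p1"}}
	a := PodInfoAccessor{pod: &Pod{Name: "p1"}}
	fmt.Println(e.Name)       // promoted field access
	fmt.Println(a.Pod().Name) // accessor-style access
}
```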
```go
ni := framework.NewNodeInfo(node, slices)

if s.enableCSINodeAwareScheduling && csiSnapshot != nil {
	if csiNode, err := csiSnapshot.Get(node.Name); err == nil {
```
Should we return the error if one is returned here?
csiSnapshot.Get returns an error if a snapshot with the given name is not found, so if we propagate that error here, SetClusterState will fail every time CSINodeAwareScheduling is enabled and a node does not have a csiSnapshot. Is that expected?
I decided not to propagate the error, to keep the logic the same as before the change. I'll open an issue to discuss that uncertainty.
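The two options discussed here, swallowing the not-found error (the pre-change behavior) versus propagating it, can be sketched with stand-in types; none of this is the actual CA or CSI snapshot API:

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("not found")

// get stands in for csiSnapshot.Get: an error only means "no CSINode
// recorded for this node name".
func get(store map[string]string, name string) (string, error) {
	v, ok := store[name]
	if !ok {
		return "", fmt.Errorf("csinode %q: %w", name, errNotFound)
	}
	return v, nil
}

// buildSwallowing keeps the pre-change behavior: nodes without a CSINode
// are simply built without one.
func buildSwallowing(store map[string]string, nodes []string) []string {
	var out []string
	for _, n := range nodes {
		if csi, err := get(store, n); err == nil {
			out = append(out, n+"+"+csi)
		} else {
			out = append(out, n)
		}
	}
	return out
}

// buildStrict propagates the error instead: the whole build then fails for
// any node lacking a CSINode, which is the behavior questioned in the thread.
func buildStrict(store map[string]string, nodes []string) ([]string, error) {
	var out []string
	for _, n := range nodes {
		csi, err := get(store, n)
		if err != nil {
			return nil, err
		}
		out = append(out, n+"+"+csi)
	}
	return out, nil
}

func main() {
	store := map[string]string{"a": "csi-a"}
	fmt.Println(buildSwallowing(store, []string{"a", "b"}))
	_, err := buildStrict(store, []string{"a", "b"})
	fmt.Println(err != nil) // true
}
```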
Redesign simulator framework.NodeInfo and framework.PodInfo to embed their upstream scheduler implementation counterparts. This eliminates the need for frequent WrapSchedulerNodeInfo calls, significantly reducing memory allocations and GC pressure during simulations.

This change shifts the ClusterSnapshot from a just-in-time wrapping model to a persistent, pre-populated approach. Internal objects now eagerly integrate DRA and CSI data, ensuring that all necessary state is readily available without redundant lookups or repeated object creation.

Main implementation paths:
- Embed schedulerimpl.NodeInfo and schedulerimpl.PodInfo into CA's internal framework types to natively satisfy scheduler interfaces.
- Update ClusterSnapshotStore to manage internal NodeInfo/PodInfo directly, removing the need for ToScheduler() conversions.
- Enforce internal type consistency in NodeInfo operations (e.g., AddPod) to preserve critical DRA/CSI metadata throughout the simulation.
- Optimize snapshotting by implementing efficient Snapshot() methods on the refined NodeInfo structures.
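The allocation cost being removed can be sketched generically: a wrapper allocated on every ToScheduler-style conversion churns the GC, while embedding the scheduler type lets the same object satisfy the scheduler's interface directly. All types below are illustrative stand-ins, not the actual CA or scheduler API:

```go
package main

import "fmt"

// schedNodeInfo stands in for the upstream scheduler's NodeInfo.
type schedNodeInfo struct {
	name string
}

func (s *schedNodeInfo) Name() string { return s.name }

// schedulerView is the interface the (hypothetical) scheduler consumes.
type schedulerView interface{ Name() string }

// Before: a CA-side NodeInfo that wraps on demand - every ToScheduler
// call allocates a fresh wrapper object.
type wrappingNodeInfo struct {
	name string
}

func (n *wrappingNodeInfo) ToScheduler() schedulerView {
	return &schedNodeInfo{name: n.name} // new allocation per call
}

// After: embedding the scheduler type means the CA NodeInfo already *is*
// a schedulerView, so no per-access wrapper is needed.
type embeddingNodeInfo struct {
	*schedNodeInfo
	// CA-specific extras (DRA/CSI state) would live alongside here.
}

func main() {
	w := &wrappingNodeInfo{name: "node-1"}
	e := &embeddingNodeInfo{schedNodeInfo: &schedNodeInfo{name: "node-1"}}

	var v schedulerView
	v = w.ToScheduler() // allocates a wrapper
	fmt.Println(v.Name())
	v = e // satisfies the interface directly via method promotion
	fmt.Println(v.Name())
}
```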
Force-pushed from e87265e to a4d5e3c
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BigDarkClown, Choraden

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind cleanup
/kind regression
What this PR does / why we need it:
This PR redesigns simulator framework.NodeInfo and framework.PodInfo to embed their upstream scheduler implementation counterparts. This eliminates the need for frequent WrapSchedulerNodeInfo calls, significantly reducing memory allocations and GC pressure during simulations. This fixes the performance regression introduced in the initial DRA implementation.
Which issue(s) this PR fixes:
Fixes #8965
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: