Skip to content

[CUDA EP Plugin] ResourceAcountant integration#28028

Merged
yuslepukhin merged 15 commits intomainfrom
yuslepukhin/plugin_resource
Apr 14, 2026
Merged

[CUDA EP Plugin] ResourceAcountant integration#28028
yuslepukhin merged 15 commits intomainfrom
yuslepukhin/plugin_resource

Conversation

@yuslepukhin
Copy link
Copy Markdown
Member

@yuslepukhin yuslepukhin commented Apr 9, 2026

This pull request introduces several enhancements and refactorings to the resource accounting and execution provider (EP) infrastructure, with a focus on better support for plugin-based CUDA execution providers. The most significant changes include the addition of type-erased arithmetic for resource accounting, improved handling of resource budgets for plugin EPs, and more robust device matching logic. These updates increase maintainability, enforce stricter type safety, and ensure correct resource tracking across both in-tree and plugin-based EPs.

Resource accounting improvements:

  • Added type-erased arithmetic functions (AddResourceCounts, ResourceCountExceeds, FormatResourceCount) for ResourceCount to enforce exhaustive handling of variant types and improve type safety. [1] [2]
  • Refactored the IResourceAccountant interface: replaced ResetPendingWeights with ResetForNewPass, which resets both the stop flag and pending weights, and introduced a protected ResetPendingWeightsImpl for subclass-specific cleanup. [1] [2] [3] [4] [5]

Plugin CUDA EP and resource budget enforcement:

  • Added kCudaPluginExecutionProvider constant and updated logic to ensure plugin EPs correctly map to their in-tree accountant counterparts and are included in device matching and partitioning. [1] [2] [3] [4] [5] [6]
  • Updated plugin EP infrastructure to pass and utilize resource accountant pointers, enabling host-side resource budget enforcement for plugin EPs and ensuring correct node assignment. [1] [2] [3] [4] [5] [6] [7]

Device matching and partitioning:

  • Improved device matching heuristics to consider both in-tree and plugin CUDA EPs, and updated logic to prefer runtime device ordinals for more reliable device selection.

Other minor changes include code style cleanups and additional includes for completeness.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an ABI-stable resource accounting surface for plugin Execution Providers (EPs) to enable capacity-aware (budget-constrained) node assignment during partitioning, and integrates the new API into the CUDA plugin EP and test coverage.

Changes:

  • Introduces OrtResourceCount (tagged union) and new OrtEpApi resource-budget query/reporting functions for plugin EP partitioning.
  • Wires an optional IResourceAccountant* through OrtEpGraphSupportInfo, and plumbs reported per-node costs back into IndexedSubGraph accounting.
  • Updates the CUDA plugin EP to compute per-node costs, stop accepting nodes when budget is exceeded, and adds new unit/integration tests for both ad-hoc and stats-based accounting paths.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
onnxruntime/test/providers/cuda/plugin/cuda_resource_partitioning_test.cc Adds CUDA plugin EP integration tests for budgeted partitioning and OrtResourceCount behavior.
onnxruntime/test/framework/resource_accountant_test.cc Updates tests to use real accountants (ad-hoc + stats-file paths) and adds factory/config parsing coverage.
onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc Passes IResourceAccountant* into plugin support info and attaches reported per-node costs to IndexedSubGraph.
onnxruntime/core/session/plugin_ep/ep_api.h Declares new plugin EP API entrypoints for resource budgets and stop signaling.
onnxruntime/core/session/plugin_ep/ep_api.cc Implements the new resource accounting entrypoints and appends them to OrtEpApi v26.
onnxruntime/core/session/abi_ep_types.h Extends OrtEpGraphSupportInfo with optional accountant pointer and accepted node cost tracking.
onnxruntime/core/providers/cuda/plugin/cuda_ep.cc Uses the new ResourceBudget wrapper to enforce budget during GetCapability node selection and signal stop.
include/onnxruntime/core/session/onnxruntime_ep_c_api.h Adds OrtResourceCount and documents new OrtEpApi resource accounting functions.
include/onnxruntime/core/session/onnxruntime_cxx_inline.h Implements Ort::ResourceBudget C++ wrapper methods.
include/onnxruntime/core/session/onnxruntime_cxx_api.h Declares the Ort::ResourceBudget wrapper for plugin EP authors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/providers/cuda/plugin/cuda_ep.cc Outdated
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_ep.cc Outdated
Comment thread onnxruntime/core/session/plugin_ep/ep_api.cc Outdated
Comment thread onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc Outdated
Comment thread onnxruntime/test/providers/cuda/plugin/cuda_resource_partitioning_test.cc Outdated
Comment thread onnxruntime/test/providers/cuda/plugin/cuda_resource_partitioning_test.cc Outdated
Comment thread onnxruntime/test/providers/cuda/plugin/cuda_resource_partitioning_test.cc Outdated
Comment thread onnxruntime/test/framework/resource_accountant_test.cc Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/session/plugin_ep/ep_api.cc Outdated
Comment thread onnxruntime/core/session/plugin_ep/ep_api.cc Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/test/providers/cuda/plugin/cuda_resource_partitioning_test.cc Outdated
Comment thread include/onnxruntime/core/session/onnxruntime_cxx_api.h Outdated
Comment thread onnxruntime/core/session/plugin_ep/ep_api.cc Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc Outdated
Comment thread onnxruntime/test/providers/cuda/plugin/cuda_resource_partitioning_test.cc Outdated
Comment thread onnxruntime/test/providers/cuda/plugin/cuda_resource_partitioning_test.cc Outdated
Comment thread onnxruntime/test/framework/resource_accountant_test.cc
Copy link
Copy Markdown
Contributor

@tianleiwu tianleiwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the resource accounting integration for the plugin EP ABI. The API design for passing costs through OrtEpGraphSupportInfo is clean, and the added tests cover the ad-hoc/stats accountant paths plus CUDA-plugin end-to-end behavior nicely.

I found two issues in the host-side plumbing that converts plugin-reported costs back into ComputeCapability objects:

1. Fused plugin capabilities bypass resource accounting entirely (High Priority)

The cost-attachment code only runs in the kSingleAssignedNode branch in ep_plugin_provider_interfaces.cc. The kFusedNode branch calls CreateSupportedPartitions() and forwards the result without attaching IResourceAccountant or cost metadata to the IndexedSubGraph.

This means any plugin EP that uses EpGraphSupportInfo_AddNodesToFuse() while resource accounting is enabled will produce ComputeCapability objects with no budget tracking. GraphPartitioner::AccountForAllNodes() will be silently bypassed for those fused partitions, making budget enforcement incomplete.

Since existing plugin EPs already use AddNodesToFuse(), this is a real gap in the new API surface.

2. OrtResourceCountKind_None sentinel is unusable (Suggestion)

EpGraphSupportInfo_ReportAcceptedNodeCost() in ep_api.cc explicitly accepts OrtResourceCountKind_None, and the public header documents it as the zero-cost sentinel. However, the host-side conversion in ep_plugin_provider_interfaces.cc (the switch on ort_cost.kind) falls through OrtResourceCountKind_None into the default branch and logs "Unknown OrtResourceCountKind", skipping cost attachment entirely. This makes the documented sentinel produce a warning and silently disables accounting for that node.

Consider handling OrtResourceCountKind_None explicitly by appending a zero cost and attaching the accountant.

# Severity Component Issue
1 High Plugin cost propagation Reported costs only attached for AddSingleNode() capabilities; fused plugin capabilities bypass accounting entirely
2 Suggestion OrtResourceCount semantics OrtResourceCountKind_None documented/validated as supported, but host-side conversion treats it as unknown

Comment thread onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc
Comment thread include/onnxruntime/core/session/onnxruntime_ep_c_api.h Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/test/framework/resource_accountant_test.cc Outdated
Comment thread onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc Outdated
Comment thread onnxruntime/test/framework/resource_accountant_test.cc
@yuslepukhin yuslepukhin marked this pull request as ready for review April 11, 2026 00:22
Copy link
Copy Markdown
Contributor

@tianleiwu tianleiwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consolidated Review — commit 5f33bf7

The host-side budget enforcement integration is well-structured: computing costs and enforcing the budget uniformly in PluginExecutionProvider::GetCapability() keeps plugin EPs focused on proposing supported nodes. The fused-capability gap from the previous review round is now addressed, and the ResetForNewPass() refactoring (Template Method pattern) is a clean improvement.

Two cross-cutting concerns remain:

⚠️ High-Priority: Accountant key mismatch renders plugin budget enforcement inactive

CreateAccountants() in resource_accountant.cc inserts the CUDA accountant under kCudaExecutionProvider ("CUDAExecutionProvider"), but PartitionOnnxFormatModel() in graph_partitioner.cc looks up the accountant via ep->Type(), which returns kCudaPluginExecutionProvider ("CudaPluginExecutionProvider") for the plugin EP. The lookup misses, so PluginExecutionProvider::GetCapability() receives resource_accountant == nullptr — silently disabling all the host-side budget enforcement added in this PR.

Suggested fix — either register a second accountant entry under kCudaPluginExecutionProvider, or translate the lookup key in the partitioner:

const auto accountant_key = ep->Type() == kCudaPluginExecutionProvider
                                ? kCudaExecutionProvider
                                : ep->Type();
auto hit = acc_map->find(accountant_key);

The integration test TinyBudget_NodesOffloadedToCpu likely still passes without this fix because some nodes fall to CPU due to unsupported ops, not actual budget enforcement.

Note: consumed_bytes fragility in fused-group path

When resource_accountant != nullptr but has_budget == false, the fused-group section computes and appends per-node costs but never updates the local consumed_bytes. This is safe today (budget checks are skipped when has_budget == false) but fragile: any future code that reads consumed_bytes unconditionally will see a stale value. Consider always updating consumed_bytes when costs are computed, regardless of the budget flag.

Previously resolved threads

  • CreateAdHocAccountant ORT_THROW_IF_ERROR — now uses ASSERT_STATUS_OK/ASSERT_TRUE/ASSERT_NE at current HEAD. Resolved.
  • Per-node log level at INFO — now uses VERBOSE at current HEAD. Resolved.

Comment thread onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc Outdated
Comment thread onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc Outdated
Comment thread onnxruntime/test/providers/cuda/plugin/cuda_resource_partitioning_test.cc Outdated
@yuslepukhin yuslepukhin requested a review from Copilot April 13, 2026 20:37
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/test/providers/cuda/plugin/cuda_resource_partitioning_test.cc Outdated
Comment thread include/onnxruntime/core/framework/resource_accountant.h
Comment thread onnxruntime/test/framework/resource_accountant_test.cc Outdated
@yuslepukhin yuslepukhin enabled auto-merge (squash) April 13, 2026 22:53
@yuslepukhin yuslepukhin merged commit f7113bd into main Apr 14, 2026
101 of 102 checks passed
@yuslepukhin yuslepukhin deleted the yuslepukhin/plugin_resource branch April 14, 2026 03:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants