[CUDA EP Plugin] ResourceAcountant integration#28028
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds an ABI-stable resource accounting surface for plugin Execution Providers (EPs) to enable capacity-aware (budget-constrained) node assignment during partitioning, and integrates the new API into the CUDA plugin EP and test coverage.
Changes:
- Introduces
OrtResourceCount(tagged union) and newOrtEpApiresource-budget query/reporting functions for plugin EP partitioning. - Wires an optional
IResourceAccountant*throughOrtEpGraphSupportInfo, and plumbs reported per-node costs back intoIndexedSubGraphaccounting. - Updates the CUDA plugin EP to compute per-node costs, stop accepting nodes when budget is exceeded, and adds new unit/integration tests for both ad-hoc and stats-based accounting paths.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| onnxruntime/test/providers/cuda/plugin/cuda_resource_partitioning_test.cc | Adds CUDA plugin EP integration tests for budgeted partitioning and OrtResourceCount behavior. |
| onnxruntime/test/framework/resource_accountant_test.cc | Updates tests to use real accountants (ad-hoc + stats-file paths) and adds factory/config parsing coverage. |
| onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc | Passes IResourceAccountant* into plugin support info and attaches reported per-node costs to IndexedSubGraph. |
| onnxruntime/core/session/plugin_ep/ep_api.h | Declares new plugin EP API entrypoints for resource budgets and stop signaling. |
| onnxruntime/core/session/plugin_ep/ep_api.cc | Implements the new resource accounting entrypoints and appends them to OrtEpApi v26. |
| onnxruntime/core/session/abi_ep_types.h | Extends OrtEpGraphSupportInfo with optional accountant pointer and accepted node cost tracking. |
| onnxruntime/core/providers/cuda/plugin/cuda_ep.cc | Uses the new ResourceBudget wrapper to enforce budget during GetCapability node selection and signal stop. |
| include/onnxruntime/core/session/onnxruntime_ep_c_api.h | Adds OrtResourceCount and documents new OrtEpApi resource accounting functions. |
| include/onnxruntime/core/session/onnxruntime_cxx_inline.h | Implements Ort::ResourceBudget C++ wrapper methods. |
| include/onnxruntime/core/session/onnxruntime_cxx_api.h | Declares the Ort::ResourceBudget wrapper for plugin EP authors. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tianleiwu
left a comment
There was a problem hiding this comment.
I reviewed the resource accounting integration for the plugin EP ABI. The API design for passing costs through OrtEpGraphSupportInfo is clean, and the added tests cover the ad-hoc/stats accountant paths plus CUDA-plugin end-to-end behavior nicely.
I found two issues in the host-side plumbing that converts plugin-reported costs back into ComputeCapability objects:
1. Fused plugin capabilities bypass resource accounting entirely (High Priority)
The cost-attachment code only runs in the kSingleAssignedNode branch in ep_plugin_provider_interfaces.cc. The kFusedNode branch calls CreateSupportedPartitions() and forwards the result without attaching IResourceAccountant or cost metadata to the IndexedSubGraph.
This means any plugin EP that uses EpGraphSupportInfo_AddNodesToFuse() while resource accounting is enabled will produce ComputeCapability objects with no budget tracking. GraphPartitioner::AccountForAllNodes() will be silently bypassed for those fused partitions, making budget enforcement incomplete.
Since existing plugin EPs already use AddNodesToFuse(), this is a real gap in the new API surface.
2. OrtResourceCountKind_None sentinel is unusable (Suggestion)
EpGraphSupportInfo_ReportAcceptedNodeCost() in ep_api.cc explicitly accepts OrtResourceCountKind_None, and the public header documents it as the zero-cost sentinel. However, the host-side conversion in ep_plugin_provider_interfaces.cc (the switch on ort_cost.kind) falls through OrtResourceCountKind_None into the default branch and logs "Unknown OrtResourceCountKind", skipping cost attachment entirely. This makes the documented sentinel produce a warning and silently disables accounting for that node.
Consider handling OrtResourceCountKind_None explicitly by appending a zero cost and attaching the accountant.
| # | Severity | Component | Issue |
|---|---|---|---|
| 1 | High | Plugin cost propagation | Reported costs only attached for AddSingleNode() capabilities; fused plugin capabilities bypass accounting entirely |
| 2 | Suggestion | OrtResourceCount semantics | OrtResourceCountKind_None documented/validated as supported, but host-side conversion treats it as unknown |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tianleiwu
left a comment
There was a problem hiding this comment.
Consolidated Review — commit 5f33bf7
The host-side budget enforcement integration is well-structured: computing costs and enforcing the budget uniformly in PluginExecutionProvider::GetCapability() keeps plugin EPs focused on proposing supported nodes. The fused-capability gap from the previous review round is now addressed, and the ResetForNewPass() refactoring (Template Method pattern) is a clean improvement.
Two cross-cutting concerns remain:
⚠️ High-Priority: Accountant key mismatch renders plugin budget enforcement inactive
CreateAccountants() in resource_accountant.cc inserts the CUDA accountant under kCudaExecutionProvider ("CUDAExecutionProvider"), but PartitionOnnxFormatModel() in graph_partitioner.cc looks up the accountant via ep->Type(), which returns kCudaPluginExecutionProvider ("CudaPluginExecutionProvider") for the plugin EP. The lookup misses, so PluginExecutionProvider::GetCapability() receives resource_accountant == nullptr — silently disabling all the host-side budget enforcement added in this PR.
Suggested fix — either register a second accountant entry under kCudaPluginExecutionProvider, or translate the lookup key in the partitioner:
const auto accountant_key = ep->Type() == kCudaPluginExecutionProvider
? kCudaExecutionProvider
: ep->Type();
auto hit = acc_map->find(accountant_key);The integration test TinyBudget_NodesOffloadedToCpu likely still passes without this fix because some nodes fall to CPU due to unsupported ops, not actual budget enforcement.
Note: consumed_bytes fragility in fused-group path
When resource_accountant != nullptr but has_budget == false, the fused-group section computes and appends per-node costs but never updates the local consumed_bytes. This is safe today (budget checks are skipped when has_budget == false) but fragile: any future code that reads consumed_bytes unconditionally will see a stale value. Consider always updating consumed_bytes when costs are computed, regardless of the budget flag.
Previously resolved threads
CreateAdHocAccountantORT_THROW_IF_ERROR — now usesASSERT_STATUS_OK/ASSERT_TRUE/ASSERT_NEat current HEAD. Resolved.- Per-node log level at INFO — now uses
VERBOSEat current HEAD. Resolved.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This pull request introduces several enhancements and refactorings to the resource accounting and execution provider (EP) infrastructure, with a focus on better support for plugin-based CUDA execution providers. The most significant changes include the addition of type-erased arithmetic for resource accounting, improved handling of resource budgets for plugin EPs, and more robust device matching logic. These updates increase maintainability, enforce stricter type safety, and ensure correct resource tracking across both in-tree and plugin-based EPs.
Resource accounting improvements:
AddResourceCounts,ResourceCountExceeds,FormatResourceCount) forResourceCountto enforce exhaustive handling of variant types and improve type safety. [1] [2]IResourceAccountantinterface: replacedResetPendingWeightswithResetForNewPass, which resets both the stop flag and pending weights, and introduced a protectedResetPendingWeightsImplfor subclass-specific cleanup. [1] [2] [3] [4] [5]Plugin CUDA EP and resource budget enforcement:
kCudaPluginExecutionProviderconstant and updated logic to ensure plugin EPs correctly map to their in-tree accountant counterparts and are included in device matching and partitioning. [1] [2] [3] [4] [5] [6]Device matching and partitioning:
Other minor changes include code style cleanups and additional includes for completeness.