Harden OneHot operator input validation and output size computation #28014
GopalakrishnanN wants to merge 1 commit into microsoft:main
Conversation
Pull request overview
Addresses a reported DoS risk in the OneHot operator by adding shape-size validation and extra allocation guarding to reduce the chance of oversized/overflowing output allocations during execution (CPU/CUDA).
Changes:
- Add an int64 element-count overflow check in `PrepareOutputShape()` for OneHot.
- Add output allocation null checks in CPU and CUDA OneHot compute paths.
- Add new unit tests intended to verify overflow rejection behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| onnxruntime/core/providers/cpu/tensor/onehot.cc | Adds output element-count overflow validation and an output allocation null guard. |
| onnxruntime/core/providers/cuda/tensor/onehot.cc | Adds an output allocation null guard in the CUDA compute path. |
| onnxruntime/test/providers/cpu/tensor/onehot_op_test.cc | Adds two new failure-mode tests for extremely large depth values. |
```cpp
// which would overflow when computing total element count.
test.AddInput<int64_t>("indices", {2, 3}, {1, 2, 3, 4, 5, 6});
test.AddInput<int64_t>("depth", {1}, {std::numeric_limits<int64_t>::max()});
test.AddInput<int64_t>("values", {2}, {0, 1});
```
This test uses `std::numeric_limits` but the file does not include `<limits>`. Please add the missing standard header include so this compiles reliably (don't rely on indirect includes).
```cpp
// This prevents unbounded memory allocation from an attacker-controlled depth value.
{
  int64_t total_elements = 1;
  for (auto dim : output_shape) {
    if (dim > 0 && total_elements > std::numeric_limits<int64_t>::max() / dim) {
      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
                             "OneHot: output tensor size would overflow. depth value (", depth_val,
                             ") is too large for the given indices shape.");
```
The new overflow error message always blames the depth value ("depth value (...) is too large"), but this check can also trip due to very large indices dimensions even when depth is small. Consider rephrasing the message to describe the output shape/element-count overflow more generally (and optionally include both indices shape and depth).
Suggested change:
```diff
-// This prevents unbounded memory allocation from an attacker-controlled depth value.
+// This prevents unbounded memory allocation from an attacker-controlled output shape.
 {
   int64_t total_elements = 1;
   for (auto dim : output_shape) {
     if (dim > 0 && total_elements > std::numeric_limits<int64_t>::max() / dim) {
       return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
-                             "OneHot: output tensor size would overflow. depth value (", depth_val,
-                             ") is too large for the given indices shape.");
+                             "OneHot: output tensor size would overflow for the given indices shape "
+                             "and depth value (", depth_val, ").");
```
```cpp
// Validate that the total output tensor element count does not overflow int64.
// This prevents unbounded memory allocation from an attacker-controlled depth value.
{
  int64_t total_elements = 1;
  for (auto dim : output_shape) {
    if (dim > 0 && total_elements > std::numeric_limits<int64_t>::max() / dim) {
      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
                             "OneHot: output tensor size would overflow. depth value (", depth_val,
                             ") is too large for the given indices shape.");
    }
    total_elements *= dim;
  }
}
```
The added guard only checks for int64 element-count overflow; it does not prevent attacker-controlled depth values that produce extremely large but non-overflowing outputs (e.g., depth=2^32 with small indices) from triggering multi-GB allocations. If the goal is to mitigate allocation-based DoS as described in the PR, add an explicit upper bound (ideally on output bytes/elements) before calling Output().
```cpp
Tensor* output = ctx->Output(0, TensorShape(output_shape));
ORT_RETURN_IF_NOT(output, "OneHot: failed to allocate output tensor. Output shape may be too large.");

// edge case where we have a dim with a value of 0
if (output->Shape().Size() == 0)
```
After allocation, the CUDA path still narrows shape-derived values to 32-bit types (fast_divmod takes int, CUDA_LONG is int32_t). For large but non-overflowing outputs (element count > INT32_MAX or suffix_dim_size > INT_MAX), these casts can truncate and lead to incorrect indexing or out-of-bounds writes in the CUDA kernels. Please add explicit validation (e.g., output element count, suffix_dim_size, and depth*suffix all within the required 32-bit limits) or update the kernel implementation to support 64-bit indexing.
Force-pushed from 0d2a9e2 to 0c704ce.
@GopalakrishnanN please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.
Force-pushed from 6e871f2 to 3fcb761.
- Add overflow check in `PrepareOutputShape` using `SafeInt` for the output size and `prefix_dim_size` multiplication, to prevent unbounded allocation when depth or indices shape would overflow int64
- Guard against division by zero when `prefix_dim_size` is zero
- Add CUDA int32 range validation before `fast_divmod` to avoid silent truncation in `gsl::narrow_cast` for `suffix_dim_size` and `depth_val * suffix_dim_size`
- Check for nullptr from `Output()` in both CPU and CUDA Compute paths
- Add unit tests: depth overflow (two variants), negative depth, depth=1 edge case, scalar-indices rejection (ONNX spec requires rank >= 1), and opset 9 coverage
Force-pushed from 3fcb761 to bfd8c8c.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
```cpp
// Test scalar (rank-0) indices are rejected per ONNX spec (indices must have rank >= 1).
TEST(OneHotOpTest, ScalarIndicesRejected) {
  OpTester test("OneHot", 11);
  test.AddInput<int64_t>("indices", {}, {2});
  test.AddInput<int64_t>("depth", {1}, {5});
  test.AddInput<int64_t>("values", {2}, {0, 1});
  test.AddOutput<int64_t>("output", {5}, {0, 0, 1, 0, 0});
  test.Run(OpTester::ExpectResult::kExpectFailure, "Indices tensor must have rank >= 1");
```
ScalarIndicesRejected expects OneHot to reject rank-0 indices, but the current implementation of ValidateInputs()/PrepareOutputShape() (used by both CPU and CUDA) allows scalar indices and will compute a valid output. As written, this test will fail unless you add an explicit rank check (e.g., reject indices->Shape().NumDimensions() == 0) or adjust the test expectation to match the supported behavior/spec.
Suggested change:
```diff
-// Test scalar (rank-0) indices are rejected per ONNX spec (indices must have rank >= 1).
-TEST(OneHotOpTest, ScalarIndicesRejected) {
+// Test scalar (rank-0) indices are accepted by the current implementation.
+TEST(OneHotOpTest, ScalarIndices) {
   OpTester test("OneHot", 11);
   test.AddInput<int64_t>("indices", {}, {2});
   test.AddInput<int64_t>("depth", {1}, {5});
   test.AddInput<int64_t>("values", {2}, {0, 1});
   test.AddOutput<int64_t>("output", {5}, {0, 0, 1, 0, 0});
-  test.Run(OpTester::ExpectResult::kExpectFailure, "Indices tensor must have rank >= 1");
+  test.Run();
```
```cpp
ORT_RETURN_IF_NOT(depth_val <= kInt32Max / std::max(suffix_dim_size, int64_t{1}),
                  "OneHot: depth (", depth_val, ") * suffix dimension size (", suffix_dim_size,
```
This file now uses std::max(...) but does not include <algorithm>. Relying on transitive includes is brittle and may break builds depending on toolchain/headers; add the proper standard header include.
```cpp
// Validate that dimensions used by CUDA kernels fit in int32 range.
// fast_divmod requires int32 operands.
constexpr int64_t kInt32Max = std::numeric_limits<int>::max();
ORT_RETURN_IF_NOT(suffix_dim_size <= kInt32Max,
                  "OneHot: suffix dimension size (", suffix_dim_size,
                  ") exceeds int32 range supported by the CUDA kernel.");
ORT_RETURN_IF_NOT(depth_val <= kInt32Max / std::max(suffix_dim_size, int64_t{1}),
                  "OneHot: depth (", depth_val, ") * suffix dimension size (", suffix_dim_size,
                  ") exceeds int32 range supported by the CUDA kernel.");
```
The new hardening relies on PrepareOutputShape() to catch size overflow, but in CUDA plugin builds PrepareOutputShape() is provided by the shim in core/providers/cuda/plugin/cuda_kernel_adapter.h, which still uses unchecked int64_t multiplication/division and has none of the new overflow/zero-div checks. To fully address the overflow/truncation issues in all CUDA build modes, mirror these PrepareOutputShape() validations in the plugin shim as well.
Description
Hardens input validation and output-size computation in the `OneHot` operator (CPU and CUDA execution providers). Previously, the output size calculation in `PrepareOutputShape()` performed `indices_shape.Size() * depth_val` using plain `int64_t`, so a valid-looking input (a large `indices` shape combined with a large `depth` value) could silently overflow and produce a nonsensical output shape before allocation. This change tightens validation, uses checked arithmetic, and adds related hardening.

Motivation and Context
- `suffix_dim_size` and `depth_val * suffix_dim_size` are passed to `fast_divmod` (which requires `int32`). The existing `gsl::narrow_cast<int>` silently truncates, so very large but non-overflowing values would still reach the kernel with wrong divisors.
- `prefix_dim_size` was computed via an unchecked loop of `int64_t` multiplications.
- The `Output()` return value was not null-checked before use.

Changes
- `onnxruntime/core/providers/cpu/tensor/onehot.cc`
  - Use `SafeInt<int64_t>` for both the output-size computation and the `prefix_dim_size` multiplication loop; return a clear error on overflow.
  - Guard against division by zero when `prefix_dim_size` is zero.
  - Null-check `context->Output(...)` in `Compute()`.
- `onnxruntime/core/providers/cuda/tensor/onehot.cc`
  - Before the `fast_divmod` narrowings, validate that `suffix_dim_size` and `depth_val * suffix_dim_size` fit in `int32_t`; error out cleanly otherwise instead of silently truncating.
  - Null-check `context->Output(...)` in `ComputeInternal()`.
- `onnxruntime/test/providers/cpu/tensor/onehot_op_test.cc` adds tests:
  - `DepthTooLarge_OutputSizeOverflow`: depth = `INT64_MAX`, indices = `[2,3]`.
  - `DepthTooLarge_OutputSizeOverflow_LargeIndices`: depth = `INT64_MAX / 500`, indices = `[1000]`.
  - `NegativeDepth`: negative depth is rejected.
  - `DepthOne`: minimum valid depth = 1 edge case.
  - `ScalarIndicesRejected`: rank-0 indices are rejected (ONNX spec requires indices rank >= 1).
  - `DefaultAxis_Opset9`: opset 9 coverage for the default-axis path.

Testing
Built `onnxruntime_provider_test` on Windows (Release) and ran the `OneHotOpTest.*` suite. Overflow tests produce the expected error, e.g.:

```
OneHot: output tensor size would overflow for the given indices shape and depth value (9223372036854775807).
```