[Plugin] Fix bug of memcpy in scatter plugin by SilvesterHsu · Pull Request #3818 · NVIDIA/TensorRT

SilvesterHsu · 2024-04-23T13:03:48Z

The scatter plugin copies data with cudaMemcpy and relies on its implicit synchronization to ensure that device_transform_coeff is fully assigned before the kernel executes. This assumption breaks when TensorRT inference runs on a stream created with cudaStreamNonBlocking, and the plugin then produces incorrect results. Switching to cudaMemcpyAsync on the same stream as the kernel resolves the issue and yields correct results.
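The idea behind the fix can be illustrated with a minimal, standalone sketch. This is not the plugin's code: the kernel scaleKernel, the function copyThenLaunch, and the buffer names are hypothetical, and only the CUDA runtime calls are real. It assumes a single coefficient is copied to the device and read by a kernel enqueued on the same cudaStreamNonBlocking stream, so stream order alone guarantees the copy completes before the kernel starts.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: multiplies every element by a coefficient stored in device memory.
__global__ void scaleKernel(float const* coeff, float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        data[i] *= coeff[0];
    }
}

// Hypothetical helper: returns true if every element was scaled by the coefficient.
bool copyThenLaunch()
{
    int const n = 256;
    cudaStream_t stream = nullptr;
    // A non-blocking stream does not synchronize implicitly with the default stream.
    if (cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking) != cudaSuccess)
    {
        return false;
    }

    float* hostCoeff = nullptr; // pinned host memory, so the async copy can be truly asynchronous
    float* devCoeff = nullptr;
    float* devData = nullptr;
    std::vector<float> hostData(n, 1.0f);
    bool ok = cudaMallocHost(&hostCoeff, sizeof(float)) == cudaSuccess
        && cudaMalloc(&devCoeff, sizeof(float)) == cudaSuccess
        && cudaMalloc(&devData, n * sizeof(float)) == cudaSuccess;

    if (ok)
    {
        *hostCoeff = 2.0f;
        // Both copies and the kernel are enqueued on the SAME stream, so they run in order.
        ok = cudaMemcpyAsync(devData, hostData.data(), n * sizeof(float), cudaMemcpyHostToDevice, stream)
                == cudaSuccess
            && cudaMemcpyAsync(devCoeff, hostCoeff, sizeof(float), cudaMemcpyHostToDevice, stream)
                == cudaSuccess;
    }
    if (ok)
    {
        scaleKernel<<<(n + 127) / 128, 128, 0, stream>>>(devCoeff, devData, n);
        ok = cudaGetLastError() == cudaSuccess
            && cudaMemcpyAsync(hostData.data(), devData, n * sizeof(float), cudaMemcpyDeviceToHost, stream)
                == cudaSuccess
            && cudaStreamSynchronize(stream) == cudaSuccess; // wait before reading results on the host
    }
    if (ok)
    {
        for (float v : hostData)
        {
            ok = ok && (v == 2.0f);
        }
    }

    cudaFree(devData);
    cudaFree(devCoeff);
    cudaFreeHost(hostCoeff);
    cudaStreamDestroy(stream);
    return ok;
}
```

Calling copyThenLaunch() from a host program compiled with nvcc should return true on a machine with a working CUDA device. The design point is that no host-side or default-stream synchronization is needed between the coefficient copy and the kernel, because both are ordered by the stream they share.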

asfiyab-nvidia and others added 2 commits April 3, 2024 14:43

Merge pull request NVIDIA#3773 from NVIDIA/release/10.0

5eeb6c7

Merge release/10.0 to main

[Plugin] Fix bug of memcpy in scatter plugin

05af502

Signed-off-by: seel.xu <[email protected]>

kevinch-nv force-pushed the main branch from 40efe7e to 2114dc7 on July 9, 2025 17:14

kevinch-nv requested a review from a team as a code owner July 9, 2025 17:14

kevinch-nv requested review from LeoZDong and kevinch-nv and removed request for a team July 9, 2025 17:14

SilvesterHsu wants to merge 2 commits into NVIDIA:main from SilvesterHsu:hotfix_plugin_stream
