[Plugin] Fix bug of memcpy in scatter plugin #3818

Open
SilvesterHsu wants to merge 2 commits into NVIDIA:main from SilvesterHsu:hotfix_plugin_stream
Conversation

@SilvesterHsu

The scatter plugin copies data with cudaMemcpy and relies on its implicit synchronization to ensure that device_transform_coeff is populated before the kernel runs. This assumption breaks when TensorRT inference runs on a stream created with cudaStreamNonBlocking: such a stream does not synchronize with the legacy default stream, so the copy is no longer guaranteed to be ordered with respect to the kernel, and the results are incorrect. Switching to cudaMemcpyAsync on the same stream as the kernel places the copy ahead of the kernel in stream order and yields correct results.
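
To illustrate the pattern outside the plugin, here is a minimal standalone sketch, not the plugin's actual code. The kernel `scaleKernel`, the `CHECK` macro, and all buffer names are hypothetical stand-ins; only the CUDA runtime calls are real. It enqueues the coefficient copy and the kernel on the same cudaStreamNonBlocking stream, so stream order alone guarantees the kernel sees the copied value.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort on any CUDA runtime error.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error: %s\n",                       \
                         cudaGetErrorString(err_));                        \
            std::exit(1);                                                  \
        }                                                                  \
    } while (0)

// Hypothetical stand-in for the plugin kernel: reads a coefficient
// from device memory and scales every input element by it.
__global__ void scaleKernel(const float* in, const float* coeff,
                            float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * coeff[0];
}

int main()
{
    const int n = 1024;
    float *hIn, *hOut, *hCoeff;   // pinned host buffers
    float *dIn, *dOut, *dCoeff;   // device buffers
    CHECK(cudaMallocHost(&hIn, n * sizeof(float)));
    CHECK(cudaMallocHost(&hOut, n * sizeof(float)));
    CHECK(cudaMallocHost(&hCoeff, sizeof(float)));
    CHECK(cudaMalloc(&dIn, n * sizeof(float)));
    CHECK(cudaMalloc(&dOut, n * sizeof(float)));
    CHECK(cudaMalloc(&dCoeff, sizeof(float)));

    for (int i = 0; i < n; ++i) hIn[i] = static_cast<float>(i);
    hCoeff[0] = 2.0f;

    // A non-blocking stream does not synchronize with the legacy
    // default stream, so all dependent work goes on this one stream.
    cudaStream_t stream;
    CHECK(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));

    // Copies and kernel share the stream: they execute in issue order.
    CHECK(cudaMemcpyAsync(dIn, hIn, n * sizeof(float),
                          cudaMemcpyHostToDevice, stream));
    CHECK(cudaMemcpyAsync(dCoeff, hCoeff, sizeof(float),
                          cudaMemcpyHostToDevice, stream));
    scaleKernel<<<(n + 255) / 256, 256, 0, stream>>>(dIn, dCoeff, dOut, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(hOut, dOut, n * sizeof(float),
                          cudaMemcpyDeviceToHost, stream));

    // Wait for the stream before reading results on the host.
    CHECK(cudaStreamSynchronize(stream));
    for (int i = 0; i < n; ++i) assert(hOut[i] == 2.0f * hIn[i]);

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(dIn));  CHECK(cudaFree(dOut));  CHECK(cudaFree(dCoeff));
    CHECK(cudaFreeHost(hIn)); CHECK(cudaFreeHost(hOut)); CHECK(cudaFreeHost(hCoeff));
    return 0;
}
```

One caveat with this approach: because the copy is asynchronous, the host source buffer should remain valid and unmodified until the stream has consumed it. The sketch uses pinned memory and synchronizes the stream before freeing anything; in the plugin, the equivalent stream would presumably be the one passed to enqueue.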

@kevinch-nv kevinch-nv requested a review from a team as a code owner July 9, 2025 17:14
@kevinch-nv kevinch-nv requested review from LeoZDong and kevinch-nv and removed request for a team July 9, 2025 17:14
