fix: preserve multi_modal_data in generate_opt_level=0 path by sanmuf · Pull Request #446 · alibaba/ROLL

sanmuf · 2026-05-19T09:42:42Z

Summary

This PR fixes VLM generation in generate_scheduler.py by preserving multi_modal_data when constructing gen_batch.

Previously, request_data.pop(...) only kept tensor fields such as input_ids, attention_mask, and position_ids. For multimodal generation, multi_modal_data is stored in non_tensor_batch, so it was dropped before calling actor_cluster.generate(...).

As a result, VLM backends such as vLLM/SGLang could receive text-only prompts without image payloads.
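The drop described above can be reproduced with a toy container. This is a minimal sketch, not ROLL's code: `ToyBatch` and its `pop` signature are hypothetical stand-ins that only assume tensor fields and non-tensor fields live in separate dictionaries and that `pop` returns just the keys it is asked for.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToyBatch:
    """Hypothetical stand-in for the request container; not ROLL's real class."""
    batch: Dict[str, Any] = field(default_factory=dict)             # tensor fields
    non_tensor_batch: Dict[str, Any] = field(default_factory=dict)  # everything else

    def pop(self, batch_keys: List[str],
            non_tensor_batch_keys: Optional[List[str]] = None) -> "ToyBatch":
        """Move the named keys into a new ToyBatch; unnamed keys stay behind."""
        non_tensor_batch_keys = non_tensor_batch_keys or []
        return ToyBatch(
            batch={k: self.batch.pop(k) for k in batch_keys},
            non_tensor_batch={k: self.non_tensor_batch.pop(k)
                              for k in non_tensor_batch_keys},
        )


request_data = ToyBatch(
    batch={
        "input_ids": [[1, 2, 3]],
        "attention_mask": [[1, 1, 1]],
        "position_ids": [[0, 1, 2]],
    },
    non_tensor_batch={"multi_modal_data": [{"image": "<pixels>"}]},
)

# Only tensor keys are requested, so the image payload never reaches gen_batch.
gen_batch = request_data.pop(
    batch_keys=["input_ids", "attention_mask", "position_ids"]
)
print("multi_modal_data" in gen_batch.non_tensor_batch)  # False
```

In this sketch the payload is still sitting in `request_data.non_tensor_batch` after the call, so the data is not lost, just never forwarded.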

For Qwen3-VL in vLLM, this could fail during M-RoPE position initialization with an error like:

IndexError: list index out of range

File ".../vllm/model_executor/models/qwen3_vl.py", line 1521, in get_mrope_input_positions
image_grid_thw[image_index][0]
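
One plausible way to hit this error is a prompt that still contains image placeholder tokens while the per-image grid list is empty. The sketch below is a simplified illustration of that failure mode, not vLLM's implementation; `IMAGE_TOKEN_ID` and `first_grid_dims` are hypothetical names chosen for the example.

```python
from typing import List

IMAGE_TOKEN_ID = 99  # hypothetical placeholder id, chosen for this example


def first_grid_dims(input_ids: List[int],
                    image_grid_thw: List[List[int]]) -> List[int]:
    """Return the leading grid dimension for each image placeholder in the prompt."""
    dims = []
    image_index = 0
    for token in input_ids:
        if token == IMAGE_TOKEN_ID:
            # Raises IndexError when the prompt has more placeholders than grids.
            dims.append(image_grid_thw[image_index][0])
            image_index += 1
    return dims


prompt = [1, 2, IMAGE_TOKEN_ID, 3]

# With the image payload present, each placeholder finds its grid entry.
print(first_grid_dims(prompt, [[1, 4, 4]]))

# With the payload dropped, the grid list is empty and indexing fails.
try:
    first_grid_dims(prompt, [])
except IndexError as exc:
    print(f"IndexError: {exc}")
```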

Changes

Preserve multi_modal_data from request_data.non_tensor_batch when building gen_batch.
Keep behavior unchanged for text-only models.
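
The two changes listed above can be sketched as a conditional key selection. This is an assumption-laden illustration rather than the actual diff: `build_gen_batch` and `TENSOR_KEYS` are hypothetical, and plain dictionaries stand in for the real batch container.

```python
from typing import Any, Dict, Tuple

# Tensor fields named in the PR description.
TENSOR_KEYS = ["input_ids", "attention_mask", "position_ids"]


def build_gen_batch(
    batch: Dict[str, Any], non_tensor_batch: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: select generation inputs from plain dicts."""
    gen_tensors = {k: batch[k] for k in TENSOR_KEYS}
    # Forward multi_modal_data only when present, so text-only requests
    # produce exactly the same output as before.
    gen_non_tensors: Dict[str, Any] = {}
    if "multi_modal_data" in non_tensor_batch:
        gen_non_tensors["multi_modal_data"] = non_tensor_batch["multi_modal_data"]
    return gen_tensors, gen_non_tensors


tensors = {
    "input_ids": [[1, 2, 3]],
    "attention_mask": [[1, 1, 1]],
    "position_ids": [[0, 1, 2]],
}

# VLM request: the image payload travels with the tensors.
_, vlm_extra = build_gen_batch(tensors, {"multi_modal_data": [{"image": "<pixels>"}]})
print(sorted(vlm_extra))  # ['multi_modal_data']

# Text-only request: nothing extra is added.
_, text_extra = build_gen_batch(tensors, {})
print(text_extra)  # {}
```

Guarding on key presence, rather than always requesting the key, is what keeps the text-only path from failing on a missing entry.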

Why

For VLM models, multi_modal_data contains image payloads and prompt token ids required by generation backends. Dropping it causes multimodal sampling to fail or silently degrade to text-only generation.

Test

Verified with Qwen3-VL RLVR pipeline using generate_opt_level: 0.
Confirmed that multi_modal_data is preserved and passed to the generation backend.

CLAassistant · 2026-05-19T09:43:27Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

fusen seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

fix image mismatch

869a946

sanmuf wants to merge 1 commit into alibaba:main from sanmuf:rl-vlm

2 participants
