Fix vLLM >= 0.17 compatibility: migrate to native WeightTransferConfig API #3556
vmoens wants to merge 2 commits into gh/vmoens/240/base
…g API
- Replace manual stateless_init_process_group + collective_rpc("update_weight")
with vLLM's native WeightTransferConfig/NCCLWeightTransferEngine API
- Fix VLLM_USE_V1 env var removal (V1 always on in 0.17+)
- Fix NCCL weight sync deadlock by dispatching worker RPCs before trainer joins
- Fix LoRA weight extraction (merge_and_unload before state_dict)
- Fix weight transfer KeyError by using HF model directly (not TransformersWrapper)
- Fix prompt_logprobs length mismatch in _RequestOutput_tc for V1 engine
- Auto-propagate WANDB_API_KEY, HF_TOKEN, HF_HOME to Ray workers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ghstack-source-id: 1a2d958
Pull-Request: #3556
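
The deadlock fix listed above depends on ordering: a collective only completes once every participant has arrived, so the worker-side calls must already be in flight before the trainer enters the collective. The sketch below illustrates that ordering with the standard library only. A `threading.Barrier` stands in for the NCCL collective and a thread pool stands in for the worker RPCs; `sync_weights` and `worker_receive` are hypothetical names for illustration, not functions from this PR or from vLLM.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor


def sync_weights(num_workers: int, timeout: float = 5.0) -> list[str]:
    """Toy model of a weight sync between one trainer and several workers."""
    # Stand-in for the collective: it completes only when all workers
    # AND the trainer have arrived.
    rendezvous = threading.Barrier(num_workers + 1)

    def worker_receive(rank: int) -> str:
        # Each worker blocks here until the trainer joins as well.
        rendezvous.wait(timeout=timeout)
        return f"worker {rank} received"

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # 1. Dispatch the worker-side calls without waiting for their results.
        futures = [pool.submit(worker_receive, rank) for rank in range(num_workers)]
        # 2. Only now does the trainer join the collective.
        rendezvous.wait(timeout=timeout)
        # 3. Collect worker results once the collective has completed.
        return [future.result(timeout=timeout) for future in futures]


if __name__ == "__main__":
    print(sync_weights(2))
```

If step 1 instead waited for each worker's result before the trainer reached step 2, the workers would block in the barrier waiting for the trainer while the trainer waited for the workers, which is the shape of deadlock the commit message describes.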
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3556
Note: Links to docs will display an error until the docs builds have been completed.
❌ 4 New Failures, 3 Unrelated Failures
As of commit d0fb2a9 with merge base 4e2e787:
NEW FAILURES - The following jobs have failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
| Prefix | Label Applied | Example |
|---|---|---|
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Quality] | Quality | [Quality] Fix typos and add codespell |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 81.8339μs | 80.5935μs | 12.4080 KOps/s | 12.4851 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1397ms | 0.1386ms | 7.2140 KOps/s | 7.1840 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1108s | 0.1105s | 9.0513 Ops/s | 9.0348 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.5482μs | 2.5435μs | 393.1643 KOps/s | 395.2466 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 36.9283μs | 36.6334μs | 27.2975 KOps/s | 26.0877 KOps/s | |
| test_simple | 0.7826s | 0.7817s | 1.2793 Ops/s | 1.2371 Ops/s | |
| test_transformed | 1.3755s | 1.3741s | 0.7278 Ops/s | 0.7120 Ops/s | |
| test_serial | 2.4327s | 2.3196s | 0.4311 Ops/s | 0.4200 Ops/s | |
| test_parallel | 1.9245s | 1.8229s | 0.5486 Ops/s | 0.5519 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.1843ms | 42.3945μs | 23.5880 KOps/s | 23.9642 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 57.1610μs | 23.0989μs | 43.2921 KOps/s | 43.3240 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 73.4420μs | 23.5847μs | 42.4003 KOps/s | 43.9052 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 42.5800μs | 12.7772μs | 78.2643 KOps/s | 77.3807 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 83.2520μs | 44.2352μs | 22.6064 KOps/s | 22.4330 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 58.6510μs | 25.5230μs | 39.1804 KOps/s | 39.5892 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 59.8010μs | 25.8073μs | 38.7487 KOps/s | 38.6982 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 53.7410μs | 15.4572μs | 64.6949 KOps/s | 65.1556 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 0.1179ms | 46.8276μs | 21.3549 KOps/s | 20.9731 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 67.7110μs | 28.1462μs | 35.5288 KOps/s | 35.6843 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 54.9710μs | 26.1861μs | 38.1882 KOps/s | 37.5558 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 40.6210μs | 15.2478μs | 65.5832 KOps/s | 64.8506 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 84.5620μs | 49.1386μs | 20.3506 KOps/s | 20.0432 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 68.8620μs | 30.0952μs | 33.2279 KOps/s | 32.5721 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 60.6010μs | 28.3195μs | 35.3114 KOps/s | 35.1444 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 58.6310μs | 17.6406μs | 56.6873 KOps/s | 55.4042 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 87.0910μs | 46.6662μs | 21.4288 KOps/s | 20.8881 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 57.2010μs | 27.5472μs | 36.3014 KOps/s | 35.8694 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.4869ms | 29.9273μs | 33.4143 KOps/s | 33.5495 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 45.9200μs | 17.0627μs | 58.6072 KOps/s | 59.3039 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 86.2610μs | 48.8095μs | 20.4878 KOps/s | 20.1449 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 66.1710μs | 30.1817μs | 33.1327 KOps/s | 32.5591 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 63.0810μs | 32.0491μs | 31.2021 KOps/s | 31.1537 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 50.4610μs | 19.3231μs | 51.7515 KOps/s | 51.9095 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 90.5710μs | 51.3817μs | 19.4622 KOps/s | 19.2463 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 71.8610μs | 32.8089μs | 30.4796 KOps/s | 30.2437 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 68.0510μs | 31.6025μs | 31.6431 KOps/s | 31.2318 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 61.5710μs | 18.9732μs | 52.7059 KOps/s | 52.0746 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 90.1110μs | 54.0174μs | 18.5125 KOps/s | 18.2681 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 95.5120μs | 34.6283μs | 28.8781 KOps/s | 28.1220 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 53.8610μs | 33.5677μs | 29.7905 KOps/s | 29.2696 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 44.7010μs | 21.8044μs | 45.8623 KOps/s | 46.0767 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8500s | 0.7414s | 1.3489 Ops/s | 1.3469 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7058s | 0.6065s | 1.6488 Ops/s | 1.6337 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7328s | 1.6455s | 0.6077 Ops/s | 0.6024 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5090s | 1.4277s | 0.7004 Ops/s | 0.6997 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9827s | 1.8982s | 0.5268 Ops/s | 0.5270 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7699s | 1.6773s | 0.5962 Ops/s | 0.5974 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.7064s | 4.5849s | 0.2181 Ops/s | 0.2164 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5414s | 4.4057s | 0.2270 Ops/s | 0.2244 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9688s | 1.8767s | 0.5329 Ops/s | 0.5356 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.6743s | 1.6050s | 0.6230 Ops/s | 0.6331 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 21.3523ms | 20.6778ms | 48.3610 Ops/s | 47.1374 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1392s | 3.7061ms | 269.8230 Ops/s | 283.9737 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.1025ms | 82.2431μs | 12.1591 KOps/s | 12.1810 KOps/s | |
| test_values[td1_return_estimate-False-False] | 48.4830ms | 48.0827ms | 20.7975 Ops/s | 20.3764 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 1.3482ms | 1.0898ms | 917.5997 Ops/s | 913.3728 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 79.5740ms | 78.8964ms | 12.6749 Ops/s | 12.4395 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.2950ms | 1.0869ms | 920.0286 Ops/s | 913.7384 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 20.9997ms | 20.7025ms | 48.3034 Ops/s | 48.1498 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0118ms | 0.7566ms | 1.3218 KOps/s | 1.3112 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.8047ms | 0.6810ms | 1.4685 KOps/s | 1.4741 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5264ms | 1.4890ms | 671.5696 Ops/s | 672.4670 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.7346ms | 0.6953ms | 1.4381 KOps/s | 1.4065 KOps/s | |
| test_dqn_speed[False-None] | 1.6993ms | 1.6032ms | 623.7463 Ops/s | 627.4638 Ops/s | |
| test_dqn_speed[False-backward] | 2.3183ms | 2.2495ms | 444.5479 Ops/s | 443.5572 Ops/s | |
| test_dqn_speed[True-None] | 0.6703ms | 0.5761ms | 1.7359 KOps/s | 1.6993 KOps/s | |
| test_dqn_speed[True-backward] | 1.2649ms | 1.2177ms | 821.2259 Ops/s | 819.4070 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.7399ms | 0.6089ms | 1.6424 KOps/s | 1.6228 KOps/s | |
| test_ddpg_speed[False-None] | 3.4689ms | 3.0429ms | 328.6364 Ops/s | 329.8233 Ops/s | |
| test_ddpg_speed[False-backward] | 4.9782ms | 4.5039ms | 222.0294 Ops/s | 223.6785 Ops/s | |
| test_ddpg_speed[True-None] | 1.4222ms | 1.3370ms | 747.9668 Ops/s | 742.9018 Ops/s | |
| test_ddpg_speed[True-backward] | 2.3753ms | 2.3250ms | 430.1056 Ops/s | 400.9642 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.4560ms | 1.3657ms | 732.2456 Ops/s | 724.7545 Ops/s | |
| test_sac_speed[False-None] | 8.9579ms | 8.5364ms | 117.1457 Ops/s | 117.4731 Ops/s | |
| test_sac_speed[False-backward] | 12.1395ms | 11.6228ms | 86.0376 Ops/s | 84.5686 Ops/s | |
| test_sac_speed[True-None] | 2.3281ms | 1.8458ms | 541.7841 Ops/s | 528.2851 Ops/s | |
| test_sac_speed[True-backward] | 3.5299ms | 3.4248ms | 291.9855 Ops/s | 273.0737 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 16.9107ms | 10.1890ms | 98.1450 Ops/s | 98.0442 Ops/s | |
| test_redq_deprec_speed[False-None] | 10.6393ms | 9.5352ms | 104.8747 Ops/s | 104.4890 Ops/s | |
| test_redq_deprec_speed[False-backward] | 13.2051ms | 12.6858ms | 78.8285 Ops/s | 76.7313 Ops/s | |
| test_redq_deprec_speed[True-None] | 2.6555ms | 2.5374ms | 394.1020 Ops/s | 383.9178 Ops/s | |
| test_redq_deprec_speed[True-backward] | 4.0549ms | 3.9871ms | 250.8060 Ops/s | 233.8570 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 14.8212ms | 9.6522ms | 103.6029 Ops/s | 103.6334 Ops/s | |
| test_td3_speed[False-None] | 8.5183ms | 8.3927ms | 119.1512 Ops/s | 117.7044 Ops/s | |
| test_td3_speed[False-backward] | 11.3962ms | 10.8319ms | 92.3196 Ops/s | 91.7576 Ops/s | |
| test_td3_speed[True-None] | 1.6499ms | 1.6171ms | 618.3918 Ops/s | 582.9998 Ops/s | |
| test_td3_speed[True-backward] | 3.0798ms | 2.9478ms | 339.2308 Ops/s | 317.8524 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 98.2653ms | 25.7436ms | 38.8446 Ops/s | 38.6384 Ops/s | |
| test_cql_speed[False-None] | 18.4699ms | 17.8094ms | 56.1501 Ops/s | 56.3456 Ops/s | |
| test_cql_speed[False-backward] | 23.6310ms | 23.1680ms | 43.1629 Ops/s | 42.4847 Ops/s | |
| test_cql_speed[True-None] | 3.5518ms | 3.4346ms | 291.1539 Ops/s | 303.6496 Ops/s | |
| test_cql_speed[True-backward] | 5.9011ms | 5.4961ms | 181.9480 Ops/s | 177.9555 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 17.8105ms | 11.8815ms | 84.1642 Ops/s | 83.8894 Ops/s | |
| test_a2c_speed[False-None] | 3.6040ms | 3.4720ms | 288.0186 Ops/s | 295.7474 Ops/s | |
| test_a2c_speed[False-backward] | 6.9608ms | 6.4364ms | 155.3663 Ops/s | 150.9679 Ops/s | |
| test_a2c_speed[True-None] | 1.4550ms | 1.3596ms | 735.5352 Ops/s | 725.7821 Ops/s | |
| test_a2c_speed[True-backward] | 3.2742ms | 3.1327ms | 319.2105 Ops/s | 330.2158 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 1.1751ms | 1.0420ms | 959.6825 Ops/s | 955.1133 Ops/s | |
| test_ppo_speed[False-None] | 4.3216ms | 4.1711ms | 239.7454 Ops/s | 247.6136 Ops/s | |
| test_ppo_speed[False-backward] | 7.8966ms | 7.5438ms | 132.5596 Ops/s | 136.9900 Ops/s | |
| test_ppo_speed[True-None] | 1.7350ms | 1.5060ms | 664.0306 Ops/s | 664.9134 Ops/s | |
| test_ppo_speed[True-backward] | 3.3857ms | 3.3203ms | 301.1769 Ops/s | 295.8596 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 1.2418ms | 1.0951ms | 913.1178 Ops/s | 892.5180 Ops/s | |
| test_reinforce_speed[False-None] | 2.7625ms | 2.4393ms | 409.9515 Ops/s | 398.9212 Ops/s | |
| test_reinforce_speed[False-backward] | 3.9984ms | 3.5778ms | 279.5026 Ops/s | 287.1583 Ops/s | |
| test_reinforce_speed[True-None] | 1.4860ms | 1.3659ms | 732.0985 Ops/s | 723.7706 Ops/s | |
| test_reinforce_speed[True-backward] | 3.5772ms | 3.1502ms | 317.4410 Ops/s | 332.1191 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 16.0974ms | 9.0548ms | 110.4382 Ops/s | 111.9383 Ops/s | |
| test_iql_speed[False-None] | 11.0086ms | 10.0238ms | 99.7627 Ops/s | 102.3746 Ops/s | |
| test_iql_speed[False-backward] | 14.5950ms | 13.9208ms | 71.8348 Ops/s | 73.6738 Ops/s | |
| test_iql_speed[True-None] | 2.4328ms | 2.2587ms | 442.7365 Ops/s | 441.1823 Ops/s | |
| test_iql_speed[True-backward] | 4.9970ms | 4.9035ms | 203.9364 Ops/s | 208.9675 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 17.0540ms | 10.1268ms | 98.7475 Ops/s | 100.1302 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.2192ms | 5.9772ms | 167.3015 Ops/s | 166.8570 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.9283ms | 0.3372ms | 2.9659 KOps/s | 2.9510 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5373ms | 0.2738ms | 3.6517 KOps/s | 2.9664 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.9926ms | 5.7596ms | 173.6217 Ops/s | 171.8015 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.2314ms | 0.3269ms | 3.0590 KOps/s | 3.3435 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6055ms | 0.2815ms | 3.5522 KOps/s | 3.7199 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6088ms | 1.2805ms | 780.9574 Ops/s | 786.7949 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.4157ms | 1.2049ms | 829.9324 Ops/s | 833.6848 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.1337ms | 5.9747ms | 167.3726 Ops/s | 168.3295 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9468ms | 0.4416ms | 2.2646 KOps/s | 1.9300 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8606ms | 0.4223ms | 2.3678 KOps/s | 2.0288 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.9516ms | 5.7820ms | 172.9500 Ops/s | 172.5008 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.6606ms | 0.2871ms | 3.4828 KOps/s | 3.3460 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5967ms | 0.3680ms | 2.7174 KOps/s | 3.1687 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.8965ms | 5.6971ms | 175.5283 Ops/s | 173.9889 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.2080ms | 0.3515ms | 2.8447 KOps/s | 2.8574 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5438ms | 0.3020ms | 3.3110 KOps/s | 3.3390 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.0128ms | 5.8365ms | 171.3345 Ops/s | 165.7385 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.6737ms | 0.5021ms | 1.9918 KOps/s | 1.8615 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7089ms | 0.4856ms | 2.0595 KOps/s | 1.9266 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.9615s | 24.2078ms | 41.3090 Ops/s | 195.4822 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 9.7995ms | 1.9977ms | 500.5750 Ops/s | 502.5160 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 9.9517ms | 1.3139ms | 761.1102 Ops/s | 1.0190 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 6.9212ms | 5.0648ms | 197.4416 Ops/s | 198.1174 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.9904ms | 1.8347ms | 545.0508 Ops/s | 539.7199 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.3794ms | 1.0162ms | 984.0611 Ops/s | 1.0392 KOps/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 7.8695ms | 5.2874ms | 189.1295 Ops/s | 186.0342 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.6609s | 15.3503ms | 65.1452 Ops/s | 494.3705 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 2.5871ms | 1.1935ms | 837.8456 Ops/s | 851.0172 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 39.6831ms | 37.9813ms | 26.3288 Ops/s | 25.8422 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.4244ms | 18.0408ms | 55.4298 Ops/s | 54.8734 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 43.2301ms | 39.2733ms | 25.4626 Ops/s | 24.9082 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 20.4316ms | 18.4916ms | 54.0787 Ops/s | 53.8905 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 42.7877ms | 40.9320ms | 24.4307 Ops/s | 23.9275 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.9271ms | 20.3261ms | 49.1978 Ops/s | 49.2270 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8561ms | 0.2184ms | 4.5792 KOps/s | 4.4237 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.6601ms | 1.4093ms | 709.5776 Ops/s | 709.2011 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.7703ms | 2.3771ms | 420.6815 Ops/s | 436.3603 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.1347ms | 2.9365ms | 340.5367 Ops/s | 336.6963 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.4919ms | 0.1632ms | 6.1288 KOps/s | 5.9565 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3813ms | 0.2190ms | 4.5670 KOps/s | 4.3415 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 2.0097ms | 1.7843ms | 560.4390 Ops/s | 550.1413 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.5858ms | 1.3794ms | 724.9783 Ops/s | 744.5115 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.4658ms | 1.1493ms | 870.1236 Ops/s | 875.7571 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.7252ms | 3.6101ms | 276.9975 Ops/s | 278.9396 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 6.0544ms | 5.8395ms | 171.2478 Ops/s | 169.2655 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.5864ms | 7.3828ms | 135.4497 Ops/s | 134.6698 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4142ms | 0.2772ms | 3.6073 KOps/s | 3.6240 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.7055ms | 1.5289ms | 654.0714 Ops/s | 643.6705 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.7230ms | 2.4977ms | 400.3659 Ops/s | 399.4142 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.3438ms | 3.1376ms | 318.7156 Ops/s | 316.0961 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 33.0763ms | 32.5758ms | 30.6976 Ops/s | 30.3886 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 64.3704ms | 64.1237ms | 15.5949 Ops/s | 15.4631 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 38.0902ms | 37.4591ms | 26.6957 Ops/s | 26.5407 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 74.6245ms | 73.7637ms | 13.5568 Ops/s | 13.5454 Ops/s | |
| test_collector_without_rb_cuda[100-img_shape0-atari] | 55.4755ms | 55.0507ms | 18.1651 Ops/s | 17.6601 Ops/s | |
| test_collector_without_rb_cuda[200-img_shape1-large_batch] | 0.1099s | 0.1096s | 9.1279 Ops/s | 8.9213 Ops/s | |
| test_collector_with_rb_cuda[100-img_shape0-atari] | 57.4542ms | 57.1912ms | 17.4852 Ops/s | 17.3711 Ops/s | |
| test_collector_with_rb_cuda[200-img_shape1-large_batch] | 0.1141s | 0.1133s | 8.8226 Ops/s | 8.6883 Ops/s |
…g API
- Replace manual stateless_init_process_group + collective_rpc("update_weight")
with vLLM's native WeightTransferConfig/NCCLWeightTransferEngine API
- Fix VLLM_USE_V1 env var removal (V1 always on in 0.17+)
- Fix NCCL weight sync deadlock by dispatching worker RPCs before trainer joins
- Fix LoRA weight extraction (merge_and_unload before state_dict)
- Fix weight transfer KeyError by using HF model directly (not TransformersWrapper)
- Fix prompt_logprobs length mismatch in _RequestOutput_tc for V1 engine
- Auto-propagate WANDB_API_KEY, HF_TOKEN, HF_HOME to Ray workers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ghstack-source-id: f6fb817
Pull-Request: #3556
ghstack-source-id: 60cb4b4
Pull Request resolved: #3574
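
The commit message mentions auto-propagating `WANDB_API_KEY`, `HF_TOKEN`, and `HF_HOME` to Ray workers. A minimal sketch of one way this could be done, assuming the variables are forwarded through Ray's `runtime_env` mechanism (its `env_vars` field), is shown below. The helper name `build_runtime_env` and the constant `PROPAGATED_VARS` are hypothetical and are not taken from this PR; the helper itself uses only the standard library.

```python
import os

# Variables named in the commit message; the constant name is illustrative only.
PROPAGATED_VARS = ("WANDB_API_KEY", "HF_TOKEN", "HF_HOME")


def build_runtime_env(environ=None, names=PROPAGATED_VARS):
    """Return a runtime_env-style dict containing only the variables that are set."""
    environ = os.environ if environ is None else environ
    # Skip unset variables so workers do not receive empty placeholders.
    env_vars = {name: environ[name] for name in names if name in environ}
    return {"env_vars": env_vars}


if __name__ == "__main__":
    # The result could then be passed along the lines of
    # ray.init(runtime_env=build_runtime_env()); that call is not made here.
    print(build_runtime_env({"HF_TOKEN": "hf_example", "PATH": "/usr/bin"}))
```

Filtering on a fixed allow-list keeps unrelated variables from the driver's environment out of the workers, and passing `environ` explicitly makes the helper easy to test without touching the real process environment.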
Stack from ghstack (oldest at bottom):
- Replace manual stateless_init_process_group + collective_rpc("update_weight")
with vLLM's native WeightTransferConfig/NCCLWeightTransferEngine API
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>