
[Performance] Add update_traj_ids flag to Collector to skip trajectory tracking #3563

Closed
vmoens wants to merge 2 commits into gh/vmoens/244/base from gh/vmoens/244/head

vmoens (Collaborator) commented Mar 23, 2026

[ghstack-poisoned]

pytorch-bot bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3563

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

⏳ No Failures, 15 Pending

As of commit 1317477 with merge base a4301ee:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


github-actions bot commented Mar 23, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 83.0290μs 81.2314μs 12.3105 KOps/s 11.5978 KOps/s $\textbf{\color{#35bf28}+6.14\%}$
test_tensor_to_bytestream_speed[torch.save] 0.1434ms 0.1419ms 7.0492 KOps/s 6.7154 KOps/s $\color{#35bf28}+4.97\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1058s 0.1057s 9.4642 Ops/s 9.3840 Ops/s $\color{#35bf28}+0.85\%$
test_tensor_to_bytestream_speed[numpy] 2.4227μs 2.4181μs 413.5438 KOps/s 403.4938 KOps/s $\color{#35bf28}+2.49\%$
test_tensor_to_bytestream_speed[safetensors] 39.6541μs 39.4588μs 25.3429 KOps/s 25.4339 KOps/s $\color{#d91a1a}-0.36\%$
test_simple 0.7819s 0.7802s 1.2817 Ops/s 1.2279 Ops/s $\color{#35bf28}+4.38\%$
test_transformed 1.3809s 1.3781s 0.7256 Ops/s 0.7112 Ops/s $\color{#35bf28}+2.02\%$
test_serial 2.2881s 2.2858s 0.4375 Ops/s 0.4368 Ops/s $\color{#35bf28}+0.16\%$
test_parallel 1.9061s 1.8068s 0.5535 Ops/s 0.5604 Ops/s $\color{#d91a1a}-1.24\%$
test_step_mdp_speed[True-True-True-True-True] 0.3438ms 42.5823μs 23.4839 KOps/s 24.1821 KOps/s $\color{#d91a1a}-2.89\%$
test_step_mdp_speed[True-True-True-True-False] 73.5010μs 22.0939μs 45.2613 KOps/s 43.6370 KOps/s $\color{#35bf28}+3.72\%$
test_step_mdp_speed[True-True-True-False-True] 54.7310μs 23.0967μs 43.2962 KOps/s 42.1645 KOps/s $\color{#35bf28}+2.68\%$
test_step_mdp_speed[True-True-True-False-False] 36.3610μs 12.6055μs 79.3308 KOps/s 77.7843 KOps/s $\color{#35bf28}+1.99\%$
test_step_mdp_speed[True-True-False-True-True] 0.1131ms 44.4021μs 22.5215 KOps/s 22.3196 KOps/s $\color{#35bf28}+0.90\%$
test_step_mdp_speed[True-True-False-True-False] 54.0810μs 25.0713μs 39.8863 KOps/s 39.2287 KOps/s $\color{#35bf28}+1.68\%$
test_step_mdp_speed[True-True-False-False-True] 64.1210μs 25.7177μs 38.8837 KOps/s 37.4802 KOps/s $\color{#35bf28}+3.74\%$
test_step_mdp_speed[True-True-False-False-False] 41.0010μs 15.3427μs 65.1775 KOps/s 63.8077 KOps/s $\color{#35bf28}+2.15\%$
test_step_mdp_speed[True-False-True-True-True] 86.4610μs 46.2701μs 21.6122 KOps/s 21.0385 KOps/s $\color{#35bf28}+2.73\%$
test_step_mdp_speed[True-False-True-True-False] 57.3800μs 27.9855μs 35.7327 KOps/s 35.4236 KOps/s $\color{#35bf28}+0.87\%$
test_step_mdp_speed[True-False-True-False-True] 97.1110μs 26.2907μs 38.0363 KOps/s 37.4895 KOps/s $\color{#35bf28}+1.46\%$
test_step_mdp_speed[True-False-True-False-False] 49.0510μs 15.2466μs 65.5884 KOps/s 63.8644 KOps/s $\color{#35bf28}+2.70\%$
test_step_mdp_speed[True-False-False-True-True] 77.1020μs 49.6718μs 20.1322 KOps/s 19.8029 KOps/s $\color{#35bf28}+1.66\%$
test_step_mdp_speed[True-False-False-True-False] 61.4410μs 30.2361μs 33.0730 KOps/s 32.5442 KOps/s $\color{#35bf28}+1.62\%$
test_step_mdp_speed[True-False-False-False-True] 65.6910μs 28.5728μs 34.9983 KOps/s 34.3716 KOps/s $\color{#35bf28}+1.82\%$
test_step_mdp_speed[True-False-False-False-False] 92.8720μs 17.6203μs 56.7526 KOps/s 55.2797 KOps/s $\color{#35bf28}+2.66\%$
test_step_mdp_speed[False-True-True-True-True] 80.1520μs 46.6493μs 21.4366 KOps/s 21.0752 KOps/s $\color{#35bf28}+1.71\%$
test_step_mdp_speed[False-True-True-True-False] 57.1610μs 27.8026μs 35.9679 KOps/s 35.5298 KOps/s $\color{#35bf28}+1.23\%$
test_step_mdp_speed[False-True-True-False-True] 2.6433ms 29.9161μs 33.4269 KOps/s 32.4381 KOps/s $\color{#35bf28}+3.05\%$
test_step_mdp_speed[False-True-True-False-False] 43.8610μs 16.8968μs 59.1827 KOps/s 58.9677 KOps/s $\color{#35bf28}+0.36\%$
test_step_mdp_speed[False-True-False-True-True] 83.1610μs 49.4276μs 20.2316 KOps/s 20.2474 KOps/s $\color{#d91a1a}-0.08\%$
test_step_mdp_speed[False-True-False-True-False] 0.1032ms 30.3077μs 32.9950 KOps/s 32.6951 KOps/s $\color{#35bf28}+0.92\%$
test_step_mdp_speed[False-True-False-False-True] 58.4110μs 31.9915μs 31.2583 KOps/s 30.3529 KOps/s $\color{#35bf28}+2.98\%$
test_step_mdp_speed[False-True-False-False-False] 50.4010μs 19.0894μs 52.3851 KOps/s 50.7514 KOps/s $\color{#35bf28}+3.22\%$
test_step_mdp_speed[False-False-True-True-True] 85.9510μs 51.9530μs 19.2482 KOps/s 19.1529 KOps/s $\color{#35bf28}+0.50\%$
test_step_mdp_speed[False-False-True-True-False] 72.9010μs 32.7015μs 30.5796 KOps/s 30.0178 KOps/s $\color{#35bf28}+1.87\%$
test_step_mdp_speed[False-False-True-False-True] 0.1032ms 31.9979μs 31.2520 KOps/s 30.4034 KOps/s $\color{#35bf28}+2.79\%$
test_step_mdp_speed[False-False-True-False-False] 40.8510μs 19.2198μs 52.0297 KOps/s 51.2979 KOps/s $\color{#35bf28}+1.43\%$
test_step_mdp_speed[False-False-False-True-True] 97.2620μs 52.8367μs 18.9262 KOps/s 18.2182 KOps/s $\color{#35bf28}+3.89\%$
test_step_mdp_speed[False-False-False-True-False] 64.4010μs 35.1402μs 28.4575 KOps/s 27.7859 KOps/s $\color{#35bf28}+2.42\%$
test_step_mdp_speed[False-False-False-False-True] 73.9410μs 33.1683μs 30.1493 KOps/s 29.0941 KOps/s $\color{#35bf28}+3.63\%$
test_step_mdp_speed[False-False-False-False-False] 47.1400μs 21.8370μs 45.7938 KOps/s 45.2733 KOps/s $\color{#35bf28}+1.15\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7221s 0.7184s 1.3921 Ops/s 1.3347 Ops/s $\color{#35bf28}+4.30\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7191s 0.6097s 1.6401 Ops/s 1.6616 Ops/s $\color{#d91a1a}-1.29\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7373s 1.6508s 0.6058 Ops/s 0.6019 Ops/s $\color{#35bf28}+0.64\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5149s 1.4261s 0.7012 Ops/s 0.6981 Ops/s $\color{#35bf28}+0.45\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9896s 1.8988s 0.5266 Ops/s 0.5242 Ops/s $\color{#35bf28}+0.47\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7665s 1.6730s 0.5977 Ops/s 0.5960 Ops/s $\color{#35bf28}+0.30\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6849s 4.5613s 0.2192 Ops/s 0.2169 Ops/s $\color{#35bf28}+1.06\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.4027s 4.3638s 0.2292 Ops/s 0.2253 Ops/s $\color{#35bf28}+1.71\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9441s 1.8546s 0.5392 Ops/s 0.5343 Ops/s $\color{#35bf28}+0.92\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7047s 1.6141s 0.6196 Ops/s 0.6315 Ops/s $\color{#d91a1a}-1.89\%$
test_values[generalized_advantage_estimate-True-True] 20.2469ms 19.5414ms 51.1733 Ops/s 50.8369 Ops/s $\color{#35bf28}+0.66\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1408s 3.7305ms 268.0587 Ops/s 265.3755 Ops/s $\color{#35bf28}+1.01\%$
test_values[td0_return_estimate-False-False] 0.1053ms 81.3694μs 12.2896 KOps/s 12.3919 KOps/s $\color{#d91a1a}-0.82\%$
test_values[td1_return_estimate-False-False] 47.4527ms 46.7408ms 21.3946 Ops/s 21.3234 Ops/s $\color{#35bf28}+0.33\%$
test_values[vec_td1_return_estimate-False-False] 1.2965ms 1.0777ms 927.9395 Ops/s 931.0332 Ops/s $\color{#d91a1a}-0.33\%$
test_values[td_lambda_return_estimate-True-False] 77.9174ms 76.5698ms 13.0600 Ops/s 13.0981 Ops/s $\color{#d91a1a}-0.29\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.2943ms 1.0680ms 936.3496 Ops/s 939.0406 Ops/s $\color{#d91a1a}-0.29\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 20.0089ms 19.7196ms 50.7110 Ops/s 51.0546 Ops/s $\color{#d91a1a}-0.67\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0267ms 0.7414ms 1.3488 KOps/s 1.3525 KOps/s $\color{#d91a1a}-0.27\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8128ms 0.6649ms 1.5039 KOps/s 1.5126 KOps/s $\color{#d91a1a}-0.58\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5373ms 1.4798ms 675.7520 Ops/s 676.2889 Ops/s $\color{#d91a1a}-0.08\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7487ms 0.6746ms 1.4824 KOps/s 1.4697 KOps/s $\color{#35bf28}+0.86\%$
test_dqn_speed[False-None] 1.6951ms 1.5722ms 636.0666 Ops/s 634.6159 Ops/s $\color{#35bf28}+0.23\%$
test_dqn_speed[False-backward] 2.8352ms 2.2333ms 447.7706 Ops/s 448.0437 Ops/s $\color{#d91a1a}-0.06\%$
test_dqn_speed[True-None] 0.6637ms 0.5981ms 1.6720 KOps/s 1.6544 KOps/s $\color{#35bf28}+1.06\%$
test_dqn_speed[True-backward] 1.3146ms 1.1818ms 846.1540 Ops/s 774.5195 Ops/s $\textbf{\color{#35bf28}+9.25\%}$
test_dqn_speed[reduce-overhead-None] 0.7179ms 0.6226ms 1.6060 KOps/s 1.5854 KOps/s $\color{#35bf28}+1.30\%$
test_ddpg_speed[False-None] 3.4947ms 3.0205ms 331.0677 Ops/s 331.5251 Ops/s $\color{#d91a1a}-0.14\%$
test_ddpg_speed[False-backward] 4.4151ms 4.2716ms 234.1034 Ops/s 227.5649 Ops/s $\color{#35bf28}+2.87\%$
test_ddpg_speed[True-None] 1.5297ms 1.3970ms 715.8387 Ops/s 718.3089 Ops/s $\color{#d91a1a}-0.34\%$
test_ddpg_speed[True-backward] 2.6411ms 2.5784ms 387.8304 Ops/s 389.0812 Ops/s $\color{#d91a1a}-0.32\%$
test_ddpg_speed[reduce-overhead-None] 1.5073ms 1.3749ms 727.3218 Ops/s 723.0816 Ops/s $\color{#35bf28}+0.59\%$
test_sac_speed[False-None] 8.8855ms 8.4654ms 118.1274 Ops/s 117.8553 Ops/s $\color{#35bf28}+0.23\%$
test_sac_speed[False-backward] 12.2717ms 11.7392ms 85.1846 Ops/s 85.2369 Ops/s $\color{#d91a1a}-0.06\%$
test_sac_speed[True-None] 2.0021ms 1.9272ms 518.8757 Ops/s 514.4243 Ops/s $\color{#35bf28}+0.87\%$
test_sac_speed[True-backward] 4.1656ms 3.7484ms 266.7788 Ops/s 265.2741 Ops/s $\color{#35bf28}+0.57\%$
test_sac_speed[reduce-overhead-None] 16.6766ms 10.1453ms 98.5675 Ops/s 98.0298 Ops/s $\color{#35bf28}+0.55\%$
test_redq_deprec_speed[False-None] 10.4420ms 9.4642ms 105.6608 Ops/s 104.1785 Ops/s $\color{#35bf28}+1.42\%$
test_redq_deprec_speed[False-backward] 13.2131ms 12.7611ms 78.3631 Ops/s 76.9946 Ops/s $\color{#35bf28}+1.78\%$
test_redq_deprec_speed[True-None] 2.7690ms 2.6781ms 373.4053 Ops/s 359.3796 Ops/s $\color{#35bf28}+3.90\%$
test_redq_deprec_speed[True-backward] 4.8607ms 4.4113ms 226.6929 Ops/s 220.4654 Ops/s $\color{#35bf28}+2.82\%$
test_redq_deprec_speed[reduce-overhead-None] 14.7284ms 9.7340ms 102.7331 Ops/s 102.7425 Ops/s $-0.01\%$
test_td3_speed[False-None] 8.4092ms 8.3267ms 120.0951 Ops/s 119.9183 Ops/s $\color{#35bf28}+0.15\%$
test_td3_speed[False-backward] 11.4788ms 10.9471ms 91.3481 Ops/s 91.3558 Ops/s $-0.01\%$
test_td3_speed[True-None] 1.7228ms 1.6951ms 589.9475 Ops/s 579.2491 Ops/s $\color{#35bf28}+1.85\%$
test_td3_speed[True-backward] 3.3103ms 3.2561ms 307.1183 Ops/s 305.5890 Ops/s $\color{#35bf28}+0.50\%$
test_td3_speed[reduce-overhead-None] 50.4983ms 25.7654ms 38.8117 Ops/s 38.7629 Ops/s $\color{#35bf28}+0.13\%$
test_cql_speed[False-None] 18.1152ms 17.5346ms 57.0301 Ops/s 55.9000 Ops/s $\color{#35bf28}+2.02\%$
test_cql_speed[False-backward] 23.2672ms 22.7459ms 43.9640 Ops/s 42.7662 Ops/s $\color{#35bf28}+2.80\%$
test_cql_speed[True-None] 3.5259ms 3.4337ms 291.2338 Ops/s 289.6147 Ops/s $\color{#35bf28}+0.56\%$
test_cql_speed[True-backward] 5.9625ms 5.6188ms 177.9734 Ops/s 176.5604 Ops/s $\color{#35bf28}+0.80\%$
test_cql_speed[reduce-overhead-None] 19.1264ms 12.1443ms 82.3429 Ops/s 82.4652 Ops/s $\color{#d91a1a}-0.15\%$
test_a2c_speed[False-None] 3.4777ms 3.3068ms 302.4061 Ops/s 299.5884 Ops/s $\color{#35bf28}+0.94\%$
test_a2c_speed[False-backward] 6.8605ms 6.2587ms 159.7786 Ops/s 154.5978 Ops/s $\color{#35bf28}+3.35\%$
test_a2c_speed[True-None] 1.7077ms 1.4800ms 675.6558 Ops/s 687.6258 Ops/s $\color{#d91a1a}-1.74\%$
test_a2c_speed[True-backward] 3.2411ms 3.1595ms 316.5019 Ops/s 312.6175 Ops/s $\color{#35bf28}+1.24\%$
test_a2c_speed[reduce-overhead-None] 1.2028ms 1.0994ms 909.5778 Ops/s 883.7705 Ops/s $\color{#35bf28}+2.92\%$
test_ppo_speed[False-None] 4.0631ms 3.9368ms 254.0130 Ops/s 240.4754 Ops/s $\textbf{\color{#35bf28}+5.63\%}$
test_ppo_speed[False-backward] 7.5523ms 7.0819ms 141.2055 Ops/s 138.4099 Ops/s $\color{#35bf28}+2.02\%$
test_ppo_speed[True-None] 1.6868ms 1.5946ms 627.1124 Ops/s 615.8518 Ops/s $\color{#35bf28}+1.83\%$
test_ppo_speed[True-backward] 3.7781ms 3.2885ms 304.0927 Ops/s 297.5677 Ops/s $\color{#35bf28}+2.19\%$
test_ppo_speed[reduce-overhead-None] 1.2768ms 1.1617ms 860.8251 Ops/s 830.0833 Ops/s $\color{#35bf28}+3.70\%$
test_reinforce_speed[False-None] 2.5072ms 2.3855ms 419.1989 Ops/s 418.0768 Ops/s $\color{#35bf28}+0.27\%$
test_reinforce_speed[False-backward] 3.9094ms 3.4253ms 291.9422 Ops/s 281.5841 Ops/s $\color{#35bf28}+3.68\%$
test_reinforce_speed[True-None] 1.5163ms 1.4500ms 689.6478 Ops/s 686.6078 Ops/s $\color{#35bf28}+0.44\%$
test_reinforce_speed[True-backward] 3.2191ms 3.0936ms 323.2476 Ops/s 297.7764 Ops/s $\textbf{\color{#35bf28}+8.55\%}$
test_reinforce_speed[reduce-overhead-None] 0.6980s 10.4456ms 95.7341 Ops/s 111.5300 Ops/s $\textbf{\color{#d91a1a}-14.16\%}$
test_iql_speed[False-None] 9.8847ms 9.6564ms 103.5578 Ops/s 102.0789 Ops/s $\color{#35bf28}+1.45\%$
test_iql_speed[False-backward] 13.8696ms 13.4105ms 74.5684 Ops/s 72.4922 Ops/s $\color{#35bf28}+2.86\%$
test_iql_speed[True-None] 2.4942ms 2.3080ms 433.2817 Ops/s 426.4689 Ops/s $\color{#35bf28}+1.60\%$
test_iql_speed[True-backward] 5.0513ms 4.8651ms 205.5443 Ops/s 196.2709 Ops/s $\color{#35bf28}+4.72\%$
test_iql_speed[reduce-overhead-None] 17.1406ms 10.1665ms 98.3620 Ops/s 98.7406 Ops/s $\color{#d91a1a}-0.38\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.6066ms 6.1213ms 163.3637 Ops/s 164.2252 Ops/s $\color{#d91a1a}-0.52\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8568ms 0.3721ms 2.6873 KOps/s 2.8378 KOps/s $\textbf{\color{#d91a1a}-5.30\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6076ms 0.3575ms 2.7975 KOps/s 2.5207 KOps/s $\textbf{\color{#35bf28}+10.98\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.2114ms 5.9129ms 169.1213 Ops/s 168.5950 Ops/s $\color{#35bf28}+0.31\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.2227ms 0.3675ms 2.7213 KOps/s 2.5220 KOps/s $\textbf{\color{#35bf28}+7.90\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6540ms 0.3542ms 2.8229 KOps/s 2.7785 KOps/s $\color{#35bf28}+1.60\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7387ms 1.4129ms 707.7842 Ops/s 675.2286 Ops/s $\color{#35bf28}+4.82\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5100ms 1.3042ms 766.7606 Ops/s 727.8068 Ops/s $\textbf{\color{#35bf28}+5.35\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 10.5466ms 6.2059ms 161.1372 Ops/s 163.5691 Ops/s $\color{#d91a1a}-1.49\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0857ms 0.5369ms 1.8624 KOps/s 1.9288 KOps/s $\color{#d91a1a}-3.44\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7559ms 0.5081ms 1.9681 KOps/s 1.9448 KOps/s $\color{#35bf28}+1.20\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0139ms 5.8830ms 169.9819 Ops/s 167.6349 Ops/s $\color{#35bf28}+1.40\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9950ms 0.3965ms 2.5219 KOps/s 2.8552 KOps/s $\textbf{\color{#d91a1a}-11.67\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5130ms 0.2846ms 3.5139 KOps/s 3.1801 KOps/s $\textbf{\color{#35bf28}+10.50\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1907ms 5.8909ms 169.7521 Ops/s 168.7884 Ops/s $\color{#35bf28}+0.57\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.7782ms 0.3696ms 2.7058 KOps/s 2.6734 KOps/s $\color{#35bf28}+1.21\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6316ms 0.3610ms 2.7699 KOps/s 3.0633 KOps/s $\textbf{\color{#d91a1a}-9.58\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.4973ms 5.9877ms 167.0100 Ops/s 165.2552 Ops/s $\color{#35bf28}+1.06\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.3426ms 0.5298ms 1.8877 KOps/s 2.0956 KOps/s $\textbf{\color{#d91a1a}-9.92\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9777s 1.8502ms 540.4864 Ops/s 1.9052 KOps/s $\textbf{\color{#d91a1a}-71.63\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.7644ms 5.1523ms 194.0875 Ops/s 33.9872 Ops/s $\textbf{\color{#35bf28}+471.06\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 9.8799ms 2.0090ms 497.7604 Ops/s 529.0670 Ops/s $\textbf{\color{#d91a1a}-5.92\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 3.4871ms 1.0612ms 942.3514 Ops/s 988.8198 Ops/s $\color{#d91a1a}-4.70\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.8739ms 5.1278ms 195.0147 Ops/s 191.3074 Ops/s $\color{#35bf28}+1.94\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.9273ms 1.8939ms 528.0201 Ops/s 482.1805 Ops/s $\textbf{\color{#35bf28}+9.51\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.2206ms 1.0024ms 997.6408 Ops/s 1.0274 KOps/s $\color{#d91a1a}-2.89\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.6983s 19.2586ms 51.9247 Ops/s 186.0279 Ops/s $\textbf{\color{#d91a1a}-72.09\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 3.6263ms 1.9704ms 507.5041 Ops/s 500.8341 Ops/s $\color{#35bf28}+1.33\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.9665ms 1.2064ms 828.9389 Ops/s 823.1491 Ops/s $\color{#35bf28}+0.70\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 42.8325ms 39.4050ms 25.3775 Ops/s 25.0772 Ops/s $\color{#35bf28}+1.20\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.1323ms 18.3299ms 54.5556 Ops/s 53.0127 Ops/s $\color{#35bf28}+2.91\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 44.0379ms 40.6556ms 24.5969 Ops/s 24.0447 Ops/s $\color{#35bf28}+2.30\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.4214ms 18.6748ms 53.5480 Ops/s 52.0512 Ops/s $\color{#35bf28}+2.88\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 43.7140ms 42.1237ms 23.7396 Ops/s 23.1231 Ops/s $\color{#35bf28}+2.67\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.0797ms 20.0263ms 49.9343 Ops/s 48.8208 Ops/s $\color{#35bf28}+2.28\%$
test_storage_write_lazystack[50-img_shape0-small] 0.9313ms 0.2287ms 4.3732 KOps/s 4.4014 KOps/s $\color{#d91a1a}-0.64\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.8273ms 1.5257ms 655.4552 Ops/s 653.7575 Ops/s $\color{#35bf28}+0.26\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7303ms 2.4593ms 406.6274 Ops/s 401.5854 Ops/s $\color{#35bf28}+1.26\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.5537ms 3.1507ms 317.3883 Ops/s 308.3465 Ops/s $\color{#35bf28}+2.93\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2290ms 0.1622ms 6.1642 KOps/s 6.0195 KOps/s $\color{#35bf28}+2.40\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3828ms 0.2428ms 4.1192 KOps/s 3.9440 KOps/s $\color{#35bf28}+4.44\%$
test_storage_write_contiguous[100-img_shape2-large_img] 2.3817ms 1.9718ms 507.1400 Ops/s 500.4392 Ops/s $\color{#35bf28}+1.34\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.6929ms 1.4573ms 686.2070 Ops/s 719.5970 Ops/s $\color{#d91a1a}-4.64\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2993ms 1.1688ms 855.5600 Ops/s 852.3212 Ops/s $\color{#35bf28}+0.38\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.9184ms 3.7367ms 267.6145 Ops/s 264.0894 Ops/s $\color{#35bf28}+1.33\%$
test_collector_stack_then_write[100-img_shape2-large_img] 6.4899ms 6.1110ms 163.6389 Ops/s 158.8952 Ops/s $\color{#35bf28}+2.99\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 8.1441ms 7.7712ms 128.6800 Ops/s 130.0831 Ops/s $\color{#d91a1a}-1.08\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4515ms 0.2801ms 3.5699 KOps/s 3.4663 KOps/s $\color{#35bf28}+2.99\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.9442ms 1.6540ms 604.5969 Ops/s 591.7212 Ops/s $\color{#35bf28}+2.18\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.8786ms 2.5917ms 385.8398 Ops/s 381.0367 Ops/s $\color{#35bf28}+1.26\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.7916ms 3.3557ms 298.0035 Ops/s 295.3272 Ops/s $\color{#35bf28}+0.91\%$
test_collector_without_rb[100-img_shape0-atari] 34.4076ms 33.2130ms 30.1087 Ops/s 29.0621 Ops/s $\color{#35bf28}+3.60\%$
test_collector_without_rb[200-img_shape1-large_batch] 67.3155ms 65.7924ms 15.1993 Ops/s 15.0007 Ops/s $\color{#35bf28}+1.32\%$
test_collector_with_rb[100-img_shape0-atari] 38.8916ms 38.0746ms 26.2642 Ops/s 25.9841 Ops/s $\color{#35bf28}+1.08\%$
test_collector_with_rb[200-img_shape1-large_batch] 76.8340ms 74.9573ms 13.3409 Ops/s 13.2950 Ops/s $\color{#35bf28}+0.35\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 57.6379ms 55.1376ms 18.1365 Ops/s 17.8819 Ops/s $\color{#35bf28}+1.42\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1131s 0.1117s 8.9542 Ops/s 9.0783 Ops/s $\color{#d91a1a}-1.37\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 59.0394ms 56.9550ms 17.5577 Ops/s 17.5085 Ops/s $\color{#35bf28}+0.28\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1175s 0.1137s 8.7954 Ops/s 8.7435 Ops/s $\color{#35bf28}+0.59\%$
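
For anyone cross-checking the table above: the `Ops` and `Change` columns appear to follow directly from the other columns. Throughput looks like the reciprocal of `Mean`, and `Change` looks like the relative difference against `Ops on Repo HEAD`. The sketch below reproduces a few rows under that reading. The function names are made up for illustration, and the 5% cutoff for counting a row as improved or worsened is an assumption inferred from which entries are bolded, not something taken from the benchmark workflow itself.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption, not a documented value."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# test_tensor_to_bytestream_speed[pickle]: Mean 81.2314 us, 12.3105 vs 11.5978 KOps/s
pickle_ops = ops_per_second(81.2314e-6)
pickle_change = percent_change(12.3105e3, 11.5978e3)

# test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]
rb_iterate_change = percent_change(540.4864, 1.9052e3)

# test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000]
sampler6_change = percent_change(707.7842, 675.2286)

for name, change in [
    ("pickle", pickle_change),
    ("rb_iterate", rb_iterate_change),
    ("sampler6", sampler6_change),
]:
    print(f"{name}: {change:+.2f}% -> {classify(change)}")
```

Because the table prints rounded values, recomputed percentages can differ from the reported ones in the last digit. Large swings such as the replay-buffer rows are more likely to reflect run-to-run noise on the benchmark machine than an effect of this change, so they are worth re-running before drawing conclusions.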


github-actions bot commented Mar 23, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}21$. Worsened: $\large\color{#d91a1a}13$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 85.2427μs 82.2085μs 12.1642 KOps/s 12.3717 KOps/s $\color{#d91a1a}-1.68\%$
test_tensor_to_bytestream_speed[torch.save] 0.1429ms 0.1416ms 7.0604 KOps/s 7.0273 KOps/s $\color{#35bf28}+0.47\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1053s 0.1051s 9.5186 Ops/s 9.3298 Ops/s $\color{#35bf28}+2.02\%$
test_tensor_to_bytestream_speed[numpy] 2.4162μs 2.4087μs 415.1595 KOps/s 412.2816 KOps/s $\color{#35bf28}+0.70\%$
test_tensor_to_bytestream_speed[safetensors] 37.9376μs 36.7936μs 27.1786 KOps/s 26.4256 KOps/s $\color{#35bf28}+2.85\%$
test_simple 0.6657s 0.5668s 1.7642 Ops/s 1.7526 Ops/s $\color{#35bf28}+0.67\%$
test_transformed 1.0852s 1.0842s 0.9223 Ops/s 0.8945 Ops/s $\color{#35bf28}+3.12\%$
test_serial 1.6842s 1.6792s 0.5955 Ops/s 0.5789 Ops/s $\color{#35bf28}+2.87\%$
test_parallel 1.0092s 1.0075s 0.9926 Ops/s 0.9591 Ops/s $\color{#35bf28}+3.49\%$
test_step_mdp_speed[True-True-True-True-True] 0.1709ms 41.4003μs 24.1544 KOps/s 23.8524 KOps/s $\color{#35bf28}+1.27\%$
test_step_mdp_speed[True-True-True-True-False] 55.2410μs 23.5403μs 42.4804 KOps/s 43.6812 KOps/s $\color{#d91a1a}-2.75\%$
test_step_mdp_speed[True-True-True-False-True] 83.2220μs 24.4415μs 40.9139 KOps/s 42.9749 KOps/s $\color{#d91a1a}-4.80\%$
test_step_mdp_speed[True-True-True-False-False] 36.8000μs 13.3273μs 75.0341 KOps/s 79.1956 KOps/s $\textbf{\color{#d91a1a}-5.25\%}$
test_step_mdp_speed[True-True-False-True-True] 94.5020μs 45.3616μs 22.0451 KOps/s 22.7872 KOps/s $\color{#d91a1a}-3.26\%$
test_step_mdp_speed[True-True-False-True-False] 63.7210μs 26.4670μs 37.7830 KOps/s 39.9020 KOps/s $\textbf{\color{#d91a1a}-5.31\%}$
test_step_mdp_speed[True-True-False-False-True] 52.7210μs 27.0419μs 36.9796 KOps/s 38.1777 KOps/s $\color{#d91a1a}-3.14\%$
test_step_mdp_speed[True-True-False-False-False] 73.1510μs 15.8701μs 63.0114 KOps/s 66.0996 KOps/s $\color{#d91a1a}-4.67\%$
test_step_mdp_speed[True-False-True-True-True] 80.3120μs 48.2317μs 20.7333 KOps/s 20.9444 KOps/s $\color{#d91a1a}-1.01\%$
test_step_mdp_speed[True-False-True-True-False] 0.2167ms 28.5767μs 34.9936 KOps/s 36.1579 KOps/s $\color{#d91a1a}-3.22\%$
test_step_mdp_speed[True-False-True-False-True] 55.7010μs 26.7965μs 37.3183 KOps/s 38.4686 KOps/s $\color{#d91a1a}-2.99\%$
test_step_mdp_speed[True-False-True-False-False] 46.3410μs 15.6938μs 63.7195 KOps/s 66.3695 KOps/s $\color{#d91a1a}-3.99\%$
test_step_mdp_speed[True-False-False-True-True] 81.9110μs 49.5685μs 20.1741 KOps/s 20.5350 KOps/s $\color{#d91a1a}-1.76\%$
test_step_mdp_speed[True-False-False-True-False] 64.3210μs 31.0265μs 32.2306 KOps/s 32.9359 KOps/s $\color{#d91a1a}-2.14\%$
test_step_mdp_speed[True-False-False-False-True] 66.8610μs 28.5733μs 34.9977 KOps/s 35.3000 KOps/s $\color{#d91a1a}-0.86\%$
test_step_mdp_speed[True-False-False-False-False] 49.5110μs 17.9923μs 55.5794 KOps/s 56.7258 KOps/s $\color{#d91a1a}-2.02\%$
test_step_mdp_speed[False-True-True-True-True] 0.1285ms 46.6326μs 21.4442 KOps/s 21.4837 KOps/s $\color{#d91a1a}-0.18\%$
test_step_mdp_speed[False-True-True-True-False] 62.6110μs 28.1975μs 35.4641 KOps/s 35.8723 KOps/s $\color{#d91a1a}-1.14\%$
test_step_mdp_speed[False-True-True-False-True] 2.3285ms 31.2932μs 31.9558 KOps/s 32.8414 KOps/s $\color{#d91a1a}-2.70\%$
test_step_mdp_speed[False-True-True-False-False] 49.8010μs 17.2772μs 57.8799 KOps/s 57.5136 KOps/s $\color{#35bf28}+0.64\%$
test_step_mdp_speed[False-True-False-True-True] 93.4220μs 49.9537μs 20.0185 KOps/s 20.5104 KOps/s $\color{#d91a1a}-2.40\%$
test_step_mdp_speed[False-True-False-True-False] 91.1210μs 30.1960μs 33.1169 KOps/s 32.7182 KOps/s $\color{#35bf28}+1.22\%$
test_step_mdp_speed[False-True-False-False-True] 57.9410μs 32.4354μs 30.8305 KOps/s 31.2066 KOps/s $\color{#d91a1a}-1.21\%$
test_step_mdp_speed[False-True-False-False-False] 60.2410μs 19.7317μs 50.6800 KOps/s 52.4578 KOps/s $\color{#d91a1a}-3.39\%$
test_step_mdp_speed[False-False-True-True-True] 78.4310μs 52.5376μs 19.0340 KOps/s 19.6056 KOps/s $\color{#d91a1a}-2.92\%$
test_step_mdp_speed[False-False-True-True-False] 61.5310μs 33.3729μs 29.9644 KOps/s 30.4096 KOps/s $\color{#d91a1a}-1.46\%$
test_step_mdp_speed[False-False-True-False-True] 61.8610μs 32.7918μs 30.4954 KOps/s 30.9553 KOps/s $\color{#d91a1a}-1.49\%$
test_step_mdp_speed[False-False-True-False-False] 40.0300μs 19.7452μs 50.6453 KOps/s 52.3333 KOps/s $\color{#d91a1a}-3.23\%$
test_step_mdp_speed[False-False-False-True-True] 85.2120μs 54.0159μs 18.5131 KOps/s 18.7966 KOps/s $\color{#d91a1a}-1.51\%$
test_step_mdp_speed[False-False-False-True-False] 66.0910μs 35.5891μs 28.0985 KOps/s 28.3872 KOps/s $\color{#d91a1a}-1.02\%$
test_step_mdp_speed[False-False-False-False-True] 67.7420μs 34.5106μs 28.9766 KOps/s 29.2015 KOps/s $\color{#d91a1a}-0.77\%$
test_step_mdp_speed[False-False-False-False-False] 54.7110μs 22.1599μs 45.1266 KOps/s 46.2958 KOps/s $\color{#d91a1a}-2.53\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7346s 0.7265s 1.3764 Ops/s 1.3430 Ops/s $\color{#35bf28}+2.49\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7133s 0.6140s 1.6288 Ops/s 1.6408 Ops/s $\color{#d91a1a}-0.73\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7316s 1.6526s 0.6051 Ops/s 0.6068 Ops/s $\color{#d91a1a}-0.28\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5210s 1.4342s 0.6973 Ops/s 0.7026 Ops/s $\color{#d91a1a}-0.77\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9854s 1.9030s 0.5255 Ops/s 0.5287 Ops/s $\color{#d91a1a}-0.60\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7680s 1.6873s 0.5926 Ops/s 0.6001 Ops/s $\color{#d91a1a}-1.24\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7640s 4.5913s 0.2178 Ops/s 0.2169 Ops/s $\color{#35bf28}+0.42\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5375s 4.3960s 0.2275 Ops/s 0.2249 Ops/s $\color{#35bf28}+1.17\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9979s 1.8789s 0.5322 Ops/s 0.5382 Ops/s $\color{#d91a1a}-1.12\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7872s 1.6592s 0.6027 Ops/s 0.6403 Ops/s $\textbf{\color{#d91a1a}-5.88\%}$
test_values[generalized_advantage_estimate-True-True] 10.2935ms 9.9975ms 100.0248 Ops/s 97.6622 Ops/s $\color{#35bf28}+2.42\%$
test_values[vec_generalized_advantage_estimate-True-True] 19.6729ms 17.2984ms 57.8088 Ops/s 62.2288 Ops/s $\textbf{\color{#d91a1a}-7.10\%}$
test_values[td0_return_estimate-False-False] 0.2160ms 0.1294ms 7.7263 KOps/s 8.1490 KOps/s $\textbf{\color{#d91a1a}-5.19\%}$
test_values[td1_return_estimate-False-False] 27.9527ms 27.2723ms 36.6673 Ops/s 36.3376 Ops/s $\color{#35bf28}+0.91\%$
test_values[vec_td1_return_estimate-False-False] 18.5339ms 17.4415ms 57.3344 Ops/s 58.9770 Ops/s $\color{#d91a1a}-2.79\%$
test_values[td_lambda_return_estimate-True-False] 41.4308ms 40.3665ms 24.7730 Ops/s 24.5364 Ops/s $\color{#35bf28}+0.96\%$
test_values[vec_td_lambda_return_estimate-True-False] 20.2126ms 17.6183ms 56.7593 Ops/s 65.4259 Ops/s $\textbf{\color{#d91a1a}-13.25\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.8916ms 8.7750ms 113.9605 Ops/s 113.5130 Ops/s $\color{#35bf28}+0.39\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.9311ms 1.5017ms 665.9136 Ops/s 654.7649 Ops/s $\color{#35bf28}+1.70\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8232ms 0.4136ms 2.4178 KOps/s 2.4040 KOps/s $\color{#35bf28}+0.58\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.0663ms 34.7136ms 28.8071 Ops/s 32.9965 Ops/s $\textbf{\color{#d91a1a}-12.70\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.8593ms 1.7028ms 587.2639 Ops/s 584.1203 Ops/s $\color{#35bf28}+0.54\%$
test_dqn_speed[False-None] 1.5104ms 1.4097ms 709.3493 Ops/s 706.6977 Ops/s $\color{#35bf28}+0.38\%$
test_dqn_speed[False-backward] 1.9917ms 1.9108ms 523.3281 Ops/s 514.1142 Ops/s $\color{#35bf28}+1.79\%$
test_dqn_speed[True-None] 0.9874ms 0.5653ms 1.7689 KOps/s 1.6997 KOps/s $\color{#35bf28}+4.08\%$
test_dqn_speed[True-backward] 1.0680ms 1.0239ms 976.6221 Ops/s 819.0531 Ops/s $\textbf{\color{#35bf28}+19.24\%}$
test_dqn_speed[reduce-overhead-None] 0.8353ms 0.5474ms 1.8267 KOps/s 1.7909 KOps/s $\color{#35bf28}+2.00\%$
test_ddpg_speed[False-None] 3.2952ms 2.8656ms 348.9644 Ops/s 355.6646 Ops/s $\color{#d91a1a}-1.88\%$
test_ddpg_speed[False-backward] 4.2339ms 4.0920ms 244.3772 Ops/s 247.6651 Ops/s $\color{#d91a1a}-1.33\%$
test_ddpg_speed[True-None] 1.8913ms 1.4664ms 681.9284 Ops/s 676.4293 Ops/s $\color{#35bf28}+0.81\%$
test_ddpg_speed[True-backward] 2.8415ms 2.4418ms 409.5377 Ops/s 407.7543 Ops/s $\color{#35bf28}+0.44\%$
test_ddpg_speed[reduce-overhead-None] 1.8173ms 1.4385ms 695.1601 Ops/s 578.4878 Ops/s $\textbf{\color{#35bf28}+20.17\%}$
test_sac_speed[False-None] 8.8642ms 8.1323ms 122.9666 Ops/s 105.3439 Ops/s $\textbf{\color{#35bf28}+16.73\%}$
test_sac_speed[False-backward] 11.9880ms 11.3402ms 88.1817 Ops/s 74.9235 Ops/s $\textbf{\color{#35bf28}+17.70\%}$
test_sac_speed[True-None] 2.6021ms 2.1814ms 458.4223 Ops/s 464.7406 Ops/s $\color{#d91a1a}-1.36\%$
test_sac_speed[True-backward] 4.1781ms 4.0508ms 246.8629 Ops/s 242.6627 Ops/s $\color{#35bf28}+1.73\%$
test_sac_speed[reduce-overhead-None] 2.4931ms 2.1656ms 461.7670 Ops/s 464.4234 Ops/s $\color{#d91a1a}-0.57\%$
test_redq_speed[False-None] 14.9759ms 10.6694ms 93.7261 Ops/s 94.0199 Ops/s $\color{#d91a1a}-0.31\%$
test_redq_speed[False-backward] 18.2679ms 17.2716ms 57.8985 Ops/s 54.6714 Ops/s $\textbf{\color{#35bf28}+5.90\%}$
test_redq_speed[True-None] 5.1206ms 4.5852ms 218.0932 Ops/s 212.0006 Ops/s $\color{#35bf28}+2.87\%$
test_redq_speed[reduce-overhead-None] 4.8038ms 4.5157ms 221.4487 Ops/s 221.7712 Ops/s $\color{#d91a1a}-0.15\%$
test_redq_deprec_speed[False-None] 12.2636ms 11.4563ms 87.2882 Ops/s 89.0475 Ops/s $\color{#d91a1a}-1.98\%$
test_redq_deprec_speed[False-backward] 16.6222ms 16.2814ms 61.4199 Ops/s 62.1992 Ops/s $\color{#d91a1a}-1.25\%$
test_redq_deprec_speed[True-None] 4.0211ms 3.7923ms 263.6912 Ops/s 275.5012 Ops/s $\color{#d91a1a}-4.29\%$
test_redq_deprec_speed[True-backward] 8.4627ms 7.9583ms 125.6547 Ops/s 135.8687 Ops/s $\textbf{\color{#d91a1a}-7.52\%}$
test_redq_deprec_speed[reduce-overhead-None] 4.1351ms 3.7113ms 269.4447 Ops/s 272.4325 Ops/s $\color{#d91a1a}-1.10\%$
test_td3_speed[False-None] 8.3879ms 8.1602ms 122.5466 Ops/s 121.2926 Ops/s $\color{#35bf28}+1.03\%$
test_td3_speed[False-backward] 11.3446ms 11.0557ms 90.4508 Ops/s 89.9384 Ops/s $\color{#35bf28}+0.57\%$
test_td3_speed[True-None] 2.2487ms 1.8191ms 549.7164 Ops/s 544.5668 Ops/s $\color{#35bf28}+0.95\%$
test_td3_speed[True-backward] 3.8061ms 3.5658ms 280.4424 Ops/s 281.5862 Ops/s $\color{#d91a1a}-0.41\%$
test_td3_speed[reduce-overhead-None] 1.8165ms 1.7821ms 561.1280 Ops/s 555.9850 Ops/s $\color{#35bf28}+0.93\%$
test_cql_speed[False-None] 28.8279ms 26.1393ms 38.2566 Ops/s 37.8644 Ops/s $\color{#35bf28}+1.04\%$
test_cql_speed[False-backward] 38.2573ms 35.3720ms 28.2710 Ops/s 27.7279 Ops/s $\color{#35bf28}+1.96\%$
test_cql_speed[True-None] 13.0451ms 12.4462ms 80.3461 Ops/s 79.7080 Ops/s $\color{#35bf28}+0.80\%$
test_cql_speed[True-backward] 18.1023ms 17.6141ms 56.7727 Ops/s 55.7814 Ops/s $\color{#35bf28}+1.78\%$
test_cql_speed[reduce-overhead-None] 12.7697ms 12.4430ms 80.3663 Ops/s 78.8713 Ops/s $\color{#35bf28}+1.90\%$
test_a2c_speed[False-None] 5.9292ms 5.5383ms 180.5615 Ops/s 188.8888 Ops/s $\color{#d91a1a}-4.41\%$
test_a2c_speed[False-backward] 12.6801ms 12.0096ms 83.2668 Ops/s 85.3793 Ops/s $\color{#d91a1a}-2.47\%$
test_a2c_speed[True-None] 4.2958ms 3.8810ms 257.6623 Ops/s 259.5773 Ops/s $\color{#d91a1a}-0.74\%$
test_a2c_speed[True-backward] 9.5643ms 8.7882ms 113.7895 Ops/s 112.4139 Ops/s $\color{#35bf28}+1.22\%$
test_a2c_speed[reduce-overhead-None] 4.3273ms 3.8691ms 258.4613 Ops/s 260.3302 Ops/s $\color{#d91a1a}-0.72\%$
test_ppo_speed[False-None] 6.4431ms 6.0009ms 166.6428 Ops/s 169.6622 Ops/s $\color{#d91a1a}-1.78\%$
test_ppo_speed[False-backward] 13.1880ms 12.6291ms 79.1820 Ops/s 80.4857 Ops/s $\color{#d91a1a}-1.62\%$
test_ppo_speed[True-None] 4.2894ms 3.8975ms 256.5755 Ops/s 263.6251 Ops/s $\color{#d91a1a}-2.67\%$
test_ppo_speed[True-backward] 9.2051ms 8.7928ms 113.7297 Ops/s 111.5086 Ops/s $\color{#35bf28}+1.99\%$
test_ppo_speed[reduce-overhead-None] 4.7848ms 3.8730ms 258.1970 Ops/s 268.6508 Ops/s $\color{#d91a1a}-3.89\%$
test_reinforce_speed[False-None] 5.0003ms 4.6886ms 213.2852 Ops/s 222.2066 Ops/s $\color{#d91a1a}-4.01\%$
test_reinforce_speed[False-backward] 7.8567ms 7.5792ms 131.9408 Ops/s 136.7678 Ops/s $\color{#d91a1a}-3.53\%$
test_reinforce_speed[True-None] 3.5563ms 3.1161ms 320.9120 Ops/s 324.4917 Ops/s $\color{#d91a1a}-1.10\%$
test_reinforce_speed[True-backward] 8.2405ms 7.9926ms 125.1161 Ops/s 125.7046 Ops/s $\color{#d91a1a}-0.47\%$
test_reinforce_speed[reduce-overhead-None] 3.2407ms 3.0497ms 327.9063 Ops/s 329.7634 Ops/s $\color{#d91a1a}-0.56\%$
test_iql_speed[False-None] 21.5341ms 20.2857ms 49.2958 Ops/s 49.9667 Ops/s $\color{#d91a1a}-1.34\%$
test_iql_speed[False-backward] 31.6584ms 30.5171ms 32.7685 Ops/s 32.9786 Ops/s $\color{#d91a1a}-0.64\%$
test_iql_speed[True-None] 8.8099ms 8.5605ms 116.8155 Ops/s 113.3488 Ops/s $\color{#35bf28}+3.06\%$
test_iql_speed[True-backward] 16.9182ms 16.5495ms 60.4250 Ops/s 57.8368 Ops/s $\color{#35bf28}+4.47\%$
test_iql_speed[reduce-overhead-None] 9.3907ms 8.6001ms 116.2780 Ops/s 115.7162 Ops/s $\color{#35bf28}+0.49\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.3156ms 6.0744ms 164.6245 Ops/s 162.2264 Ops/s $\color{#35bf28}+1.48\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.0521ms 0.3443ms 2.9047 KOps/s 3.1177 KOps/s $\textbf{\color{#d91a1a}-6.83\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5604ms 0.3155ms 3.1692 KOps/s 3.2456 KOps/s $\color{#d91a1a}-2.35\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1418ms 5.8507ms 170.9202 Ops/s 169.7920 Ops/s $\color{#35bf28}+0.66\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.1356ms 0.2851ms 3.5074 KOps/s 2.7379 KOps/s $\textbf{\color{#35bf28}+28.11\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5356ms 0.2668ms 3.7485 KOps/s 2.8647 KOps/s $\textbf{\color{#35bf28}+30.85\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6513ms 1.2891ms 775.7408 Ops/s 673.2192 Ops/s $\textbf{\color{#35bf28}+15.23\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5038ms 1.1903ms 840.1264 Ops/s 703.5594 Ops/s $\textbf{\color{#35bf28}+19.41\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 10.0372ms 6.0949ms 164.0724 Ops/s 165.8086 Ops/s $\color{#d91a1a}-1.05\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.7838ms 0.4590ms 2.1785 KOps/s 1.9263 KOps/s $\textbf{\color{#35bf28}+13.09\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7649ms 0.4460ms 2.2424 KOps/s 2.0100 KOps/s $\textbf{\color{#35bf28}+11.56\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9140ms 5.7837ms 172.8993 Ops/s 170.9531 Ops/s $\color{#35bf28}+1.14\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.0256ms 0.2894ms 3.4553 KOps/s 2.7254 KOps/s $\textbf{\color{#35bf28}+26.78\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4035ms 0.2737ms 3.6533 KOps/s 2.8500 KOps/s $\textbf{\color{#35bf28}+28.19\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9881ms 5.7487ms 173.9534 Ops/s 170.8725 Ops/s $\color{#35bf28}+1.80\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.0894ms 0.2870ms 3.4841 KOps/s 2.8926 KOps/s $\textbf{\color{#35bf28}+20.45\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4527ms 0.2665ms 3.7521 KOps/s 3.5690 KOps/s $\textbf{\color{#35bf28}+5.13\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0846ms 5.9517ms 168.0191 Ops/s 167.1833 Ops/s $\color{#35bf28}+0.50\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.9438ms 0.4345ms 2.3014 KOps/s 2.0952 KOps/s $\textbf{\color{#35bf28}+9.84\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8738ms 0.4151ms 2.4092 KOps/s 2.3707 KOps/s $\color{#35bf28}+1.62\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.8382ms 5.3928ms 185.4329 Ops/s 48.6898 Ops/s $\textbf{\color{#35bf28}+280.85\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 8.9965ms 2.0870ms 479.1554 Ops/s 499.1841 Ops/s $\color{#d91a1a}-4.01\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 3.0226ms 0.9533ms 1.0490 KOps/s 810.7649 Ops/s $\textbf{\color{#35bf28}+29.38\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.6357s 18.1103ms 55.2172 Ops/s 197.9660 Ops/s $\textbf{\color{#d91a1a}-72.11\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.8714ms 1.7849ms 560.2621 Ops/s 493.5039 Ops/s $\textbf{\color{#35bf28}+13.53\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.0963ms 0.9052ms 1.1047 KOps/s 878.8885 Ops/s $\textbf{\color{#35bf28}+25.69\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.5871ms 5.6796ms 176.0672 Ops/s 186.8348 Ops/s $\textbf{\color{#d91a1a}-5.76\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 12.3977ms 2.1105ms 473.8295 Ops/s 520.3139 Ops/s $\textbf{\color{#d91a1a}-8.93\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.5645ms 1.0942ms 913.9060 Ops/s 946.6142 Ops/s $\color{#d91a1a}-3.46\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 43.6048ms 39.1375ms 25.5509 Ops/s 25.1072 Ops/s $\color{#35bf28}+1.77\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.0952ms 18.1868ms 54.9849 Ops/s 54.3933 Ops/s $\color{#35bf28}+1.09\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 46.7962ms 40.5418ms 24.6659 Ops/s 24.2877 Ops/s $\color{#35bf28}+1.56\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.9422ms 18.6126ms 53.7270 Ops/s 53.8051 Ops/s $\color{#d91a1a}-0.15\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 0.6047s 53.3388ms 18.7481 Ops/s 23.1505 Ops/s $\textbf{\color{#d91a1a}-19.02\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.3158ms 20.0208ms 49.9481 Ops/s 49.5325 Ops/s $\color{#35bf28}+0.84\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8429ms 0.2252ms 4.4402 KOps/s 4.3045 KOps/s $\color{#35bf28}+3.15\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7160ms 1.3745ms 727.5208 Ops/s 723.8737 Ops/s $\color{#35bf28}+0.50\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7665ms 2.3433ms 426.7532 Ops/s 413.6763 Ops/s $\color{#35bf28}+3.16\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0759ms 2.9215ms 342.2842 Ops/s 344.2001 Ops/s $\color{#d91a1a}-0.56\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2165ms 0.1355ms 7.3799 KOps/s 7.3881 KOps/s $\color{#d91a1a}-0.11\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3323ms 0.1863ms 5.3688 KOps/s 5.2667 KOps/s $\color{#35bf28}+1.94\%$
test_storage_write_contiguous[100-img_shape2-large_img] 1.8681ms 1.7277ms 578.8119 Ops/s 577.2615 Ops/s $\color{#35bf28}+0.27\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5050ms 1.2738ms 785.0829 Ops/s 790.6905 Ops/s $\color{#d91a1a}-0.71\%$
test_collector_stack_then_write[50-img_shape0-small] 1.3009ms 1.1324ms 883.0872 Ops/s 879.4192 Ops/s $\color{#35bf28}+0.42\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.7308ms 3.6032ms 277.5316 Ops/s 280.9299 Ops/s $\color{#d91a1a}-1.21\%$
test_collector_stack_then_write[100-img_shape2-large_img] 10.1099ms 5.6146ms 178.1082 Ops/s 171.1837 Ops/s $\color{#35bf28}+4.05\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.1767ms 7.0096ms 142.6624 Ops/s 136.7414 Ops/s $\color{#35bf28}+4.33\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4467ms 0.2798ms 3.5744 KOps/s 3.5802 KOps/s $\color{#d91a1a}-0.16\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6512ms 1.4998ms 666.7671 Ops/s 664.4927 Ops/s $\color{#35bf28}+0.34\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.7069ms 2.4161ms 413.8919 Ops/s 392.9332 Ops/s $\textbf{\color{#35bf28}+5.33\%}$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3166ms 3.1337ms 319.1085 Ops/s 318.0200 Ops/s $\color{#35bf28}+0.34\%$
test_collector_without_rb[100-img_shape0-atari] 33.7659ms 33.2045ms 30.1164 Ops/s 30.0941 Ops/s $\color{#35bf28}+0.07\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.0839ms 65.1112ms 15.3583 Ops/s 15.1992 Ops/s $\color{#35bf28}+1.05\%$
test_collector_with_rb[100-img_shape0-atari] 38.4754ms 37.9429ms 26.3554 Ops/s 26.0610 Ops/s $\color{#35bf28}+1.13\%$
test_collector_with_rb[200-img_shape1-large_batch] 75.0762ms 73.9186ms 13.5284 Ops/s 13.3261 Ops/s $\color{#35bf28}+1.52\%$

Collaborator Author

I don't like this; this PR should be dropped entirely.

[ghstack-poisoned]
@vmoens vmoens closed this Apr 11, 2026

Labels

CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed), Collectors, Performance (performance issue or suggestion for improvement)
