fix: allow wandb config value changes on resume#2137
fix: allow wandb config value changes on resume#2137yuki-97 merged 2 commits intoNVIDIA-NeMo:mainfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughUpdated hyperparameter logging in both Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can use Trivy to scan for security misconfigurations and secrets in Infrastructure as Code files.Add a .trivyignore file to your project to customize which findings Trivy reports. |
yuki-97
left a comment
There was a problem hiding this comment.
thanks @gkaplun-nvidia , lgtm
|
@gkaplun-nvidia it looks DCO check failed, could you follow the guide to fix it? |
terrykong
left a comment
There was a problem hiding this comment.
lgtm. @gkaplun-nvidia could you signoff the commits to pass the DCO check?
Head branch was pushed to by a user without write access
57b9da1 to
23a9cdf
Compare
|
@terrykong, done. |
|
/ok to test 23a9cdf |
|
/ok to test c6eac01 |
When resuming a wandb run, `config.update()` raises an error if a key already exists with a different value. This is common when hyperparameters are re-logged on resume. Adding `allow_val_change=True` prevents the crash while still updating the config. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Gal Kaplun <[email protected]>
c6eac01 to
58fb411
Compare
|
/ok to test 58fb411 |
|
hi @gkaplun-nvidia , there's some unit test fail in https://github.com/NVIDIA-NeMo/RL/actions/runs/23744949284/job/69172069492?pr=2137, could you help to fix? for the fail in fast_L1, it should be unrelated to this PR, we'll take a look. |
Update mock assertions in TestWandbLogger and TestSwanlabLogger to match the allow_val_change=True parameter added to config.update() calls in the logger implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Gal Kaplun <[email protected]>
Head branch was pushed to by a user without write access
|
/ok to test 20faf7f |
Summary
Add
allow_val_change=Truetowandb.config.update()calls in bothWandbLoggerandMLflowWandbLogger.Problem
When resuming a wandb run,
config.update()raises an error if a hyperparameter key already exists with a different value. This is common when hyperparameters are re-logged on resume (e.g., if the config is passed again duringsetup()).Fix
Pass
allow_val_change=Trueso that config values can be overwritten without raising. This matches the expected behavior for resumed runs where the same config is re-applied.Test plan
Summary by CodeRabbit