Support Mixed precision & Static MSE PTQ in MCore export; Nemotron Super v3 NVFP4 recipe #1363
jenchen13 wants to merge 14 commits into
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings and use the review commands to manage reviews.
📝 Walkthrough
This PR adds FP8 scale sweep stride control to calibration workflows, introduces three mixed-precision NVFP4 quantization recipes for Nemotron-3-Super-120B with different calibration methods (MSE, MSE with stride-4 sweep, max-based), refactors MoE calibration completeness checks to recursively traverse SequentialQuantizer leaves, and overhauls HuggingFace export to collect and apply per-layer quantization metadata.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml`:
- Around line 29-32: The metadata claims attention o_proj is FP8 per-tensor but
quant_cfg lacks any override for attention.o_proj; either update the description
or add the missing quantizer mapping. Fix by adding an explicit quant_cfg
override for the attention o_proj parameter name (e.g., attention.o_proj /
attention.output_projection / whatever key is used in your model mapping) to use
the FP8 per-tensor quantizer used elsewhere, or remove the attention o_proj
mention from the metadata so it matches the existing quant_cfg; ensure you
reference the exact layer key used in quant_cfg to keep mapping consistent with
the model.
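For concreteness, an override in the recipe's quantizer_name/cfg shape might look like the following sketch, written here as a Python dict (the wildcard key and FP8 encoding are assumptions modeled on the recipe's other FP8 entries, not verified against the model mapping):

```python
# Hypothetical override for the first fix option above: FP8 per-tensor for
# attention o_proj. The exact layer key is model-dependent (an assumption).
o_proj_fp8_override = {
    "quantizer_name": "*attention.o_proj*weight_quantizer",
    "enable": True,
    "cfg": {"num_bits": "e4m3"},  # FP8 per-tensor: no block_sizes entry
}
```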
- Around line 94-115: The entries using broad quantizer_name patterns
('*mixer.fc1_latent_proj*weight_quantizer',
'*mixer.fc1_latent_proj*input_quantizer',
'*mixer.fc2_latent_proj*weight_quantizer',
'*mixer.fc2_latent_proj*input_quantizer') are enabling FP8 for every latent
projection instead of only layers 1, 3, and 5; update these quantizer_name
values to target only the specific layer instances (e.g. include the layer
index/identifier for layers 1, 3, and 5 in the wildcard or use a regex/explicit
list) so only those mixer.fc1_latent_proj and mixer.fc2_latent_proj quantizers
are set to num_bits: e4m3, and leave all other latent projection quantizers at
BF16 (or remove the generic entries).
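A sketch of one way to scope the patterns to specific layers, again in dict form (the `layers.{i}` path segment is an assumption based on the `backbone.layers.*` names used elsewhere in this recipe):

```python
# Hypothetical layer-scoped entries: enable FP8 only for layers 1, 3, and 5.
latent_proj_fp8_overrides = [
    {
        "quantizer_name": f"*layers.{i}.mixer.fc{k}_latent_proj*{q}",
        "enable": True,
        "cfg": {"num_bits": "e4m3"},
    }
    for i in (1, 3, 5)
    for k in (1, 2)
    for q in ("weight_quantizer", "input_quantizer")
]
```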
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 60ed9c0b-efef-4967-b321-7270a8853455
📒 Files selected for processing (1)
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml
Codecov Report
❌ Patch coverage is …
Additional details and impacted files:
@@            Coverage Diff             @@
## main #1363 +/- ##
===========================================
- Coverage 76.93% 64.10% -12.83%
===========================================
Files 471 477 +6
Lines 50404 55095 +4691
===========================================
- Hits 38776 35320 -3456
- Misses 11628 19775 +8147
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/torch/quantization/model_calib.py (1)
389-412: ⚠️ Potential issue | 🟡 Minor — Custom FP8-sweep backends still ignore the new stride setting.
When a registered `backend_factory` is used, this branch still calls the old 3-argument factory signature, so `fp8_scale_sweep_stride` only takes effect on the built-in `NVFP4MSECalibrator` path below. That makes the new config silently a no-op for registry-backed sweep calibrators. Please extend the factory contract or reject a non-default stride explicitly in the backend path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/model_calib.py` around lines 389 - 412, The registered backend factory path (lookup via _FP8_SWEEP_CALIBRATOR_REGISTRY and assigned to backend_factory) currently calls the factory with the old 3-argument signature and thus ignores fp8_scale_sweep_stride; update the branch that sets module._calibrator via backend_factory to either (A) call the factory with the new argument (pass fp8_scale_sweep_stride) and update the factory contract accordingly, or (B) explicitly detect a non-default fp8_scale_sweep_stride and raise/rollback with a clear error so users know registry-backed calibrators do not support stride; ensure the call still passes initial_amax, module._calibrator._axis, partial(_mse_quant_func, quantizer=module) and include fp8_scale_sweep_stride when choosing option A, mirroring how NVFP4MSECalibrator is constructed.
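A minimal sketch of option A, using the names from the comment above (the surrounding control flow is an assumption about model_calib.py, not verified source):

```python
# Hypothetical: extend the factory contract so registry-backed sweep
# calibrators receive the stride too, mirroring how NVFP4MSECalibrator
# is constructed in the built-in path.
module._calibrator = backend_factory(
    initial_amax,
    module._calibrator._axis,
    partial(_mse_quant_func, quantizer=module),
    fp8_scale_sweep_stride=fp8_scale_sweep_stride,  # new keyword (option A)
)
# Option B instead: keep the 3-argument contract but fail fast.
# if fp8_scale_sweep_stride != 1:
#     raise NotImplementedError("registry sweep backends ignore the stride")
```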
♻️ Duplicate comments (2)
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml (1)
41-42: ⚠️ Potential issue | 🟡 Minor — The metadata description still overstates what this recipe quantizes.
These lines say attention `o_proj`, `fc1_latent_proj`, and `fc2_latent_proj` are FP8 per-tensor, but there are no matching overrides in `quant_cfg`, and the header comments say the latent MoE projections stay BF16. Please update the description so it matches the actual recipe.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml` around lines 41 - 42, The metadata description in the YAML (the description field) incorrectly claims that "attention o_proj", "fc1_latent_proj", and "fc2_latent_proj" are FP8 per-tensor while the recipe does not contain matching overrides in quant_cfg and header comments state latent MoE projections remain BF16/FP16; update the description text to accurately reflect the recipe by removing or changing the FP8 claims (e.g., state that latent MoE projections and those specific projections remain BF16/FP16 and only list the layers that the quant_cfg actually overrides as FP8 per-tensor).
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml (1)
34-35: ⚠️ Potential issue | 🟡 Minor — The metadata description does not match the actual quantizer mapping.
These lines still claim attention `o_proj`, `fc1_latent_proj`, and `fc2_latent_proj` are FP8 per-tensor, but this recipe never enables those quantizers, and the header comments above say latent MoE stays BF16. Please align the description with the `quant_cfg` that follows.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml` around lines 34 - 35, The YAML description string claims that "attention o_proj, fc1_latent_proj, and fc2_latent_proj" are FP8 per-tensor, but the quant_cfg in this recipe does not enable quantizers for "attention.o_proj", "fc1_latent_proj", or "fc2_latent_proj" (and the header notes latent MoE stays BF16/FP16); fix this by either updating the description to state those projections remain BF16/FP16 (and remove the FP8 per-tensor claim) or modify quant_cfg to actually enable FP8 per-tensor quantizers for the keys "attention.o_proj", "fc1_latent_proj", and "fc2_latent_proj" so the comment matches the mapping.
🧹 Nitpick comments (1)
modelopt/torch/quantization/calib/mse.py (1)
202-206: Add a regression test for strided FP8 candidate generation.
The existing coverage only exercises the default 126-candidate path. This branch adds two new behaviors, subsampling and forced inclusion of the last candidate, so it should have a focused test for `fp8_scale_sweep_stride > 1` to lock down both the reduced candidate count and preservation of the max scale.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/calib/mse.py` around lines 202 - 206, The strided FP8 candidate-generation branch guarded by fp8_scale_sweep_stride > 1 (in the block that subsamples fp8_values into candidates and appends the last value) is untested; add a focused regression test (e.g., test_fp8_scale_sweep_stride_preserves_last_candidate) that sets fp8_scale_sweep_stride > 1, calls the code path that produces fp8_values, and asserts that the resulting candidates length is reduced according to the stride and that the final element equals the original fp8_values[-1] (verifying forced inclusion of the max scale); ensure the test covers both subsampling and the append behavior so the branch is locked down.
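A focused test along those lines could look like this sketch. It is self-contained: the candidate construction below stands in for the 126 positive finite e4m3 values described above, whereas a real regression test should call the actual generation path in calib/mse.py.

```python
import torch

def test_fp8_scale_sweep_stride_preserves_last_candidate():
    # Stand-in for mse.py's candidate generation: the 126 positive finite
    # fp8_e4m3fn values, reinterpreted from their byte encodings 0x01..0x7E.
    fp8_values = torch.arange(1, 127, dtype=torch.uint8).view(torch.float8_e4m3fn).float()
    stride = 4
    candidates = fp8_values[::stride]
    if candidates[-1] != fp8_values[-1]:
        # Forced inclusion of the last (max) candidate, per the strided branch.
        candidates = torch.cat([candidates, fp8_values[-1:]])
    assert len(candidates) == 33  # 32 subsampled values plus the forced max
    assert candidates[-1].item() == 448.0  # e4m3 max scale survives subsampling
```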
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml`:
- Around line 16-28: Remove the unresolved Git merge conflict markers (<<<<<<<,
=======, >>>>>>>) and restore a single coherent comment block describing the
quant config; keep the more detailed version that lists both HF and
Megatron-Core names (the lines mentioning mixer.experts.<N>.{up,down}_proj,
mlp.experts.local_experts.<N>.linear_fc{1,2},
mixer.shared_experts.{up,down}_proj, and mlp.shared_experts.linear_fc{1,2}) or
merge its additional details into the shorter variant so the YAML comment is
valid and free of conflict markers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0d72ce13-180e-46ea-b4d8-8c6c140d22a7
📒 Files selected for processing (5)
modelopt/torch/quantization/calib/mse.py
modelopt/torch/quantization/config.py
modelopt/torch/quantization/model_calib.py
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml
♻️ Duplicate comments (2)
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml (1)
35-36: ⚠️ Potential issue | 🟡 Minor — Metadata precision mapping is inconsistent with the actual recipe.
Line 35 says latent MoE `fc1_latent_proj`/`fc2_latent_proj` are FP8 per-tensor, but this file has no latent-MoE quantizer overrides and the header (Line 27) says they stay BF16. Please align the description with `quant_cfg` to avoid confusion.
Proposed patch
 metadata:
   recipe_type: ptq
-  description: Super NVFP4 mixed precision — sparse MoE experts NVFP4 (W4A4, group_size 16); shared experts, mamba in/out_proj, and Latent MOE fc1_latent_proj/fc2_latent_proj - FP8 per-tensor; FP8 KV cache; lm_head/MTP/SSM stay BF16/FP16. Weight-MSE calibration with FP8 scale sweep.
+  description: >-
+    Super NVFP4 mixed precision — sparse MoE experts NVFP4 (W4A4, group_size 16);
+    shared experts and mamba in/out_proj FP8 per-tensor; FP8 KV cache; latent MoE,
+    lm_head, MTP, output, and mamba conv1d stay BF16; SSM cache stays FP32
+    (optionally FP16 in vLLM). Weight-MSE calibration with FP8 scale sweep.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml` around lines 35 - 36, The description's claim that latent MoE "fc1_latent_proj/fc2_latent_proj" are FP8 per-tensor is inconsistent with the quant_cfg (and header) which leave those layers as BF16; either update the description string to state that fc1_latent_proj and fc2_latent_proj remain BF16/FP16 to match quant_cfg, or add explicit quantizer overrides in quant_cfg for the fc1_latent_proj and fc2_latent_proj modules to set them to FP8 per-tensor (matching the rest of the FP8 settings); ensure the description text and the quant_cfg entries (module names fc1_latent_proj, fc2_latent_proj) are aligned.
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml (1)
34-35: ⚠️ Potential issue | 🟡 Minor — Description/comments conflict on latent MoE and SSM/mamba precision.
Lines 34-35 state latent MoE is FP8 per-tensor, but the recipe does not enable latent-MoE quantizers. Also, Lines 134-135 conflict with earlier comments on SSM/mamba precision. Please make these statements internally consistent.
Proposed patch
 metadata:
   recipe_type: ptq
-  description: Super NVFP4 mixed precision — sparse MoE experts NVFP4 (W4A4, group_size 16); shared experts, mamba in/out_proj, and Latent MOE fc1_latent_proj/fc2_latent_proj - FP8 per-tensor; FP8 KV cache; lm_head/MTP/SSM stay BF16/FP16. Weight-MSE calibration with stride-4 FP8 scale sweep.
+  description: >-
+    Super NVFP4 mixed precision — sparse MoE experts NVFP4 (W4A4, group_size 16);
+    shared experts and mamba in/out_proj FP8 per-tensor; FP8 KV cache; latent MoE,
+    lm_head, MTP, output, and mamba conv1d stay BF16; SSM cache stays FP32
+    (optionally FP16 in vLLM). Weight-MSE calibration with stride-4 FP8 scale sweep.
@@
-# Stay BF16: lm_head, output projection, MoE routers/gates, MTP head.
-# SSM state / mamba conv1d stay FP16.
+# Stay BF16: lm_head, output projection, MoE routers/gates, MTP head, mamba conv1d.
+# SSM state stays FP32 (can be set to FP16 in vLLM).
Also applies to: 134-135
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml` around lines 34 - 35, The description text claims "latent MoE is FP8 per-tensor" and that "lm_head/MTP/SSM stay BF16/FP16" but the recipe doesn't enable latent-MoE quantizers and later lines conflict on SSM/mamba precision; fix by making the YAML fields match the prose: either add the latent-MoE quantizer key (e.g., include latent_moe in the quantizers list or set enable_latent_moe_quantizers: true) and ensure its precision is set to FP8 per-tensor, or change the description to remove the FP8 latent-MoE claim; likewise reconcile SSM/mamba entries by updating the lm_head/MTP/SSM and mamba precision fields to uniformly state BF16/FP16 (or change the description to reflect the actual configured precisions) so the "description" string and the quantizer/precision keys (latent_moe_quantizers, quantizers list, ssm_precision, mamba_precision, lm_head_precision / mtp_precision) are consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 595b4531-3ced-4894-a5cc-c28f161c8f3e
📒 Files selected for processing (2)
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml
# Megatron-Core names: mlp.shared_experts.linear_fc{1,2}
# - Mamba mixer linears (mixer.{in,out}_proj): FP8 per-tensor
# - KV cache: FP8
# - Attention linears ({q,k,v}_proj): BF16 (not quantized)
Can we double check attention out linear? IIRC, attention o_proj should be FP8.
responded in slack, only 2/9 attention layers had o_proj FP8 in final Super NVFP4 ckpt, but we can always add it later to test if accuracy degradation is minimal
/claude review
@coderabbitai full review
✅ Actions performed: full review triggered.
@claude review
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/torch/export/unified_export_megatron.py (1)
1213-1229: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win — Record packed-expert quant metadata against the module prefix, not the weight key.
In both pack-remap helpers, `prefix` is the tensor key written into `state_dict` (for example `...weight`). Recording that verbatim produces `layer_config_dict` entries like `...weight.quantization`, and `process_layer_quant_config()` will then emit `quantized_layers` names ending in `.weight` instead of the HF module prefix. Serving-side layer matching will miss those packed experts.
Suggested fix
-        self._record_layer_quant_config(prefix, qformat, block_size)
+        module_prefix = prefix.rsplit(".", 1)[0] + "."
+        self._record_layer_quant_config(module_prefix, qformat, block_size)
Apply the same normalization in both `_pack_name_remapping()` and `_pack_name_remapping_gpt_oss()`.
Also applies to: 1280-1298
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/export/unified_export_megatron.py` around lines 1213 - 1229, The code records quant metadata under the full tensor key (e.g., "...weight") causing layer names to end with ".weight"; update both _pack_name_remapping and _pack_name_remapping_gpt_oss to normalize the prefix before calling self._record_layer_quant_config by stripping the tensor suffix (e.g., remove trailing ".weight" or the last dot component) so the module-level HF prefix is recorded instead of the weight key; apply the same normalization logic where self._record_layer_quant_config(prefix, qformat, block_size) is invoked in both functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-amax.yaml`:
- Around line 16-36: The recipe metadata/comments misstate Latent MOE behavior:
fc1_latent_proj/fc2_latent_proj are described as FP8 per-tensor but the
quant_cfg (and lines noting "stay BF16/FP16") never enable those quantizers;
either update the human-readable description to say Latent MOE projections
remain BF16 (or FP16 per existing comment) to match quant_cfg, or enable FP8
per-tensor quantizers for fc1_latent_proj and fc2_latent_proj in the quant_cfg
so the comment matches behavior—make the change by editing the
metadata/description block and/or the quant_cfg entries that reference
fc1_latent_proj and fc2_latent_proj accordingly.
In
`@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml`:
- Around line 34-35: The description erroneously claims the Latent MoE is "FP8
per-tensor" while the recipe does not enable quantizer patterns for
fc1_latent_proj or fc2_latent_proj; update the metadata description string (the
YAML "description" field) to remove or correct the FP8 statement for Latent MoE
(or explicitly state that fc1_latent_proj/fc2_latent_proj remain BF16/FP16) so
it matches the actual quantizer configuration for fc1_latent_proj and
fc2_latent_proj.
In
`@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml`:
- Around line 35-36: The description claims latent MoE is FP8 and that static
scales are "chosen by MSE", but the YAML sets no quantizers for
fc1_latent_proj/fc2_latent_proj and uses method: max; update the human-readable
description to match the actual config by either (A) enabling the latent
projection quantizers (fc1_latent_proj/fc2_latent_proj) to make latent MoE FP8,
or (B) explicitly state latent projections remain unquantized (not FP8) and
change the phrase "static scales are chosen by MSE" to reflect the configured
method (e.g., "static scales computed by max" or "method: max"); adjust the
description lines referencing "FP8 per-tensor; ... Latent MOE
fc1_latent_proj/fc2_latent_proj" and the sentence about static scale selection
accordingly so the text and the fields (fc1_latent_proj, fc2_latent_proj, and
method: max) are consistent.
In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml`:
- Around line 35-36: The description string misreports the latent-MoE policy:
update the description (the YAML description field) to reflect that
fc1_latent_proj and fc2_latent_proj are left unquantized (per the quant_cfg)
rather than being "FP8 per-tensor"; edit the text around the FP8/KV/latent-MoE
sentence to explicitly state that fc1_latent_proj/fc2_latent_proj remain
unquantized (or their actual precision) and keep FP8 per-tensor/KV wording only
for the tensors that truly use FP8.
In `@modelopt/torch/export/unified_export_megatron.py`:
- Around line 815-818: The current check only treats qformat is None as
excluded, but QUANTIZATION_NONE should be treated the same so those layers are
recorded as excluded (and not dropped from the exported quant config); update
the conditional in unified_export_megatron.py where qformat is obtained from
_get_quantization_format(module) to also consider QUANTIZATION_NONE (e.g., if
qformat is None or qformat == QUANTIZATION_NONE) before calling
_record_excluded_module(prefix), and ensure any references to
_record_layer_quant_config still see these explicitly disabled quantizers as
excluded rather than omitted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3adef0ee-1161-4068-a796-fc857c2455fd
📒 Files selected for processing (11)
modelopt/torch/export/unified_export_megatron.py
modelopt/torch/quantization/calib/mse.py
modelopt/torch/quantization/config.py
modelopt/torch/quantization/model_calib.py
modelopt/torch/quantization/plugins/megatron.py
modelopt/torch/quantization/qtensor/nvfp4_tensor.py
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-amax.yaml
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml
tests/gpu_megatron/torch/export/test_unified_export_megatron.py
@CodeRabbit full review
✅ Actions performed: full review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (2)
modelopt/torch/quantization/qtensor/nvfp4_tensor.py (1)
120-122: 💤 Low value — `_get_static_global_amax` is called twice for the static branch.
`_is_static_quantizer` internally calls `_get_static_global_amax` and discards the result; line 122 then calls it again. Consider hoisting to a single variable:
♻️ Proposed refactor
-        if cls._is_static_quantizer(weight_quantizer):
-            # Static path: use pre-computed per-block amax values from quantizer
-            global_amax = cls._get_static_global_amax(weight_quantizer).float()
+        global_amax = cls._get_static_global_amax(weight_quantizer)
+        if global_amax is not None:
+            # Static path: use pre-computed per-block amax values from quantizer
+            global_amax = global_amax.float()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/qtensor/nvfp4_tensor.py` around lines 120 - 122, Hoist the call to _get_static_global_amax so it's executed only once: call amax = cls._get_static_global_amax(weight_quantizer) (and .float() as needed) and reuse that value instead of calling _get_static_global_amax again; update _is_static_quantizer to accept an optional precomputed amax parameter or refactor _is_static_quantizer to avoid calling _get_static_global_amax internally so the static branch uses the hoisted amax (refer to _is_static_quantizer, _get_static_global_amax and weight_quantizer).
tests/gpu_megatron/torch/export/test_unified_export_megatron.py (1)
151-164: ⚡ Quick win — Add one end-to-end mixed-precision export case.
The new export path here is the per-layer `layer_config_dict` / `process_layer_quant_config` flow, but this matrix still only exercises uniform configs (`NVFP4_DEFAULT_CFG` / `FP8_DEFAULT_CFG`). A single mixed recipe case would catch regressions in `quantized_layers`, excludes, and the `config.json` ↔ `hf_quant_config.json` parity you just tightened.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/gpu_megatron/torch/export/test_unified_export_megatron.py` around lines 151 - 164, Add a new parametrized test case to the existing pytest.mark.parametrize matrix (the tuple of ("model_type", "arch", "extra_module", "quant_config", "kv_cache_quant_cfg")) to exercise the per-layer mixed-precision export path: supply a quant_config that triggers the layer_config_dict / process_layer_quant_config flow (i.e., a mixed-recipe config rather than NVFP4_DEFAULT_CFG or FP8_DEFAULT_CFG) so the test exercises quantized_layers, excludes handling, and the config.json ↔ hf_quant_config.json parity; update the matrix alongside existing entries (referencing the parameters model_type/arch/extra_module/quant_config/kv_cache_quant_cfg) so one case uses a mixed per-layer config for either nemotron or llama.
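One way to extend the matrix, sketched with a hypothetical config name (`MIXED_NVFP4_FP8_CFG` is not a real modelopt constant; the real case should pull a per-layer mixed config from the recipes under test):

```python
# Hypothetical extra row for the parametrize matrix above.
pytest.param(
    "llama",
    "LlamaForCausalLM",
    None,
    MIXED_NVFP4_FP8_CFG,  # assumed: routes some layers to NVFP4, others to FP8
    None,
    id="llama_mixed_precision_export",
)
```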
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 94f3528f-2ee4-4757-b8de-3a323300ca74
📒 Files selected for processing (10)
modelopt/torch/export/unified_export_megatron.py
modelopt/torch/quantization/calib/mse.py
modelopt/torch/quantization/config.py
modelopt/torch/quantization/model_calib.py
modelopt/torch/quantization/plugins/megatron.py
modelopt/torch/quantization/qtensor/nvfp4_tensor.py
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml
tests/gpu_megatron/torch/export/test_unified_export_megatron.py
# MoE routed experts -> NVFP4 W4A4, block_size 16, e4m3 scale.
# HF/export names: backbone.layers.*.mixer.experts.*.{up,down}_proj.
- quantizer_name: '*mixer.experts.*weight_quantizer'
  enable: true
  cfg:
    block_sizes:
      -1: 16
      type: dynamic
      scale_bits: e4m3
    num_bits: e2m1
- quantizer_name: '*mixer.experts.*input_quantizer'
  enable: true
  cfg:
    block_sizes:
      -1: 16
      type: dynamic
      scale_bits: e4m3
    num_bits: e2m1
# Megatron-Core/PTQ names: decoder.layers.*.mlp.experts.local_experts.*.linear_fc{1,2}.
- quantizer_name: '*mlp.experts*weight_quantizer'
  enable: true
  cfg:
    block_sizes:
      -1: 16
      type: dynamic
      scale_bits: e4m3
    num_bits: e2m1
- quantizer_name: '*mlp.experts*input_quantizer'
  enable: true
  cfg:
    block_sizes:
      -1: 16
      type: dynamic
      scale_bits: e4m3
    num_bits: e2m1
Keep routed-expert weights static in the max-calib variant.
These two *...weight_quantizer entries are type: dynamic, so this recipe does more than swap method: mse for method: max—it changes routed-expert weights to dynamic NVFP4 and bypasses the static-weight calibration path entirely. That makes this variant a different quantization recipe, not a max-calibrated version of super-nvfp4.yaml.
Suggested fix
 - quantizer_name: '*mixer.experts.*weight_quantizer'
   enable: true
   cfg:
     block_sizes:
-      type: dynamic
+      type: static
       scale_bits: e4m3
     num_bits: e2m1
@@
 - quantizer_name: '*mlp.experts*weight_quantizer'
   enable: true
   cfg:
     block_sizes:
-      type: dynamic
+      type: static
       scale_bits: e4m3
     num_bits: e2m1
num_bits: e2m1📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # MoE routed experts -> NVFP4 W4A4, block_size 16, e4m3 scale. | |
| # HF/export names: backbone.layers.*.mixer.experts.*.{up,down}_proj. | |
| - quantizer_name: '*mixer.experts.*weight_quantizer' | |
| enable: true | |
| cfg: | |
| block_sizes: | |
| -1: 16 | |
| type: dynamic | |
| scale_bits: e4m3 | |
| num_bits: e2m1 | |
| - quantizer_name: '*mixer.experts.*input_quantizer' | |
| enable: true | |
| cfg: | |
| block_sizes: | |
| -1: 16 | |
| type: dynamic | |
| scale_bits: e4m3 | |
| num_bits: e2m1 | |
| # Megatron-Core/PTQ names: decoder.layers.*.mlp.experts.local_experts.*.linear_fc{1,2}. | |
| - quantizer_name: '*mlp.experts*weight_quantizer' | |
| enable: true | |
| cfg: | |
| block_sizes: | |
| -1: 16 | |
| type: dynamic | |
| scale_bits: e4m3 | |
| num_bits: e2m1 | |
| - quantizer_name: '*mlp.experts*input_quantizer' | |
| enable: true | |
| cfg: | |
| block_sizes: | |
| -1: 16 | |
| type: dynamic | |
| scale_bits: e4m3 | |
| num_bits: e2m1 | |
| # MoE routed experts -> NVFP4 W4A4, block_size 16, e4m3 scale. | |
| # HF/export names: backbone.layers.*.mixer.experts.*.{up,down}_proj. | |
| - quantizer_name: '*mixer.experts.*weight_quantizer' | |
| enable: true | |
| cfg: | |
| block_sizes: | |
| -1: 16 | |
| type: static | |
| scale_bits: e4m3 | |
| num_bits: e2m1 | |
| - quantizer_name: '*mixer.experts.*input_quantizer' | |
| enable: true | |
| cfg: | |
| block_sizes: | |
| -1: 16 | |
| type: dynamic | |
| scale_bits: e4m3 | |
| num_bits: e2m1 | |
| # Megatron-Core/PTQ names: decoder.layers.*.mlp.experts.local_experts.*.linear_fc{1,2}. | |
| - quantizer_name: '*mlp.experts*weight_quantizer' | |
| enable: true | |
| cfg: | |
| block_sizes: | |
| -1: 16 | |
| type: static | |
| scale_bits: e4m3 | |
| num_bits: e2m1 | |
| - quantizer_name: '*mlp.experts*input_quantizer' | |
| enable: true | |
| cfg: | |
| block_sizes: | |
| -1: 16 | |
| type: dynamic | |
| scale_bits: e4m3 | |
| num_bits: e2m1 |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml`
around lines 45 - 79, The two routed-expert weight quantizer entries
('*mixer.experts.*weight_quantizer' and '*mlp.experts*weight_quantizer')
currently use type: dynamic; change them to use the static-weight calibration
path by replacing type: dynamic with type: static (or remove the dynamic setting
and ensure static is explicitly set) so routed-expert weights remain static in
this max-calib variant while leaving the corresponding input_quantizer entries
unchanged.
if hasattr(self, "kv_cache_dtype"):
    self._hf_quant_config["quantization"]["kv_cache_quant_algo"] = self.kv_cache_dtype
# Use one serving-facing config for both hf_quant_config.json and config.json.
self._hf_quant_config = convert_hf_quant_config_format(raw_hf_quant_config)
do we change the format of hf_quant_config.json by this change?
yes to align it with the quant config inside config.json. previously hf_quant_config was still using the older json format whereas the quant config inside config.json was being converted to a newer format
return global_amax is not None

@classmethod
def _get_static_global_amax(cls, weight_quantizer):
Thanks for catching the bug! I'd like to understand why in mcore path we use _global_amax instead of global_amax, is there any general guidance for when to use private attribute name and when to use public ones? I think we have similar issue with amax vs _amax
during checkpoint resume in MCore it initializes the quantizers as generic TensorQuantizer and those have _global_amax parameter (instead of NVFP4QTensor which has global_amax)
@Fridah-nv I have changed MCore static quantizer restore to restore as NVFP4QTensor in mtq.plugins.custom
Fridah-nv left a comment:
Changes in MSE flow LGTM
/claude review
1a10eeb to 8fc569d Compare
axis=module._calibrator._axis,
global_amax=module.global_amax,
quant_func=partial(_mse_quant_func, quantizer=module),
fp8_scale_sweep_stride=fp8_scale_sweep_stride,
[CRITICAL Algorithm] NVFP4MSECalibrator.__init__ does not accept a fp8_scale_sweep_stride parameter (see calib/mse.py:177-184). Passing it here will raise TypeError: __init__() got an unexpected keyword argument 'fp8_scale_sweep_stride' at runtime whenever fp8_scale_sweep=True and the quantizer is an NVFP4 static quantizer.
Either:
- Add `fp8_scale_sweep_stride` to `NVFP4MSECalibrator.__init__` and use it in `_generate_candidates` to subsample the candidate list (e.g., `fp8_values[::stride]`; see the sketch after this list), or
- Remove this kwarg from the call site until the calibrator supports it.
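A sketch of the first option, assuming the candidate list is the 126 positive finite e4m3 scales (class and method names follow the comment above; the actual calib/mse.py implementation may differ):

```python
import torch

class NVFP4MSECalibratorSketch:  # hypothetical mirror of the real class
    def __init__(self, amax, axis=None, quant_func=None, *, fp8_scale_sweep_stride=1):
        self._amax = amax
        self._axis = axis
        self._quant_func = quant_func
        self._stride = fp8_scale_sweep_stride

    def _generate_candidates(self) -> torch.Tensor:
        # All 126 positive finite fp8_e4m3fn scale values (bytes 0x01..0x7E).
        fp8_values = torch.arange(1, 127, dtype=torch.uint8).view(torch.float8_e4m3fn).float()
        if self._stride <= 1:
            return fp8_values
        candidates = fp8_values[::self._stride]
        if candidates[-1] != fp8_values[-1]:
            # Always keep the max scale so the sweep never loses the
            # max-calibration candidate.
            candidates = torch.cat([candidates, fp8_values[-1:]])
        return candidates
```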
if hasattr(self, "kv_cache_dtype"):
    self._hf_quant_config["quantization"]["kv_cache_quant_algo"] = self.kv_cache_dtype
# Use one serving-facing config for both hf_quant_config.json and config.json.
self._hf_quant_config = convert_hf_quant_config_format(raw_hf_quant_config)
[IMPORTANT Compatibility] This changes the schema of hf_quant_config.json from the raw ModelOpt format ({"producer": {...}, "quantization": {...}}) to the compressed-tensors-compatible format ({"config_groups": {...}, "ignore": [...], "quant_algo": ...}).
The HF export path (unified_export_hf.py:1190-1194) still writes the raw format to hf_quant_config.json and the converted format to config.json. This PR makes the MCore export path diverge by writing converted format in both places.
Any downstream tools (TRT-LLM build scripts, vLLM loaders) that read hf_quant_config.json from MCore-exported checkpoints and expect the "quantization" top-level key will break. If this is intentional (which the test updates suggest), please mention the schema change in the PR description and/or changelog so consumers know to update.
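For reference, the two shapes the comment contrasts, with illustrative placeholder values (only the top-level keys are taken from the comment; everything nested is an assumption):

```python
# Raw ModelOpt format, previously written to hf_quant_config.json:
raw_format = {
    "producer": {"name": "modelopt", "version": "..."},
    "quantization": {"quant_algo": "NVFP4", "kv_cache_quant_algo": "FP8"},
}

# Compressed-tensors-compatible format now written by the MCore path
# to both hf_quant_config.json and config.json:
converted_format = {
    "quant_algo": "NVFP4",
    "config_groups": {"group_0": "..."},
    "ignore": ["lm_head"],  # illustrative entry
}
```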
if _has_state(self.weight_quantizer, "_amax") and not _has_complete_static_nvfp4_weight_state(
    self.weight_quantizer, self.weight
):
[IMPORTANT Compatibility] The condition _has_state(wq, "_amax") and not _has_complete_static_nvfp4_weight_state(wq, self.weight) means: if _amax exists but the quantizer is NOT a complete static NVFP4, recalibrate. Previously it always recalibrated when _amax existed.
However, _has_complete_static_nvfp4_weight_state checks global_amax = getattr(quantizer, "global_amax", None) which relies on the global_amax property of NVFP4StaticQuantizer (which in turn reads _global_amax buffer). If the quantizer class hasn't been upgraded to NVFP4StaticQuantizer yet at this point during restore (because modelopt_post_restore is called before the megatron plugin's set_extra_state → from_tensor_quantizer), then global_amax will be None (plain TensorQuantizer doesn't have this property).
In that case, the function returns False, falling through to recalibrate — which is the safe/correct fallback. So this works correctly, but the "preserve MSE scales" path only activates when the quantizer has already been upgraded. Is that the intended sequence? If the HF restore path always upgrades first (via the _check_unsupported_states / from_tensor_quantizer flow), this is fine. Just flagging for awareness.
for leaf_quantizer in _iter_leaf_quantizers(quantizer):
    if _is_dynamic_block_quantizer(leaf_quantizer):
        continue
    leaf_quantizer.sync_amax_across_distributed_group(parallel_state.data_parallel_group)
    leaf_quantizer.sync_amax_across_distributed_group(
        parallel_state.expert_model_parallel_group
    )
[SUGGESTION] The old code guarded sync_amax_across_distributed_group with if getattr(quantizer, "_amax", None) is not None. The new code removes this guard and relies on the internal check inside sync_amax_across_distributed_group.
While functionally correct (the method has if ... getattr(self, "_amax", None) is not None internally), this introduces unnecessary all_reduce call-site overhead for dynamic block quantizers that were already skipped by _is_dynamic_block_quantizer. However, non-dynamic quantizers that happen to not have _amax (e.g., disabled quantizers yielded by _iter_leaf_quantizers) will now call into sync_amax_across_distributed_group only to no-op internally. Minor perf consideration — no functional bug.
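If the call-site guard is worth restoring, a sketch against the hunk above (same names; the early `continue` skips the sync call entirely for amax-less leaves):

```python
for leaf_quantizer in _iter_leaf_quantizers(quantizer):
    if _is_dynamic_block_quantizer(leaf_quantizer):
        continue
    if getattr(leaf_quantizer, "_amax", None) is None:
        continue  # nothing to all-reduce for this leaf
    leaf_quantizer.sync_amax_across_distributed_group(parallel_state.data_parallel_group)
    leaf_quantizer.sync_amax_across_distributed_group(
        parallel_state.expert_model_parallel_group
    )
```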
self._record_layer_quant_config(q_proj_prefix, qformat, block_size)
self._record_layer_quant_config(k_proj_prefix, qformat, block_size)
self._record_layer_quant_config(v_proj_prefix, qformat, block_size)
if qformat in (None, QUANTIZATION_NONE):
    # MCore stores Q/K/V as one fused linear_qkv module, but HF exports them
    # as separate q_proj/k_proj/v_proj modules. Record the HF names so
    # runtime quant configs do not miss excluded fused-QKV projections.
    fused_prefix = prefix.removesuffix(".")
    self.exclude_modules = [m for m in self.exclude_modules if m != fused_prefix]
    self._record_excluded_module(q_proj_prefix)
    self._record_excluded_module(k_proj_prefix)
    self._record_excluded_module(v_proj_prefix)
[SUGGESTION] When qformat in (None, QUANTIZATION_NONE), _record_excluded_module is called with q_proj_prefix, k_proj_prefix, v_proj_prefix which all end with ".". The method correctly strips the trailing dot via removesuffix(".").
However, the list-comprehension self.exclude_modules = [m for m in self.exclude_modules if m != fused_prefix] removes the original fused module entry. If _qkv_slicing is called multiple times for the same prefix (e.g., from different pipeline stages in distributed), the second call would find fused_prefix already absent — which is fine. But the deduplication inside _record_excluded_module (the if layer_name not in self.exclude_modules check) prevents the individual proj names from being double-added too. Good.
One minor observation: if prefix is "" (empty), then fused_prefix = "".removesuffix(".") is also "", and the list-comp would try to remove "" from exclude_modules. You've guarded _record_excluded_module against empty strings, but the fused_prefix removal isn't similarly guarded. Consider adding if fused_prefix: before the list-comp to be consistent.
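The suggested guard, as a sketch against the hunk above:

```python
fused_prefix = prefix.removesuffix(".")
if fused_prefix:  # avoid stripping "" entries when prefix is empty
    self.exclude_modules = [m for m in self.exclude_modules if m != fused_prefix]
```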
start_multiplier: float = 0.25,
stop_multiplier: float = 4.0,
fp8_scale_sweep: bool = False,
fp8_scale_sweep_stride: int = 1,
[IMPORTANT Compatibility] fp8_scale_sweep_stride is added to mse_calibrate() but not to MseCalibConfig (see config.py:1287). Since wrapped_calib_func calls func(model, forward_loop=forward_loop, **config.model_dump()), this parameter can never flow through the standard ModelOpt mode/config pipeline (e.g., when using apply_mode with a recipe).
For recipes like super-nvfp4.yaml to actually use fp8_scale_sweep_stride, it needs to be added to MseCalibConfig (or a new config field) and parsed from the recipe YAML algorithm section. Without this, the parameter is only usable via direct mse_calibrate() calls, which limits the recipe-driven workflow.
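A sketch of threading the knob through the config layer (field names other than `fp8_scale_sweep_stride` are assumptions about config.py's `MseCalibConfig`):

```python
from pydantic import BaseModel, Field

class MseCalibConfig(BaseModel):  # hypothetical mirror of the real config
    method: str = "mse"
    fp8_scale_sweep: bool = False
    fp8_scale_sweep_stride: int = Field(
        default=1,
        ge=1,
        description="Subsample every Nth FP8 scale candidate during the sweep.",
    )

# wrapped_calib_func would then forward it automatically:
# func(model, forward_loop=forward_loop, **config.model_dump())
```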
Code Review Summary
Findings: CRITICAL: 1, IMPORTANT: 3, SUGGESTION: 2
Critical
`fp8_scale_sweep_stride` passed to `NVFP4MSECalibrator`, which doesn't accept it (model_calib.py:412). This will raise a `TypeError` at runtime whenever `fp8_scale_sweep=True` and the quantizer matches the NVFP4 static path. The constructor at calib/mse.py:177-184 has no such parameter.
Important
- `hf_quant_config.json` schema change (unified_export_megatron.py:354). MCore exports now write the converted (compressed-tensors) format to `hf_quant_config.json`, breaking downstream tools that expect the raw `{"producer": {...}, "quantization": {...}}` format. The HF export path still writes the old format to this file.
- `fp8_scale_sweep_stride` not in `MseCalibConfig` (model_calib.py:335). The parameter can only be used via direct `mse_calibrate()` calls, not through the standard recipe/mode pipeline, limiting its utility for the new YAML recipes.
- Static NVFP4 preserve-amax path timing (plugins/custom.py:164). The "preserve MSE scales" logic depends on the quantizer already being upgraded to `NVFP4StaticQuantizer`. If the upgrade hasn't happened yet (depends on restore ordering), it safely falls through to recalibration, but this means MSE scales may not be preserved in all restore paths.
Suggestions
- Guard fused_prefix removal against empty string (unified_export_megatron.py:1074).
- Minor perf: removed `_amax` guard (model_calib.py:196-201). Functionally correct but calls into the sync method for quantizers without `_amax`.
Overall Assessment
Risk: Medium-High. The critical TypeError bug (#1) will crash any user path that invokes mse_calibrate with fp8_scale_sweep=True on a model with NVFP4 static quantizers — which is the exact use case of the new super-nvfp4.yaml recipe. The hf_quant_config.json schema change (#2) is a compatibility concern for consumers of MCore-exported checkpoints.
The core logic changes (dynamic block quantizer detection fix, _global_amax fallback for restored static quantizers, per-layer quant config collection for mixed-precision export, QKV exclude expansion) are well-reasoned and the new unit tests cover the key behaviors.
Signed-off-by: Jennifer Chen <[email protected]>
Signed-off-by: Jennifer Chen <[email protected]>
Signed-off-by: Jennifer Chen <[email protected]>
Static MSE NVFP4 export had two latent bugs that surfaced as garbage outputs during vLLM serving:
1. nvfp4_tensor.get_weights_scaling_factor_from_quantizer (static branch) clamped per-block FP8 scales below at 2^-9 (smallest e4m3fn subnormal), so tiny-amax blocks landed on FP8 byte 0x01 instead of underflowing naturally to 0x00. The dynamic branch never had this clamp; removing it makes the static path bit-equivalent to dynamic when _amax matches the block max.
2. quant_utils._ensure_weight_quantizer_calibrated treated a present-and-non-None _amax buffer as already calibrated, but _reset_pytorch_state_from_metadata registers buffers via torch.empty(...) during MCore restore. If the saved distcp shard didn't fill the buffer (e.g., a routed expert that never received calibration tokens, or a buffer first registered on the meta device by MaxCalibrator's meta-device fast path), the empty memory persisted as values like -1e37, producing valid-but-bogus FP8 scale bytes during static export that silently NaN'd in vLLM's NVFP4 MoE compute.
Add a post-calibration validation pass in mse_calibrate that recomputes _amax / _global_amax from the actual weight tensor when they're missing, on the meta device, non-finite, or negative -- equivalent to the max-calibration fallback used at export time, applied earlier so the saved checkpoint is clean for both MCore and vLLM consumers. Per-quantizer warnings are emitted so the failure is visible in PTQ logs.
Add CPU pytest coverage in test_nvfp4_static_export_cpu.py for the no-NaN/Inf, bound, static-vs-dynamic equivalence, and amax-zero corner cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ithm
Mirrors MAMBA_MOE_NVFP4_AGGRESSIVE_CFG verbatim except algorithm becomes {"method": "mse", "fp8_scale_sweep": False}. Lets the same quantizer coverage (weight + input on all routed/MoE matmuls, with the standard disabled lists) be exercised under MSE calibration via QUANT_CONFIG=MAMBA_MOE_NVFP4_AGGRESSIVE_MSE_CFG without needing a separate recipe yaml.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Previous version mirrored MAMBA_MOE_NVFP4_AGGRESSIVE_CFG verbatim, which uses _nvfp4_cfg (block_sizes type='dynamic'). MSE search only operates on quantizers promoted to NVFP4StaticQuantizer (is_static_block_quant), so applying algorithm='mse' to dynamic weights was a no-op -- the resulting checkpoint was effectively max-calibrated dynamic NVFP4. Confirmed in practice: 6528 max_calibrate forward calls during MCore restore on the 160895 ckpt, no NVFP4StaticQuantizer instances created.
Match NVFP4_W4A4_WEIGHT_MSE_FP8_SWEEP_CFG's weight cfg (block_sizes type='static', scale_bits=(4,3)) so MSE actually engages. Inputs stay dynamic. Keeps the AGGRESSIVE_CFG quantizer coverage (mamba MoE disable lists) so this is the direct MSE counterpart to AGGRESSIVE_CFG for the Ultra V3 / Nemotron-H mixer topology.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Per coworker guidance, MSE-calibrated NVFP4 export should bound the FP8 per-block scale to the fp8_e4m3fn representable finite range so tiny-amax blocks don't underflow to byte 0x00 and produce zero output even when the weights are nontrivial. Apply ``.clamp_(min=2**-9, max=448.0)`` on the static path and ``.clamp_(min=2**-9)`` on the dynamic path (the upper clamp landed in main as 097293b / Fridah's fix mirrors the lower clamp on both paths to keep them bit-equivalent).
Update test_static_matches_dynamic_when_amax_is_block_max accordingly: both paths now produce identical FP8 bytes for every block including underflow blocks.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
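A small illustration of the bound this commit describes (torch only; the example scales are made up, not values from the model):

```python
import torch

# Per-block scales below 2**-9 would encode as fp8 byte 0x00 (zero scale),
# zeroing the whole block; scales above 448 are not representable in e4m3fn.
per_block_scale = torch.tensor([1e-5, 0.03, 600.0])
bounded = per_block_scale.clamp(min=2**-9, max=448.0)
fp8_scale = bounded.to(torch.float8_e4m3fn)  # all entries now finite, nonzero bytes
```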
Adds repair_sharded_modelopt_state(model, checkpoint_name, prefix, metadata) to the megatron-core dist-checkpointing plugin. After the main phase-2 load (dist_checkpointing.load + model.load_state_dict in Megatron's load_modelopt_checkpoint), this helper detects weight/input/output quantizer _amax / _global_amax buffers that contain NaN / Inf / negative content (torch.empty(...) leakage from a phase-2 read that silently dropped the key) and re-fills them by re-issuing dist_checkpointing.load directly with a synthetic ShardedTensor request whose descriptor matches what the saved .metadata records.
This works around an MCore distcp resharding issue observed on PP=1 EP=16 save resharded to PP=12 EP=1 export: each PP rank's first MoE local layer expert _amax / _global_amax buffers stay at uninitialized memory after phase-2. Direct dist_checkpointing.load on the same keys returns the saved MSE values (verified empirically) -- so the upstream phase-2 path is wrong, not the on-disk data. The helper bypasses whatever descriptor mismatch is upstream and copies the saved values into the runtime buffers, preserving the saved (e.g. MSE-calibrated) state instead of falling back to max-calibration.
Treats zero as a valid amax (legitimate dead-block amax for NVFP4 static). No-op if no invalid buffers are detected.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
repair_sharded_modelopt_state v1 walked the dict keys returned by
model.sharded_state_dict() to filter quantizer buffers, but those keys
are the *runtime* names (e.g. ...local_experts.{local_idx}.linear_fc1
.weight_quantizer._amax). The saved sharded checkpoint keys (and
.metadata, and what dist_checkpointing.load matches against) use the
*sharded* names with the SequentialMLP rewrite already applied:
...experts.{global_idx}.linear_fc1.weight_quantizer._amax. The rewrite
is performed by replace_prefix_for_sharding which only modifies
ShardedTensor.key (Megatron-LM/megatron/core/dist_checkpointing/utils.py:
line 184-207), not the containing dict's keys.
v1 therefore looked up the runtime-style key in the saved metadata, got
None, marked the buffer as not in saved metadata, and skipped it. The
repair fired on zero buffers. Verified empirically against export job
162125: post-phase-2 dump still shows the same 354 NaN amax pattern as
the pre-repair behavior, with max-cal'd post-repair amax (different per
PP rank) coming from _ensure_weight_quantizer_calibrated, not from the
new repair function.
Switch the filter and metadata lookup to v.key (the rewritten
sharded_state_dict key) so we match what is actually saved.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Job 162145's export log shows my probe (Option C) ran successfully but
the repair (Option B v2) emitted no output at all. The post-restore
quantizer summary still shows the same 354 NaN amax pattern as the
pre-repair behavior, suggesting the repair function never fired or
returned 0 candidates.
Add three unconditional prints to disambiguate:
- On entry: confirm the function is even being called.
- At each early-return: identify which guard tripped.
- After candidate scan: show how many ShardedTensor entries were
walked, how many had _amax/_global_amax suffixes, and how many
failed _quantizer_buffer_value_is_invalid.
If we see the entry print but `invalid=0`, the validation predicate is
wrong. If we don't see the entry print at all, the import or caller
gating is at fault.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…yer expert 0 fc1
Diagnostic export 162306 confirmed that repair_sharded_modelopt_state enters and walks the full runtime sharded_state_dict (rank 0: 12338 amax_keys among 16490 sharded tensors), but reports invalid=0 even though the post-restore quantizer summary shows 30 NaN amax buffers on that rank.
Two possibilities: (a) v.data is not the same memory as the model's runtime buffer (some view / clone happened along the way), or (b) _quantizer_buffer_value_is_invalid is mis-detecting bf16 NaN.
Add a sampling print: for the first 4 keys ending in .mlp.experts.experts.0.linear_fc1.weight_quantizer.<suffix> on each rank, dump tensor dtype/shape and counts of nan/inf/neg/zero plus finite min/max. The post-restore dump shows these specific keys with amax=[nan, nan]; if the diagnostic shows nan=0 for the same keys, v.data is not the live buffer and we need a different access path.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
repair_sharded_modelopt_state and _quantizer_buffer_value_is_invalid
were added in commits b25d487d0 / 9b35905f1 / 1f483ef5c / a348b2f98
on the assumption that phase-2 dist_checkpointing.load was silently
dropping per-rank-first-MoE-local-layer routed-expert _amax /
_global_amax buffers (leaving them at torch.empty(...) garbage).
The diagnostic export 162306 + 162320 disproved that hypothesis:
the function entered, walked the full 12338-amax-key sharded_state_dict
on every rank, and reported invalid=0 every time. The original "354
NaN amax modules" finding turned out to be the model-builder pre-load
print (model_builder.py:355, before any checkpoint load) misread as
a post-load print. The post-load print shows 0 NaN modules.
Direct empirical confirmation:
- export 162527 (PTQ -> phase-2 -> export) had 0 _ensure_weight
_quantizer_calibrated warnings on stderr.
- validate_export_per_block_amax.py against the same export: 0 /
110_100_480 weight_scale bytes mismatched vs prediction from
saved per-block _amax + _global_amax.
- validate_repair_export_mse.py: 0/168 weight_scale_2 mismatches.
- vLLM smoke (162530) returned coherent text + HTTP 200 logprobs.
- MCore generate (162531) returned coherent text.
The repair function never had real work to do. Drop the dead code so
load_modelopt_checkpoint stays clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
After rebasing onto upstream main (commit 1d796f9 'Fused Triton kernel for NVFP4 FP8 scale sweep search'), TritonNVFP4MSECalibrator is the default for mse_calibrate(..., fp8_scale_sweep=True). Per the upstream PR, this is ~42x faster than the Python reference on B300 with identical global weight-MSE.
Flip MAMBA_MOE_NVFP4_AGGRESSIVE_MSE_CFG's algorithm option from fp8_scale_sweep=False to fp8_scale_sweep=True so PTQ runs through the full 126-candidate FP8 e4m3 scale sweep when generating per-block amax for routed-expert weight quantizers.
MODELOPT_NVFP4_TRITON_SWEEP=0 falls back to the Python reference for debugging or numerics comparison.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: Jennifer Chen <[email protected]>
6016020 to caf7274 Compare
What does this PR do?
Type of change: New recipe + Bug Fixes
hf_quant_config.json export bug fixes
MCore and MSE fixes
- `_global_amax` field in `NVFP4QTensor` static quantizer detection -> fixes bug during MCore export for MSE
- restore static quantizers as `NVFP4QTensor` (not TensorQuantizer, which can call max calibrate; we want to skip max calibrate for static quantizers during restore)
- `block_sizes` is dict-backed.
Super recipe
Mirrors the published nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 hf_quant_config.json:
rest: not quantized
Usage
# Add a code snippet demonstrating how to use this
Testing
TODO test in HF and MCore PTQ on Nemotron model
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
CONTRIBUTING.md: ✅ / ❌ / N/A
Additional Information
Summary by CodeRabbit
New Features
Improvements
Tests