
Schematize config loading and quantizer config entries#1405

Open
shengliangxu wants to merge 15 commits into main from shengliangx/schematize-cfg

Conversation

shengliangxu (Collaborator) commented May 7, 2026

What does this PR do?

Type of change: new feature

This PR schematizes ModelOpt config loading and quantization config entries while keeping the existing YAML/dict config surface backward compatible.

  • Makes ModeloptBaseConfig implement MutableMapping, including __delitem__ unset semantics for model_dump(exclude_unset=True). This lets ModeloptBaseConfig and a plain dict be used interchangeably; ModeloptBaseConfig is the canonical form, and dict support is kept only for backward compatibility (see the sketch after this list).
  • Replaces the QuantizerCfgEntry TypedDict with a ModeloptBaseConfig.
  • Makes load_config return Pydantic-normalized values when a schema_type or modelopt-schema annotation is present, with overloads documenting the typed return behavior.
  • Converts recipe metadata to a ModeloptBaseConfig Pydantic model so recipe configs share the same schema/field behavior as other ModelOpt configs.
  • Normalizes legacy quantizer config formats, dict entries, typed entries, and mixed typed/dict lists into ModeloptBaseConfig-based entries.
  • Converts predefined Python quantization recipe constants to QuantizeConfig objects via QuantizeConfig.model_validate(...).
  • Updates quantization config consumers to use Mapping/MutableMapping protocol checks for typed config entries and attributes while preserving dict-style access used by existing code.
  • Adds coverage for typed config loading, recipe metadata parsing, quantizer config normalization/validation, mutable mapping behavior, typed preset configs, and both supported need_calibration argument types.
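
For illustration, a minimal sketch of the mapping semantics described above (the algorithm field name comes from QuantizeConfig; the exact unset behavior is defined by this PR's ModeloptBaseConfig implementation):

from modelopt.torch.quantization.config import QuantizeConfig

cfg = QuantizeConfig()  # a ModeloptBaseConfig subclass
cfg["algorithm"] = "max"  # dict-style write via MutableMapping
del cfg["algorithm"]  # marks the field as unset (the field has a default)
assert "algorithm" not in cfg.model_dump(exclude_unset=True)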

Usage

Existing YAML and dict-based quantization configs continue to work. Callers that want schema-backed objects can now request them directly:

from modelopt.torch.opt.config_loader import load_config
from modelopt.torch.quantization.config import QuantizeConfig, normalize_quant_cfg_list

quantize_config = load_config("quantize.yaml", schema_type=QuantizeConfig)
quant_cfg = normalize_quant_cfg_list(
    [
        {"quantizer_name": "*input_quantizer", "cfg": {"num_bits": 8}},
        {"quantizer_name": "*weight_quantizer", "enable": False},
    ]
)

Predefined Python configs are now schema-backed but still support mapping-style access:

from modelopt.torch.quantization import FP8_DEFAULT_CFG

entry = FP8_DEFAULT_CFG["quant_cfg"][0]
entry["enable"] = False

Testing

  • python -m pytest tests/unit/recipe/test_loader.py
  • python -m pytest tests/unit/torch/quantization/test_config_validation.py
  • python -m pytest tests/unit/torch/quantization/test_autoquant.py
  • python -m pytest tests/unit/torch/utils/test_serialization.py
  • ruff check on the touched Python files
  • ruff format --check on the touched Python files
  • pre-commit run mypy --files ... on the touched Python files
  • git diff --check
  • Commit-time pre-commit hooks passed

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ Existing YAML/dict quantization configs and legacy quant_cfg forms are still accepted and normalized; typed configs retain mapping-style access for existing call sites.
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: N/A

Additional Information

No related issue. This PR intentionally does not change checked-in YAML configs; it updates the loader/schema layer and Python-side quantization config handling.

Summary by CodeRabbit

  • New Features

    • Schema-backed, typed config objects adopted across quantization and recipe APIs.
  • Improvements

    • Accept mapping-like config inputs broadly and apply safer materialization/copy semantics.
    • Stronger normalization, validation, and typed access for quantizer entries, recipes, and loaders.
  • Documentation

    • Guide clarified: cfg supports typed quantizer attribute configs (lists allowed); legacy dict/list forms are deprecated and normalized.
  • Tests

    • Expanded tests covering typed parsing, normalization, validation, and error cases.

copy-pr-bot Bot commented May 7, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai Bot (Contributor) commented May 7, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 87456ede-6531-4dcb-b723-0a6cd23ca033

📥 Commits

Reviewing files that changed from the base of the PR and between 3f533fa and d9eccf3.

📒 Files selected for processing (5)
  • examples/llm_ptq/hf_ptq.py
  • modelopt/torch/quantization/algorithms.py
  • modelopt/torch/quantization/nn/modules/tensor_quantizer.py
  • tests/unit/torch/quantization/test_autoquant.py
  • tests/unit/torch/quantization/test_quantize_cpu.py
✅ Files skipped from review due to trivial changes (1)
  • modelopt/torch/quantization/algorithms.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/unit/torch/quantization/test_quantize_cpu.py
  • modelopt/torch/quantization/nn/modules/tensor_quantizer.py
  • examples/llm_ptq/hf_ptq.py

📝 Walkthrough

This PR modernizes the quantization configuration system by converting TypedDict schemas to Pydantic models, implementing type-safe config loading with schema-backed overloads, and broadening type checks from dict to Mapping throughout the codebase. QuantizerCfgEntry becomes a validated Pydantic class, RecipeMetadataConfig becomes a typed model, and ModeloptBaseConfig gains MutableMapping support with __delitem__.
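
As a generic illustration of the dict-to-Mapping broadening (a hypothetical helper, not code from this PR):

from collections.abc import Mapping

def looks_like_quantizer_entry(obj: object) -> bool:
    # isinstance(obj, dict) would reject a ModeloptBaseConfig even though it
    # now implements the Mapping protocol; checking Mapping accepts both.
    return isinstance(obj, Mapping) and "quantizer_name" in obj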

Changes

Quantization Configuration Schema Modernization

Layer / File(s) Summary
Documentation & Deprecation Notes
docs/source/guides/_quant_cfg.rst
_quant_cfg.rst clarifies QuantizerAttributeConfig-backed forms, adds deprecation notices for weakly schematized dict/list forms, and documents schema-backed YAML loading behavior.
Diffusers example: safe mutation & explicit-keys
examples/diffusers/quantization/*
Diffusers quantize modules materialize Modelopt config objects, deep-copy before per-run mutation, and use Mapping/MutableMapping-aware key checks when injecting trt_high_precision_dtype.
Example scripts & utilities
examples/llm_autodeploy/run_auto_quantize.py, examples/llm_ptq/*, examples/vllm_serve/*
Example scripts updated to use typed identifiers or Mapping-aware config handling; run_auto_quantize changes constant values to string identifiers; llm_ptq utilities adopt QuantizeConfig and Mapping checks.
ONNX export utils
modelopt/onnx/llm_export_utils/quantization_utils.py
get_quant_config() now copies the quant_cfg sequence directly rather than filtering only dict-shaped entries.
Recipe metadata & loader
modelopt/recipe/config.py, modelopt/recipe/loader.py
RecipeMetadataConfig converted to a ModeloptBaseConfig Pydantic model; loader uses _load_raw_config_with_schema(...).data, validates mappings, and delegates PTQ quantize loading to load_config with schema.
Modelopt base config mapping
modelopt/torch/opt/config.py
ModeloptBaseConfig mixes in MutableMapping[str, Any], raises KeyError for missing keys, blocks deletions via __delitem__, and exposes explicit key iteration/accessors.
Config loader & overloads
modelopt/torch/opt/config_loader.py
load_config gains typed @overload declarations and returns validated/normalized payload via _validate_modelopt_schema when the YAML declares a schema or schema_type is supplied.
Quantizer attribute & entry schema
modelopt/torch/quantization/config.py
QuantizerAttributeConfig validation broadened to Mapping; QuantizerCfgEntry converted from TypedDict to ModeloptBaseConfig with validators rejecting null cfg/enable, enforcing non-empty quantizer_name, and validating sequential cfg lists.
normalize_quant_cfg_list & helpers
modelopt/torch/quantization/config.py
normalize_quant_cfg_list rewritten to accept Mapping or Sequence, expand legacy flat-dict and nn.* forms into canonical typed QuantizerCfgEntry list, and added helpers for legacy roundtrips and preset fragment loading.
Presets & selective builders
modelopt/torch/quantization/config.py
Many preset constants (INT8, FP8, NVFP4, KV/rotation/MHA presets) converted from raw dict literals to validated QuantizeConfig.model_validate(...) instances; selective builders return typed QuantizeConfig.
QuantRecipe & algorithm handling
modelopt/torch/quantization/algorithms.py
QuantRecipeConfig union added; QuantRecipe accepts str/Mapping/QuantizeConfig/None and normalizes via model_copy(deep=True) or QuantizeConfig.model_validate.
Mapping-based generalization
examples/*, modelopt/torch/quantization/backends/*, modelopt/torch/quantization/mode.py
Broad updates replace isinstance(..., dict) checks with isinstance(..., Mapping) across validation, availability checks, compression estimation, and algorithm handling; use explicit_items for ModeloptBaseConfig when available.
Conversion APIs & partial updates
modelopt/torch/quantization/conversion.py
set_quantizer_by_cfg, set_quantizer_attributes_partial, set_quantizer_by_cfg_context updated for QuantizeQuantCfgInputType and Mapping-based validation; cfg accepted only as None, QuantizerAttributeConfig, or list[QuantizerAttributeConfig] after normalization.
Public quantize() & auto_quantize() API
modelopt/torch/quantization/model_quant.py
quantize() and auto_quantize() accept QuantizeConfig instances in addition to legacy mapping-based configs.
TensorQuantizer & SequentialQuantizer
modelopt/torch/quantization/nn/modules/tensor_quantizer.py
TensorQuantizer.set_from_attribute_config and SequentialQuantizer.set_from_attribute_config accept Mapping[str, Any]; sequential list inputs must match len(self) or raise ValueError; non-list Mapping inputs are broadcast.
Backend availability checks
modelopt/torch/quantization/backends/*.py
FP8/NVFP4 availability checks accept Mapping inputs, require quantizers to be enabled before compatibility checks, and iterate explicit_items for ModeloptBaseConfig instances while skipping the "enable" key.
KV-cache utilities
modelopt/torch/quantization/utils/core_utils.py, examples/llm_ptq/hf_ptq.py, examples/vllm_serve/vllm_ptq_utils.py
update_quant_cfg_with_kv_cache_quant now validates the outer config and returns QuantizeConfig; hf_ptq and vllm_ptq_utils support in-place cfg mutation and broader entry handling.
Tests & example tests
tests/unit/recipe/test_loader.py, tests/unit/torch/quantization/test_config_validation.py, tests/unit/torch/quantization/test_quantize_cpu.py, tests/examples/diffusers/test_diffusers.py
Tests add normalization helpers (_cfg_to_dict, _entry_to_dict), assert schema-typed loading, validate QuantizerCfgEntry dict-like semantics and QuantizeConfig parsing for dict/list forms, reject null cfg/enable, and add sequential-quantizer length-mismatch/type-error tests; diffusers test dynamically loads example config to assert explicit key handling.

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • NVIDIA/Model-Optimizer#1094: Similar redesign of quant_cfg from dict-based forms to typed QuantizeConfig/QuantizerCfgEntry with normalization and schema-awareness.

Suggested reviewers

  • meenchen
  • realAsma
  • jingyu-ml
🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Schematize config loading and quantizer config entries' accurately captures the main change: introducing schema-backed handling for configuration and quantizer entries throughout the codebase.
Docstring Coverage ✅ Passed Docstring coverage is 88.10% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed No security anti-patterns found. All 29 modified files pass: no unsafe torch.load, numpy.load, hardcoded trust_remote_code, eval/exec, or new # nosec comments.


github-actions Bot (Contributor) commented May 7, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1405/

Built to branch gh-pages at 2026-05-09 01:05 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 92.22973% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.90%. Comparing base (a098759) to head (d9eccf3).

Files with missing lines Patch % Lines
modelopt/torch/quantization/config.py 93.12% 11 Missing ⚠️
modelopt/torch/quantization/algorithms.py 80.00% 4 Missing ⚠️
...torch/quantization/backends/fp8_per_tensor_gemm.py 82.35% 3 Missing ⚠️
modelopt/torch/opt/config.py 93.33% 2 Missing ⚠️
...delopt/onnx/llm_export_utils/quantization_utils.py 0.00% 1 Missing ⚠️
modelopt/torch/opt/config_loader.py 91.66% 1 Missing ⚠️
modelopt/torch/quantization/backends/nvfp4_gemm.py 92.30% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1405      +/-   ##
==========================================
- Coverage   76.91%   76.90%   -0.02%     
==========================================
  Files         478      478              
  Lines       51434    51560     +126     
==========================================
+ Hits        39563    39651      +88     
- Misses      11871    11909      +38     
Flag Coverage Δ
examples 41.82% <66.89%> (+0.95%) ⬆️
gpu 59.89% <76.68%> (-0.54%) ⬇️
regression 15.29% <45.27%> (+0.15%) ⬆️
unit 52.63% <81.75%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.


shengliangxu changed the title from "schematize cfg" to "Schematize config loading and quantizer config entries" on May 7, 2026
shengliangxu force-pushed the shengliangx/schematize-cfg branch from 066c2c8 to 8e8153a on May 7, 2026 18:27
shengliangxu marked this pull request as ready for review on May 7, 2026 22:20
shengliangxu requested review from a team as code owners on May 7, 2026 22:20
coderabbitai Bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modelopt/torch/quantization/config.py (1)

1093-1114: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve list-valued cfg for legacy nn.*-scoped entries.

This branch unconditionally does dict(sub_cfg) for non-typed sub-configs, which blows up on sequential configs like {"nn.Linear": {"*weight_quantizer": [{...}, {...}]}}. The non-scoped legacy path still accepts list-valued cfgs, so this regresses class-scoped legacy configs only.

Suggested fix
             for q_path, sub_cfg in value.items():
                 if isinstance(sub_cfg, QuantizerAttributeConfig):
                     enable = None
                     cfg = sub_cfg
-                else:
+                elif isinstance(sub_cfg, Mapping):
                     sub_cfg = dict(sub_cfg)
                     enable = sub_cfg.pop("enable", None)
                     cfg = sub_cfg or None
+                else:
+                    enable = None
+                    cfg = sub_cfg
                 entry: dict[str, Any] = {
                     "parent_class": key,
                     "quantizer_name": q_path,
                     "cfg": cfg,
                 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/quantization/config.py` around lines 1093 - 1114, The nn.*
branch currently does unguarded dict(sub_cfg) which breaks when a sub_cfg is a
list (legacy sequential configs); in the loop inside the key.startswith("nn.")
branch (variables: q_path, sub_cfg, entries, entry, parent_class,
quantizer_name, cfg, enable, QuantizerAttributeConfig) change the logic so you
only coerce sub_cfg to dict when it's a Mapping (e.g., isinstance(sub_cfg,
Mapping)); for non-Mapping values (not a QuantizerAttributeConfig and not a
Mapping) preserve sub_cfg as-is (so list-valued cfgs remain lists), then extract
enable only from Mapping-based dicts and set cfg accordingly before building and
appending each entry returned by this branch.
modelopt/torch/quantization/nn/modules/tensor_quantizer.py (1)

1431-1437: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate attribute-list length before applying sequential configs.

At Line 1435, zip(attributes, self) silently drops extras and leaves trailing quantizers unchanged when lengths differ. This should fail fast to prevent partial/implicit config application.

Proposed fix
     if not isinstance(attributes, (list, tuple)):
         assert isinstance(attributes, Mapping), "attributes must be a list or a mapping."
         attributes = [attributes] * len(self)
+    elif len(attributes) != len(self):
+        raise ValueError(
+            f"Expected {len(self)} attribute configs, but got {len(attributes)}."
+        )
 
     for attribute, quantizer in zip(attributes, self):
         quantizer.set_from_attribute_config(attribute)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/quantization/nn/modules/tensor_quantizer.py` around lines 1431
- 1437, When applying a sequence of attribute configs in tensor_quantizer.py,
validate that a provided attributes list/tuple has the same length as the
quantizer sequence (self) before zipping; if lengths differ raise a clear
exception (e.g., ValueError) stating the expected and actual lengths so the call
to set_from_attribute_config cannot silently skip trailing quantizers.
Specifically, in the method containing the current attributes handling and the
loop that calls set_from_attribute_config, add a length check for
isinstance(attributes, (list, tuple)) and raise on mismatch.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/llm_ptq/example_utils.py`:
- Around line 252-255: The phi4mm path is reverting typed QuantizeConfig entries
to plain dicts by appending raw dicts to quant_cfg_obj["quant_cfg"]; instead
append properly constructed QuantizerCfgEntry objects so the config remains
fully typed. Locate the code that mutates quant_cfg_obj (the block adding
entries for "*speech*", "*audio*", "*image*", "*vision*") and replace those dict
appends with creation of QuantizerCfgEntry instances (matching the type used
elsewhere) and append those instances; ensure build_quant_cfg() returns a
QuantizeConfig with quant_cfg containing only QuantizerCfgEntry objects on the
phi4mm path.

In `@examples/vllm_serve/vllm_ptq_utils.py`:
- Around line 122-123: The generator expression scanning kv_quant_cfg calls
e.get("quantizer_name") without ensuring e is a mapping; modify the predicate
used where kv_quant_cfg is scanned (the generator/filter that looks for
"quantizer_name" == "*[kv]_bmm_quantizer") to first check the entry type (e.g.,
isinstance(e, dict) or isinstance(e, collections.abc.Mapping) or hasattr(e,
"get")) before calling .get, so malformed/non-mapping entries are skipped and do
not raise an AttributeError.

In `@modelopt/torch/opt/config.py`:
- Around line 120-133: Modify __delitem__ to adhere to MutableMapping semantics
by converting AttributeError into KeyError when resolving the mapping key and
when refusing to unset a field with no default: wrap the call to
get_field_name_from_key(key) so any AttributeError raised is caught and
re-raised as KeyError(key), and replace the final raise AttributeError(...) that
checks for PydanticUndefined with raise KeyError(key) (including a descriptive
message if desired); keep the existing logic for handling model_extra,
model_fields_set, and using field_info.get_default(call_default_factory=True)
and PydanticUndefined.

In `@modelopt/torch/quantization/config.py`:
- Around line 552-574: The validator method validate_quantizer_cfg_entry
currently treats explicit None for the 'cfg' or 'enable' fields as if they were
omitted, allowing entries like {"quantizer_name":"*","enable":None} or
{"quantizer_name":"*","cfg":None} to behave as enabled; change the logic to
reject explicit nulls: if 'cfg' in values and values['cfg'] is None raise
ValueError("cfg must be omitted or a valid mapping/list, not null"), and if
'enable' in values and values['enable'] is None raise ValueError("enable must be
a boolean when provided, not null"); preserve existing behavior for omitted keys
and keep calling cls._validate_enabled_cfg(cfg) only when enable is truthy and
cfg is not None; apply the same null-rejection checks to the analogous validator
block referenced at lines 1157-1197.

In `@modelopt/torch/quantization/conversion.py`:
- Line 252: The code forces legacy dict-shaped quant_cfg through list() before
calling normalize_quant_cfg_list, which converts {"*": {"enable": False}} into
["*"] and breaks backward-compatible handling in normalize_quant_cfg_list and
downstream set_quantizer_by_cfg / set_quantizer_by_cfg_context; fix by removing
the list(...) wrapper and pass quant_cfg directly into normalize_quant_cfg_list
(and apply the same change for the other occurrence around line 499) so the
function can accept both legacy flat dicts and list forms.

In `@modelopt/torch/quantization/utils/core_utils.py`:
- Around line 935-939: The code is appending kv_cache_quant_cfg entries directly
into updated_quant_cfg["quant_cfg"], risking shared mutable state; modify the
concatenation in core_utils.py so you deep-copy each entry from
kv_cache_quant_cfg (or perform a deepcopy of the sequence) before extending
updated_quant_cfg["quant_cfg"] (i.e., when creating inner and assigning
updated_quant_cfg["quant_cfg"] = inner + ...), ensuring entries like
QuantizerCfgEntry instances from kv_cache_quant_cfg are duplicated rather than
referenced.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e15d92dc-470c-4dd5-a1bf-b8509a4179d2

📥 Commits

Reviewing files that changed from the base of the PR and between 570920b and bbb251f.

📒 Files selected for processing (25)
  • docs/source/guides/_quant_cfg.rst
  • examples/diffusers/quantization/config.py
  • examples/diffusers/quantization/quantize.py
  • examples/llm_autodeploy/run_auto_quantize.py
  • examples/llm_ptq/cast_mxfp4_to_nvfp4.py
  • examples/llm_ptq/example_utils.py
  • examples/llm_ptq/hf_ptq.py
  • examples/llm_ptq/multinode_ptq.py
  • examples/vllm_serve/vllm_ptq_utils.py
  • modelopt/onnx/llm_export_utils/quantization_utils.py
  • modelopt/recipe/config.py
  • modelopt/recipe/loader.py
  • modelopt/torch/opt/config.py
  • modelopt/torch/opt/config_loader.py
  • modelopt/torch/quantization/algorithms.py
  • modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py
  • modelopt/torch/quantization/backends/nvfp4_gemm.py
  • modelopt/torch/quantization/config.py
  • modelopt/torch/quantization/conversion.py
  • modelopt/torch/quantization/mode.py
  • modelopt/torch/quantization/model_quant.py
  • modelopt/torch/quantization/nn/modules/tensor_quantizer.py
  • modelopt/torch/quantization/utils/core_utils.py
  • tests/unit/recipe/test_loader.py
  • tests/unit/torch/quantization/test_config_validation.py

coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/torch/opt/config.py (1)

60-125: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix MutableMapping protocol violation: __getitem__ must raise KeyError, not AttributeError.

After ModeloptBaseConfig inherits MutableMapping[str, Any] at line 60, missing-key lookups propagate AttributeError from get_field_name_from_key(). This violates the mapping contract, breaking inherited methods like pop() which expect KeyError. Proof: when __getitem__ raises AttributeError, MutableMapping.pop(key, default) fails to use the default value.

Currently, __delitem__ and get() already catch AttributeError and convert to KeyError—this inconsistency exposes the protocol violation.
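
A standalone reproduction of that failure mode (BrokenConfig is a hypothetical stand-in, not the real ModeloptBaseConfig):

from collections.abc import MutableMapping

class BrokenConfig(MutableMapping):
    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        if key not in self._data:
            raise AttributeError(f"Key {key} not found in the config.")  # wrong type
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

# MutableMapping.pop() only catches KeyError, so the AttributeError escapes
# instead of returning the default:
BrokenConfig().pop("missing", "fallback")  # raises AttributeError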

Proposed minimal fix
diff --git a/modelopt/torch/opt/config.py b/modelopt/torch/opt/config.py
@@ -99,7 +99,7 @@ class ModeloptBaseConfig(BaseModel, MutableMapping[str, Any]):
                 if field_info.alias == key:
                     return name
-            raise AttributeError(f"Key {key} not found in the config.")
+            raise KeyError(key)
 
     def __contains__(self, key: str) -> bool:
@@ -107,7 +107,7 @@ class ModeloptBaseConfig(BaseModel, MutableMapping[str, Any]):
             self.get_field_name_from_key(key)
             return True
-        except AttributeError:
+        except KeyError:
             return False
 
     def __delitem__(self, key: str) -> None:
@@ -121,7 +121,7 @@ class ModeloptBaseConfig(BaseModel, MutableMapping[str, Any]):
         try:
             field_name = self.get_field_name_from_key(key)
-        except AttributeError as e:
+        except KeyError as e:
             raise KeyError(key) from e
 
     def get(self, key: str, default: Any = None) -> Any:
@@ -129,7 +129,7 @@ class ModeloptBaseConfig(BaseModel, MutableMapping[str, Any]):
         try:
             return self[key]
-        except AttributeError:
+        except KeyError:
             return default
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/opt/config.py` around lines 60 - 125, The mapping methods
violate MutableMapping because get_field_name_from_key() can raise
AttributeError; wrap calls to get_field_name_from_key() in __getitem__ and
__setitem__ (and anywhere else that should follow mapping semantics) to catch
AttributeError and re-raise KeyError(key) so missing-key lookups conform to the
MutableMapping contract; locate the fixes in the ModeloptBaseConfig methods
__getitem__, __setitem__ (and mirror the pattern used in __delitem__) to ensure
consistency.
🧹 Nitpick comments (1)
examples/llm_ptq/example_utils.py (1)

849-855: ⚡ Quick win

Align quant_cfg type hints with the new Mapping-compatible behavior.

needs_checkpoint_path_update now supports any Mapping, but the signature still advertises dict. Consider updating this signature (and resolve_checkpoint_dir) to Mapping[str, Any] / MutableMapping[str, Any] so static typing matches runtime behavior.
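
A hypothetical signature sketch of that suggestion (the real functions live in examples/llm_ptq/example_utils.py):

from collections.abc import Mapping
from typing import Any

def needs_checkpoint_path_update(quant_cfg: Mapping[str, Any]) -> bool:
    # annotation broadened from dict so typed QuantizeConfig objects type-check too
    ...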

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/llm_ptq/example_utils.py` around lines 849 - 855, The type hints
should reflect that quant_cfg can be any Mapping; update the signatures of
needs_checkpoint_path_update and resolve_checkpoint_dir to accept Mapping[str,
Any] (or MutableMapping[str, Any] if the function mutates the dict) instead of
dict, and import the required typing symbols (Mapping, MutableMapping, Any) at
the top of the module; keep the runtime logic unchanged but ensure the
annotations match the Mapping-compatible behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@modelopt/torch/opt/config.py`:
- Around line 60-125: The mapping methods violate MutableMapping because
get_field_name_from_key() can raise AttributeError; wrap calls to
get_field_name_from_key() in __getitem__ and __setitem__ (and anywhere else that
should follow mapping semantics) to catch AttributeError and re-raise
KeyError(key) so missing-key lookups conform to the MutableMapping contract;
locate the fixes in the ModeloptBaseConfig methods __getitem__, __setitem__ (and
mirror the pattern used in __delitem__) to ensure consistency.

---

Nitpick comments:
In `@examples/llm_ptq/example_utils.py`:
- Around line 849-855: The type hints should reflect that quant_cfg can be any
Mapping; update the signatures of needs_checkpoint_path_update and
resolve_checkpoint_dir to accept Mapping[str, Any] (or MutableMapping[str, Any]
if the function mutates the dict) instead of dict, and import the required
typing symbols (Mapping, MutableMapping, Any) at the top of the module; keep the
runtime logic unchanged but ensure the annotations match the Mapping-compatible
behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fe529dcf-649f-47fa-910a-b1be49110dc6

📥 Commits

Reviewing files that changed from the base of the PR and between bbb251f and fe6dbcf.

📒 Files selected for processing (9)
  • examples/llm_ptq/example_utils.py
  • examples/vllm_serve/vllm_ptq_utils.py
  • modelopt/torch/opt/config.py
  • modelopt/torch/quantization/config.py
  • modelopt/torch/quantization/conversion.py
  • modelopt/torch/quantization/nn/modules/tensor_quantizer.py
  • modelopt/torch/quantization/utils/core_utils.py
  • tests/unit/torch/quantization/test_config_validation.py
  • tests/unit/torch/quantization/test_quantize_cpu.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • modelopt/torch/quantization/conversion.py
  • modelopt/torch/quantization/config.py

shengliangxu marked this pull request as draft on May 8, 2026 15:36
Have load_config return Pydantic-normalized values when schema_type or modelopt-schema is present, including typed recipe metadata and quantization config entries.

Update recipe loading, docs, and unit tests for typed config objects and normalized quant_cfg handling.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Convert QuantizerCfgEntry into a ModeloptBaseConfig-backed Pydantic model with validation while preserving dict-style access for callers.

Normalize schema-loaded quant_cfg snippets through model_dump, simplify quantizer cfg handling, and cover both dict and QuantizeConfig need_calibration inputs.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Update normalize_quant_cfg_list to accept dict entries, typed entries, and legacy dict formats while returning QuantizerCfgEntry objects.

Preserve already parsed entries, handle implicit enable values in consumers, and cover mixed typed/dict inputs in tests.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Make ModeloptBaseConfig a MutableMapping and use Mapping/MutableMapping protocol checks for typed quantizer config entries and attributes.

Convert predefined quantization recipes to QuantizeConfig objects while preserving dict-style callers and compatibility paths.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Cover normalization after mutating raw dict quantizer entries and schema-backed ModeloptBaseConfig entries.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
shengliangxu force-pushed the shengliangx/schematize-cfg branch from f9bc176 to b0fadd1 on May 8, 2026 20:21
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
shengliangxu force-pushed the shengliangx/schematize-cfg branch from b0fadd1 to 0917ab8 on May 8, 2026 20:24
shengliangxu marked this pull request as ready for review on May 8, 2026 23:25
coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 101-103: The current code uses entry.get("cfg") or {} which treats
any falsy cfg (like a typed empty mapping) as missing and replaces it; instead,
fetch cfg with entry.get("cfg") and only replace it when it is actually None or
not present (e.g., if entry.get("cfg") is None: set a new dict), then set
cfg["use_constant_amax"] = True and write back entry["cfg"] = cfg so you
preserve existing typed/empty config objects; reference the variables entry and
cfg and the key "use_constant_amax" when making the change.

In `@modelopt/torch/quantization/algorithms.py`:
- Around line 130-149: The _str_repr uses the variable name which can be None
when a QuantizeConfig instance is passed, causing distinct recipes to collapse;
update the constructor logic around QuantizeConfig handling so that if name is
None and quant_cfg is an instance of QuantizeConfig you derive a stable
identifier from the config (e.g., use a preserved recipe name/property on
QuantizeConfig or fallback to a deterministic descriptor like class name plus a
short hash of the config contents) before building self._str_repr and before
using the config for equality/hash; ensure you reference QuantizeConfig,
self.config, and self._str_repr when making the change.

In `@modelopt/torch/quantization/nn/modules/tensor_quantizer.py`:
- Around line 1428-1435: The method's type annotation and runtime check are
incorrect: replace the assert on line 1432 with an explicit exception (e.g.,
raise TypeError or ValueError) to validate inputs at runtime, and broaden the
type annotation on the attributes parameter to accept either a Mapping or a
list/tuple whose elements may be QuantizerAttributeConfig or Mapping (e.g.,
attributes: Mapping[str, Any] | list[QuantizerAttributeConfig | Mapping[str,
Any]] | tuple[QuantizerAttributeConfig | Mapping[str, Any], ...]); keep the
existing behavior of wrapping a single Mapping into a list with [attributes] *
len(self) and raise a clear error if a list/tuple length doesn't match self
(using ValueError).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d88a743f-edba-444a-9a09-53d15b1e99b4

📥 Commits

Reviewing files that changed from the base of the PR and between 3a82fea and 3f533fa.

📒 Files selected for processing (28)
  • docs/source/guides/_quant_cfg.rst
  • examples/diffusers/quantization/config.py
  • examples/diffusers/quantization/quantize.py
  • examples/llm_autodeploy/run_auto_quantize.py
  • examples/llm_ptq/cast_mxfp4_to_nvfp4.py
  • examples/llm_ptq/example_utils.py
  • examples/llm_ptq/hf_ptq.py
  • examples/llm_ptq/multinode_ptq.py
  • examples/vllm_serve/vllm_ptq_utils.py
  • modelopt/onnx/llm_export_utils/quantization_utils.py
  • modelopt/recipe/config.py
  • modelopt/recipe/loader.py
  • modelopt/torch/opt/config.py
  • modelopt/torch/opt/config_loader.py
  • modelopt/torch/quantization/algorithms.py
  • modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py
  • modelopt/torch/quantization/backends/nvfp4_gemm.py
  • modelopt/torch/quantization/config.py
  • modelopt/torch/quantization/conversion.py
  • modelopt/torch/quantization/mode.py
  • modelopt/torch/quantization/model_quant.py
  • modelopt/torch/quantization/nn/modules/tensor_quantizer.py
  • modelopt/torch/quantization/utils/core_utils.py
  • tests/examples/diffusers/test_diffusers.py
  • tests/unit/recipe/test_loader.py
  • tests/unit/torch/opt/test_config.py
  • tests/unit/torch/quantization/test_config_validation.py
  • tests/unit/torch/quantization/test_quantize_cpu.py
✅ Files skipped from review due to trivial changes (2)
  • examples/vllm_serve/vllm_ptq_utils.py
  • examples/llm_ptq/example_utils.py
🚧 Files skipped from review as they are similar to previous changes (17)
  • examples/llm_autodeploy/run_auto_quantize.py
  • tests/unit/torch/quantization/test_quantize_cpu.py
  • modelopt/torch/quantization/backends/nvfp4_gemm.py
  • docs/source/guides/_quant_cfg.rst
  • examples/diffusers/quantization/quantize.py
  • modelopt/torch/quantization/mode.py
  • modelopt/torch/quantization/utils/core_utils.py
  • examples/llm_ptq/multinode_ptq.py
  • examples/llm_ptq/cast_mxfp4_to_nvfp4.py
  • modelopt/recipe/loader.py
  • modelopt/torch/quantization/model_quant.py
  • modelopt/torch/opt/config_loader.py
  • modelopt/onnx/llm_export_utils/quantization_utils.py
  • tests/unit/torch/quantization/test_config_validation.py
  • tests/unit/recipe/test_loader.py
  • modelopt/torch/quantization/conversion.py
  • modelopt/torch/quantization/config.py

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
