fix(security): bound chat-template rendering with minijinja fuel by inureyes · Pull Request #139 · lablup/mlxcel

inureyes · 2026-05-30T02:09:15Z

Summary

Model-supplied chat templates are rendered through minijinja (src/server/chat_template.rs), both per request and at model-load time via the supports_tools probe. Rendering was previously unbounded, so a pathological template — e.g. deeply nested or effectively unbounded {% for %} loops — could consume unbounded CPU/memory, a denial-of-service vector. As established in the companion analysis (#128), this path is not RCE-vulnerable (unlike CVE-2026-5760), but bounding render cost is sound defense-in-depth and becomes materially important in any multi-tenant deployment where untrusted parties could cause arbitrary models to be loaded.

Changes

Cargo.toml — enable the minijinja fuel cargo feature (alongside the existing loop_controls).
configure_environment — call env.set_fuel(Some(50_000_000)). Because both apply* render paths and the load-time supports_tools probe funnel through this single function, one budget covers every render path. The chosen budget and its rationale are documented inline.
Test — test_pathological_template_is_bounded_by_fuel asserts a nested unbounded loop errors via ErrorKind::OutOfFuel ("engine ran out of fuel") while an identically-shaped small loop still renders, isolating fuel exhaustion as the cause.

Safety / no panic

Fuel exhaustion returns a minijinja::Error, which both render call sites propagate via anyhow::Result (.with_context("Failed to render chat template")), and the load-time probe already degrades to a string heuristic on render failure. No path panics.

Budget rationale

50M VM instructions is deliberately generous. Real templates — including tool-rich Qwen/Nemotron-style templates rendered over long multi-turn histories — execute well under ~1M instructions, while an unbounded loop blows past 50M almost instantly, bounding a malicious render to a fraction of a second of CPU. Note that fuel bounds instruction count, not output size, so it does not by itself cap memory from a single very large emit; that is out of scope here.

Verification

cargo test --features metal,accelerate --lib server::chat_template — 71 passed, 0 failed.
Real-model audit test_all_local_model_templates_render (--ignored) over the local models/ directory — 91 templates checked, 267 scenarios passed, 0 failures (6 intentional raise_exception rejections, unrelated to fuel). Confirms no false-positive fuel exhaustion on real templates.
cargo clippy --features metal,accelerate --lib --tests -- -D warnings — clean.
cargo fmt --check — clean.

Closes #129

Model-supplied chat templates are rendered through minijinja, both per request and at model-load time via the `supports_tools` probe. Rendering was previously unbounded, so a pathological template (e.g. deeply nested or effectively unbounded `{% for %}` loops) could consume unbounded CPU/memory — a denial-of-service vector. This is RCE-safe (unlike CVE-2026-5760) but warrants defense-in-depth, especially for any multi-tenant deployment where untrusted parties could cause arbitrary models to be loaded. Enable the minijinja `fuel` feature and cap each render at 50M VM instructions via `set_fuel` in the shared `configure_environment`, covering every render path. Exhaustion surfaces as a clean `ErrorKind::OutOfFuel` error — both render call sites propagate it via `Result`, and the load-time probe already degrades to a string heuristic — so it never panics. The budget is deliberately generous: real templates (including tool-rich Qwen/Nemotron templates over long histories) execute well under ~1M instructions, verified by the local-model audit (91 templates, 267 scenarios, 0 failures), while an unbounded loop is bounded to a fraction of a second. Add a regression test asserting a nested unbounded loop errors via fuel exhaustion while an identically-shaped small loop still renders. Closes #129

inureyes · 2026-05-30T02:12:27Z

Implementation Review Summary

Intent

Bound minijinja chat-template rendering with a fuel budget so a pathological/untrusted model-supplied template cannot cause CPU/memory DoS (issue #129, defense-in-depth).

Findings Addressed

None — no CRITICAL/HIGH/MEDIUM/LOW findings. Implementation is correct, complete, and safe as submitted.

Remaining Items

None.

Verification

All stated requirements implemented — "fuel" feature added to minijinja; set_fuel(Some(50_000_000)) with documented rationale in configure_environment; unit test added. All three security: bound chat-template rendering with minijinja fuel to limit DoS from untrusted model templates #129 acceptance criteria met.
No placeholder/mock code remaining — real, functional bound; no TODOs/stubs.
Integrated into project code flow — chat_template.rs is the only minijinja consumer in the workspace. Both render sites (apply_raw_with_kwargs, apply_with_kwargs) build Environment::new() + configure_environment; no render site bypasses it. The supports_tools probe funnels through self.apply(...) and falls back to a string heuristic on any render Err (incl. OutOfFuel) — never panics. Render call sites propagate via anyhow .with_context(...).
Project conventions followed — feature flag correct, inline-format args, no unwrap/expect on render results, thorough rationale comment, no structural changes.
Existing modules reused — uses upstream Environment::set_fuel; nothing reimplemented.
No unintended structural changes — additive only (1 feature flag, 1 set_fuel call + comment, 1 test).
Tests pass — test_pathological_template_is_bounded_by_fuel passes in ~5s (terminates a 10-billion-iteration nested loop via the fuel ceiling).

Test is not a false positive: minijinja's range() rejects len() > 100000 (strictly greater); range(100000) = exactly 100000 elements is accepted, so the nested loop reaches the renderer and is killed by fuel — ErrorKind::OutOfFuel ("engine ran out of fuel"), matched via the anyhow {:#} chain — rather than erroring on "too many elements". Control case (range(3)) renders, isolating fuel as the cause.

Budget soundness: 50M VM instructions is ~50x headroom over real templates (<~1M); orchestrator's real-model audit (91 templates / 267 scenarios) showed 0 false-positive exhaustions. No regression risk to legitimate rendering.

inureyes · 2026-05-30T02:17:44Z

Security & performance review — PR #139 (fuel-bounded chat-template rendering)

Reviewed the mitigation against the DoS vector it targets, plus the change's own footprint. Verified against the resolved dependency (minijinja 2.20.0) source, not just the API contract. No CRITICAL or HIGH findings; the mitigation is correct and complete for its stated scope. No auto-fix applied.

Mitigation correctness — verified

Funnel is total. The only two Environment::new() sites — apply_with_kwargs (chat_template.rs:415) and apply_raw_with_kwargs (chat_template.rs:353) — each call configure_environment on the very next line, and the load-time supports_tools probe (compute_supports_tools, chat_template.rs:291-292) routes through apply → apply_with_kwargs. A repo-wide grep confirms minijinja is used only in chat_template.rs (the mlxcel-surgery .render() hits are an unrelated positional-template type). There is no render path that bypasses the budget.
Fuel actually bounds the loop vector. In minijinja-2.20.0/src/vm/mod.rs:318-321 the tracker is consulted before every instruction dispatch, and vm/fuel.rs:52-69 charges 1 fuel for the general instruction case (loop-iteration instructions included). A pathological {% for %} therefore cannot outrun the budget.
Nested loops/macros share one budget. FuelTracker is created once per top-level eval (vm/state.rs:97) and the same Arc is cloned into every sub-eval/macro frame (vm/mod.rs:151). No per-loop reset bypass exists.
No panic path. OutOfFuel is a minijinja::Error; both render sites propagate it via .with_context("Failed to render chat template")? and the probe degrades to the string heuristic on render failure. Confirmed.

Concurrency — confirmed safe (as the PR states)

FuelTracker is per-render (vm/state.rs:97 builds a fresh one from env().fuel() for each evaluation; each apply* builds a fresh Environment). The AtomicIsize counter is never shared across concurrent renders, and within a render it uses Ordering::Relaxed — correct, since it guards a single logical budget with no cross-thread happens-before requirement. No shared-state race.

Budget magnitude — appropriate

50,000,000 ≈ 50M VM instructions. Empirically corroborated by the real-model audit (91 templates / 267 scenarios, 0 false-positive exhaustions; real templates run well under ~1M), so legitimate headroom is ~50×. 50M is far below an isize on 32/64-bit, so the fetch_sub in vm/fuel.rs:32 cannot overflow. The bound is well-chosen: generous enough for legitimate tool-rich/long-history templates, tight enough to cap a malicious render to a fraction of a second of release-build CPU.

Change footprint — negligible

Per render: one Option<u64> store (set_fuel) + one Arc allocation (FuelTracker::new) + one relaxed atomic decrement per executed instruction. For a ~1M-instruction template that is sub-millisecond. The fuel cargo feature is a pure toggle — Cargo.lock shows no new transitive dependency (still just memo-map, serde), so zero supply-chain expansion. loop_controls was preserved alongside the new fuel feature.

Test quality — good

test_pathological_template_is_bounded_by_fuel pairs a bounded control (range(3)×range(3)) with the pathological case (range(100000)×range(100000)), correctly isolating fuel exhaustion from parse/structural errors, and asserts on the "fuel" substring of the anyhow chain (matches OutOfFuel's display "engine ran out of fuel", error.rs:168). Ran it in isolation: passes.

Residual DoS surface — matches the PR's documented scoping (no action needed)

Fuel meters instruction count, not native-callback work or output size. The one place that could theoretically amplify cost per instruction is the custom-function / set_unknown_method_callback surface (chat_template.rs:616-895), since work inside a Rust callback runs unmetered. I audited all of it: every shim (split/replace/upper/join/count/items/…) produces output bounded by O(input size) with a single allocation per call — there is no size-multiplying primitive exposed (no repeat, no width-controlled padding, no exponential expansion), and strftime_now returns a fixed short string. So the only residual is the single-large-emit / memory-not-bounded-by-instruction-count case, which the PR and issue #129 explicitly and correctly scope out as a known limitation. I concur with that scoping — it is not an oversight, and closing it would require an output-size cap (a separate, larger change). The instruction-count DoS vector this PR targets is fully closed.

Findings

LOW (optional, non-blocking): the test asserts via substring on the Display chain rather than downcasting to minijinja::Error and matching ErrorKind::OutOfFuel. Pragmatically fine — "fuel" appears in no other minijinja error message — so not worth changing.

No code changes required.

Inspect the underlying `minijinja::Error` kind through the anyhow context chain (`downcast_ref` + `ErrorKind::OutOfFuel`) instead of substring-matching the rendered message, so the regression test pins the exact failure contract rather than its display text.

inureyes added type:security Security vulnerability or fix priority:low Low priority status:review Under review labels May 30, 2026

inureyes added status:done Completed and removed status:review Under review labels May 30, 2026

inureyes merged commit 179d6f9 into main May 30, 2026
4 checks passed

inureyes deleted the fix/issue-129-bound-chat-template-fuel branch May 30, 2026 02:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(security): bound chat-template rendering with minijinja fuel#139

fix(security): bound chat-template rendering with minijinja fuel#139
inureyes merged 2 commits into
mainfrom
fix/issue-129-bound-chat-template-fuel

inureyes commented May 30, 2026

Uh oh!

inureyes commented May 30, 2026

Uh oh!

inureyes commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

inureyes commented May 30, 2026

Summary

Changes

Safety / no panic

Budget rationale

Verification

Uh oh!

inureyes commented May 30, 2026

Implementation Review Summary

Intent

Findings Addressed

Remaining Items

Verification

Uh oh!

inureyes commented May 30, 2026

Security & performance review — PR #139 (fuel-bounded chat-template rendering)

Mitigation correctness — verified

Concurrency — confirmed safe (as the PR states)

Budget magnitude — appropriate

Change footprint — negligible

Test quality — good

Residual DoS surface — matches the PR's documented scoping (no action needed)

Findings

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant