fix(security): bound chat-template rendering with minijinja fuel#139
Conversation
Model-supplied chat templates are rendered through minijinja, both per request and at model-load time via the `supports_tools` probe. Rendering was previously unbounded, so a pathological template (e.g. deeply nested or effectively unbounded `{% for %}` loops) could consume unbounded CPU/memory — a denial-of-service vector. This is RCE-safe (unlike CVE-2026-5760) but warrants defense-in-depth, especially for any multi-tenant deployment where untrusted parties could cause arbitrary models to be loaded.
Enable the minijinja `fuel` feature and cap each render at 50M VM instructions via `set_fuel` in the shared `configure_environment`, covering every render path. Exhaustion surfaces as a clean `ErrorKind::OutOfFuel` error — both render call sites propagate it via `Result`, and the load-time probe already degrades to a string heuristic — so it never panics. The budget is deliberately generous: real templates (including tool-rich Qwen/Nemotron templates over long histories) execute well under ~1M instructions, verified by the local-model audit (91 templates, 267 scenarios, 0 failures), while an unbounded loop is bounded to a fraction of a second.
Add a regression test asserting a nested unbounded loop errors via fuel exhaustion while an identically-shaped small loop still renders.
Closes #129
Implementation Review SummaryIntent
Findings AddressedNone — no CRITICAL/HIGH/MEDIUM/LOW findings. Implementation is correct, complete, and safe as submitted. Remaining ItemsNone. Verification
Test is not a false positive: minijinja's Budget soundness: 50M VM instructions is ~50x headroom over real templates (<~1M); orchestrator's real-model audit (91 templates / 267 scenarios) showed 0 false-positive exhaustions. No regression risk to legitimate rendering. |
Security & performance review — PR #139 (fuel-bounded chat-template rendering)Reviewed the mitigation against the DoS vector it targets, plus the change's own footprint. Verified against the resolved dependency ( Mitigation correctness — verified
Concurrency — confirmed safe (as the PR states)
Budget magnitude — appropriate50,000,000 ≈ 50M VM instructions. Empirically corroborated by the real-model audit (91 templates / 267 scenarios, 0 false-positive exhaustions; real templates run well under ~1M), so legitimate headroom is ~50×. 50M is far below an isize on 32/64-bit, so the Change footprint — negligiblePer render: one Test quality — good
Residual DoS surface — matches the PR's documented scoping (no action needed)Fuel meters instruction count, not native-callback work or output size. The one place that could theoretically amplify cost per instruction is the custom-function / Findings
No code changes required. |
Inspect the underlying `minijinja::Error` kind through the anyhow context chain (`downcast_ref` + `ErrorKind::OutOfFuel`) instead of substring-matching the rendered message, so the regression test pins the exact failure contract rather than its display text.
Summary
Model-supplied chat templates are rendered through
minijinja(src/server/chat_template.rs), both per request and at model-load time via thesupports_toolsprobe. Rendering was previously unbounded, so a pathological template — e.g. deeply nested or effectively unbounded{% for %}loops — could consume unbounded CPU/memory, a denial-of-service vector. As established in the companion analysis (#128), this path is not RCE-vulnerable (unlike CVE-2026-5760), but bounding render cost is sound defense-in-depth and becomes materially important in any multi-tenant deployment where untrusted parties could cause arbitrary models to be loaded.Changes
Cargo.toml— enable the minijinjafuelcargo feature (alongside the existingloop_controls).configure_environment— callenv.set_fuel(Some(50_000_000)). Because bothapply*render paths and the load-timesupports_toolsprobe funnel through this single function, one budget covers every render path. The chosen budget and its rationale are documented inline.test_pathological_template_is_bounded_by_fuelasserts a nested unbounded loop errors viaErrorKind::OutOfFuel("engine ran out of fuel") while an identically-shaped small loop still renders, isolating fuel exhaustion as the cause.Safety / no panic
Fuel exhaustion returns a
minijinja::Error, which both render call sites propagate viaanyhow::Result(.with_context("Failed to render chat template")), and the load-time probe already degrades to a string heuristic on render failure. No path panics.Budget rationale
50M VM instructions is deliberately generous. Real templates — including tool-rich Qwen/Nemotron-style templates rendered over long multi-turn histories — execute well under ~1M instructions, while an unbounded loop blows past 50M almost instantly, bounding a malicious render to a fraction of a second of CPU. Note that fuel bounds instruction count, not output size, so it does not by itself cap memory from a single very large emit; that is out of scope here.
Verification
cargo test --features metal,accelerate --lib server::chat_template— 71 passed, 0 failed.test_all_local_model_templates_render(--ignored) over the localmodels/directory — 91 templates checked, 267 scenarios passed, 0 failures (6 intentionalraise_exceptionrejections, unrelated to fuel). Confirms no false-positive fuel exhaustion on real templates.cargo clippy --features metal,accelerate --lib --tests -- -D warnings— clean.cargo fmt --check— clean.Closes #129