feature: improve provenance and make q2-preview editable#231

Draft
gordonwoodhull wants to merge 67 commits into main from feature/provenance

gordonwoodhull commented May 22, 2026

Draft PR for CI, not working yet.

The provenance epic is Plans 3-8 of the q2-preview sequence.

Current status: plans 3-5 complete

Next up: Audit every transform that emits SourceInfo::default() (a meaningless zero-range Original) and fix it to emit correct provenance.

gordonwoodhull added a commit that referenced this pull request May 25, 2026
The hub-client-e2e.yml `paths:` filter only fires the workflow when a
commit touches `hub-client/**` or the workflow file itself. It does not
follow transitive Rust deps, so PRs that modify upstream crates the WASM
bundle depends on — `quarto-core`, `quarto-pandoc-types`, `quarto-source-map`,
`pampa`, `quarto-ast-reconcile`, `wasm-quarto-hub-client`, etc. — silently
skip e2e.

Two recent misses:

- f96f56d (Carlos, 5/22): WASM-incompatible `Instant::now()` and
  `pollster::block_on` introduced in `quarto-core` broke 8 hub-client
  WASM tests on main. e2e never ran because the change was under
  `crates/`, not `hub-client/`.
- PR #231 (feature/provenance, this branch): 57 files modified across
  `crates/` and `ts-packages/`, zero under `hub-client/`. e2e silently
  skipped on every push despite the PR materially changing the WASM
  bundle's behavior.

Fix: drop the `paths:` filter outright and match the trigger shape of
the sibling heavy workflows (`test-suite.yml`, `ts-test-suite.yml`).
Also adds a `concurrency:` block (lifted from `test-suite.yml`) so
superseded runs on a PR get cancelled in flight — keeps the runner
cost from compounding.

Closes bd-izh3. The original ask there was to add a PR trigger with a
*broader* path filter; that approach still wouldn't catch the upstream-
crate case, so we go the coarser route the issue's spirit calls for.
The runner-sizing open question in bd-izh3 is also resolved — ae8274a
confirmed `ubuntu-latest` (2 cores, 2 Playwright workers) handles the
full suite in 5.3-8.1 min.

`kyoto` deliberately omitted from the branch list: `origin/kyoto` last
moved 2026-02-02 and is 825 commits behind main; the sibling workflows
still reference it but that's cargo-cult.
gordonwoodhull added a commit that referenced this pull request May 25, 2026
…nce)

bd-izh3 closed by 016894a on feature/provenance (PR #231). The patch
drops the hub-client-e2e.yml path filter outright so the workflow fires
on every PR like the sibling heavy workflows — strictly broader than
the original 'add PR trigger with broader filter' proposal, since path
filters can never follow transitive Rust deps.

Incidental: bd-cxara has its 'source_repo_path' field stripped (was a
stale absolute path from shikokuchuo's local clone; harmless flush).

Audit and revise Plans 3-8 of the q2-preview series (now framed
internally as the provenance epic) after a design discussion that
followed the q2-preview pipeline and attribution work landing on main.

Major design changes folded into the plans:

- **Plan 4 unified Generated variant.** Collapse the earlier
  `Synthetic` + `Derived` split into one `Generated { by, anchors: Vec<Anchor> }`
  shape. Atomicity is per-`by.kind` (orthogonal to anchors); the
  invocation source byte range is the first anchor with role
  `AnchorRole::Invocation`. One wire-format code (4) instead of two.

- **Plan 4/5/6 typed anchors (Path C).** Instead of stuffing
  source-info chain metadata into `by.data` (dynamic JSON), the chain
  is a typed `Vec<Anchor>` where each `Anchor` carries an `Arc<SourceInfo>`
  and a role-labeled `AnchorRole` (`Invocation`, `ValueSource`,
  `Other(String)`). `by.data` shrinks to per-kind non-source-info
  configuration. Two future-anchor roles flagged as follow-ups
  contingent on metadata-loader and Lua-file-registration work.

- **Plan 6 uniform shortcode anchor stamping.** Single funnel covers
  Rust built-ins, Lua-loaded extension handlers, and user-extension
  shortcodes uniformly via a post-walk `stamp_shortcode_anchors` helper.
  Enrichment-via-post-walk preserves Lua-attached `by.data` fields
  (lua_path, lua_line) while promoting `by.kind` to `shortcode`.
  Attribution interaction documented: multi-author shortcodes get
  latest-wins via the existing `query_byte_range` max-time logic
  composed with chain-walking through the `Invocation` anchor.

- **Plan 5 latent code-3 bug now reachable.** Plans 1-2 shipped the
  q2-preview pipeline that runs filters whose output crosses the JSON
  boundary; the FilterProvenance code-3 round-trip bug is no longer
  latent in production. Adds an end-to-end production-reachability
  regression test using the `{{< kbd Ctrl+C >}}` fixture (kbd.lua
  constructs a Span that gets FilterProvenance-tagged and then
  shortcode-stamped). Drops code 5 from the design.

- **Plan 7 SPA edit-back in scope.** The new q2 preview CLI command
  serves a separate SPA from ts-packages/preview-renderer; both
  hub-client and the SPA share the writer machinery via @quarto/preview-runtime.
  Plan 7 now covers replacing `noopSetAst` in the SPA with a real
  handler that routes through `incrementalWriteQmd` to
  `syncClient.updateFileContent` and the ephemeral hub's automerge↔disk
  bridge. Adds a small SPA-local `DiagnosticStrip` for Q-3-42/Q-3-43;
  hub-client's existing diagnostics-banner handles the same warnings
  there. Single-file mode (bd-tnm3k) works through the same automerge
  stack — no special case.

- **Plan 8 wrapper stays Original.** Explicit reasoning added for
  why `CustomNode("IncludeExpansion")` uses Original source_info
  (CustomNode.type_name carries generator identity; the wrapper
  substitutes 1:1 for the source-mapped Paragraph). The HTML pipeline
  resolves the wrapper with a transform in the Normalization Phase
  (symmetric with CalloutResolveTransform); HTML output doesn't
  attribute the include line because there's no DOM anchor for it,
  which is accepted v1 behavior.
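The unified `Generated` shape and the post-walk shortcode enrichment described in the Plan 4 and Plan 6 bullets can be sketched in Rust as follows. This is a minimal, self-contained sketch under stated assumptions: `By.data` is modelled as a flat string list rather than dynamic JSON, `Original` is reduced to a file id plus byte range, and `stamp_shortcode_anchor` is a hypothetical per-node step, not the real `stamp_shortcode_anchors` helper.

```rust
use std::sync::Arc;

/// Anchor roles named in the plan.
#[derive(Debug, Clone, PartialEq)]
pub enum AnchorRole {
    Invocation,
    ValueSource,
    Other(String),
}

/// One typed link in the chain: a role plus the source it points at.
#[derive(Debug, Clone, PartialEq)]
pub struct Anchor {
    pub role: AnchorRole,
    pub source: Arc<SourceInfo>,
}

/// Generator identity plus per-kind, non-source-info configuration.
/// Assumption: `data` is a flat key/value list instead of dynamic JSON.
#[derive(Debug, Clone, PartialEq)]
pub struct By {
    pub kind: String,
    pub data: Vec<(String, String)>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum SourceInfo {
    /// Simplified Original: a byte range in one file.
    Original { file_id: usize, start: usize, end: usize },
    /// One variant replaces the earlier Synthetic + Derived split.
    Generated { by: By, anchors: Vec<Anchor> },
}

impl SourceInfo {
    /// The invocation range is the first anchor with role `Invocation`.
    pub fn invocation_anchor(&self) -> Option<&SourceInfo> {
        match self {
            SourceInfo::Generated { anchors, .. } => anchors
                .iter()
                .find(|a| a.role == AnchorRole::Invocation)
                .map(|a| a.source.as_ref()),
            SourceInfo::Original { .. } => None,
        }
    }
}

/// Hypothetical per-node enrichment step of a post-walk: promote `by.kind`
/// to "shortcode", keep any Lua-attached `by.data`, and prepend an
/// `Invocation` anchor if the node lacks one. Other variants pass through.
pub fn stamp_shortcode_anchor(info: SourceInfo, invocation: Arc<SourceInfo>) -> SourceInfo {
    match info {
        SourceInfo::Generated { mut by, mut anchors } => {
            by.kind = "shortcode".to_string();
            if !anchors.iter().any(|a| a.role == AnchorRole::Invocation) {
                anchors.insert(0, Anchor { role: AnchorRole::Invocation, source: invocation });
            }
            SourceInfo::Generated { by, anchors }
        }
        other => other,
    }
}
```

A Lua-attached node keeps its `lua_path`/`lua_line`-style entries and gains an `Invocation` anchor, so the invocation byte range stays recoverable through `invocation_anchor()`.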

Mechanical changes also folded in:

- Rename `Synthetic` → `Generated` throughout the type vocabulary in
  all plans.
- Update JS-side hand-mirror file paths (`hub-client/src/utils/...`
  → `ts-packages/preview-renderer/src/utils/...`) to reflect the
  Phase-D package split.
- Each plan's intro reframed as part of the provenance epic; file
  names keep the q2-preview-plan-N form for continuity.

File renames for clarity about which filters each plan covers:

- `…plan-3-filter-idempotence.md` → `…plan-3-builtin-filter-idempotence.md`
- `…plan-7a-filter-idempotence.md` → `…plan-7a-user-filter-idempotence.md`

Plans 3-8 remain in design state on this branch; no code changes yet.

Audit pass over the provenance epic's idempotence story, scoping Plan 3
to pipeline non-determinism only and propagating the consequences to the
neighbouring plans.

Plan 3 (builtin transform and filter idempotence):

- Retitle to "Built-in transform and filter idempotence verification" —
  symmetric across Rust transforms and Lua filters (prior framing was
  too narrow).
- Enumerate the actual universe under test: 36 Rust transforms in
  build_q2_preview_transform_pipeline (4 excluded, named with reasons),
  ~20 stage-level items in build_q2_preview_pipeline_stages, and the
  one Lua filter under resources/extensions/ (video-filter.lua). The
  prior "~10-20 filters" estimate misread shortcodes as filters.
- Drop the "Plan 3 strengthening" round-trip amendment that was added
  alongside Plan 7a in commit 2129d35. Round-trip non-idempotence is
  not exercised by today's pipeline; CI-time round-trip testing
  conflates writer-lossiness with filter-non-idempotence; 7a's runtime
  check is the better home for the property when Plan 7's writer
  ships. Trim "Two flavors" section to a pointer at 7a.
- Add compute_meta_hash_fresh / compute_meta_hash_fresh_excluding_rendered
  as new helpers in quarto-ast-reconcile, parallel to the existing
  block hasher. The combined hash covers blocks + meta (excluding rendered.*).
- Rewrite test pseudocode against the real run_pipeline API at
  pipeline.rs:626.
- Add fixture-format constraint: no executable engine cells (CI has
  no kernels).
- Coverage gap audit: ~25 fixtures across the document-level, Lua
  shortcode, website-project, attribution, and resource categories.
  Includes lua-shortcode-version, lua-shortcode-lipsum-fixed (non-random
  path), and video-filter-header for the one built-in Lua filter.
- Convert to a development-plan format with a seven-phase work-items
  checklist.
- Close the engine-staleness open question via filter.rs:158 (fresh
  Lua::new() per invocation).
- Clarify the lua-filter-pipeline reference as TypeScript Quarto
  porting material, not the Rust inventory.

Plan 6 (provenance audit):

- Add a §Test plan bullet for source_info determinism: Plan 3's hashes
  exclude source_info by design, so a per-fixture source_info-equality
  check is Plan 6's own responsibility.

Plan 7 (incremental writer):

- Add a writer-lossless baseline test as the first §Test plan bullet,
  prerequisite for the reconciler tests. Reuses Plan 3's fixture set.
- Add Plan 3 to §References and §Dependencies (soft-depends-on via
  compute_meta_hash_fresh).

Plan 7a (runtime user-filter idempotence):

- Remove all references to the now-deleted "Plan 3 strengthening"
  section (five locations including a full subsection).
- Reframe the out-of-scope bullet from "Strengthening Plan 3" to
  "Extending the runtime round-trip check to built-in filters," with
  three-point v1-acceptance reasoning in §Notes.
- Update §Design decisions, §Dependencies, and §References to reflect
  the new shape and the shared compute_meta_hash_fresh helper.
- Add the meta-hash comparison to step 4 of the round-trip check.

No code changes; design state only.

…ailure policy

Hash helper: `merge_op` participates (verified `MergeOp::default() =
Concat` is a stable compile-time constant); `Map` entries hashed in
insertion order, no sort (an idempotence test should *catch* the kind
of HashMap-iteration-order non-determinism a sort would mask). Adds
regression-guard unit tests for both choices.

Test runner: drives every fixture through both `DriveMode::SingleFile`
(direct `run_pipeline`) and `DriveMode::ProjectOrchestrator`
(`ProjectPipeline<RenderToPreviewAstRenderer>`) so orchestrator-only
non-determinism (project discovery, ProjectIndex assembly, file-iteration
order) is also under test. Website/chrome fixtures are
orchestrator-only by design.
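The run-twice-per-mode check can be sketched as a small generic function. `check_idempotent` and `RunHashes` are hypothetical names; the `run` closure stands in for either the direct `run_pipeline` call or the `ProjectPipeline<RenderToPreviewAstRenderer>` path and returns the (blocks, meta) hash pair for one run.

```rust
/// The two ways the plan drives a fixture (variant names from the commit text).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DriveMode {
    SingleFile,
    ProjectOrchestrator,
}

/// Structural hashes of one run: (blocks hash, meta hash).
pub type RunHashes = (u64, u64);

/// Hypothetical harness core: run the pipeline twice per mode and report
/// the first mode whose two runs disagree.
pub fn check_idempotent<F>(modes: &[DriveMode], mut run: F) -> Result<(), DriveMode>
where
    F: FnMut(DriveMode) -> RunHashes,
{
    for &mode in modes {
        let first = run(mode);
        let second = run(mode);
        if first != second {
            return Err(mode);
        }
    }
    Ok(())
}
```

An orchestrator-only website fixture would pass `&[DriveMode::ProjectOrchestrator]` as `modes`; everything else passes both.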

Failure policy: failing fixtures stay **failing** — no auto-`#[ignore]`.
Each failure files a beads issue whose description doubles as a
sub-agent investigation prompt. The integration branch holds the
queue; merge to main waits until drained or the user explicitly opts
to ignore.

New helper `find_first_divergence` (alongside the hashers) returns
`DivergencePoint::{Block { index }, MetaKey { path }, None}` so the
test driver's panic message — and therefore the sub-agent prompt —
arrives with a concrete starting point instead of just "hash diverged."
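A simplified `find_first_divergence` over precomputed hashes shows the localization order (blocks first, then meta keys in insertion order). The real helper works on blocks and `ConfigValue`, applies the `rendered.*` exclusion, and reports a full key path; this sketch assumes per-block hashes and top-level `(key, subtree hash)` pairs are already computed and reports only the top-level key.

```rust
/// Variant names from the commit text; payload types are assumptions.
#[derive(Debug, PartialEq, Eq)]
pub enum DivergencePoint {
    Block { index: usize },
    MetaKey { path: Vec<String> },
    None,
}

/// `blocks_*`: one fresh hash per top-level block.
/// `meta_*`: (key, subtree hash) pairs in insertion order.
pub fn find_first_divergence(
    blocks_a: &[u64],
    blocks_b: &[u64],
    meta_a: &[(String, u64)],
    meta_b: &[(String, u64)],
) -> DivergencePoint {
    // First block index whose hash differs, or where one run has extra blocks.
    for i in 0..blocks_a.len().max(blocks_b.len()) {
        if blocks_a.get(i) != blocks_b.get(i) {
            return DivergencePoint::Block { index: i };
        }
    }
    // First insertion-order meta entry whose key or subtree hash differs.
    for i in 0..meta_a.len().max(meta_b.len()) {
        match (meta_a.get(i), meta_b.get(i)) {
            (Some(a), Some(b)) if a == b => continue,
            (Some((key, _)), _) | (None, Some((key, _))) => {
                return DivergencePoint::MetaKey { path: vec![key.clone()] };
            }
            (None, None) => unreachable!("loop bound guarantees one side exists"),
        }
    }
    DivergencePoint::None
}
```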

Orchestrator-mode `DocumentAst` extraction: researched the data flow;
the typed AST is materialized inside `render_qmd_to_preview_ast` but
discarded after JSON serialization. Plan recommends adding `pub ast:
DocumentAst` to `PreviewAstOutput` and forwarding through
`WasmPassTwoOutput`; alternatives (JSON re-parse, test-only hook)
documented with their costs.

Fixture rules: no absolute process paths in fixture content (built-in
extensions extract to a `temp_dir` whose path differs across CI runs;
stable within a single process — fine for two-runs-compare, but a
latent issue for future stored-snapshot variants).

Smaller corrections: `Format::from_format_string("q2-preview")` (no
`Format::q2_preview()` constructor exists); `apply_lua_filter`
(singular) is the per-filter Lua-state-creation site, with the plural
loop calling it once per filter; `LuaShortcodeEngine::new` is the
shortcode-side analogue; `quarto/video` filter extension is built-in
via `include_dir!(resources/extensions)` and auto-discovered by
`StageContext::new`, so fixtures need no scaffolding beyond `filters:
[video]` in YAML; `meta.rendered.includes.*` is the actual path
(not `meta.includes.*`) and includes contributions from
`IncludeResolveStage`, chrome render transforms, `attribution_viewer`,
and Bootstrap/clipboard injection — all skipped by
`compute_meta_hash_fresh_excluding_rendered`.

Stage-inventory clarifications: `MathJsStage` is excluded from
q2-preview; `BootstrapJsStage` and `ClipboardJsStage` write only to
`ctx.artifacts` (not to `meta` or `blocks`), so they don't affect the
hash — but their q2-preview inclusion is questionable and is filed
separately as bd-2ag1c.

Notes for the next traversal: `CodeHighlightStage`'s native disk scan
for user grammars is OS-order-dependent (not exercised today;
fixtures don't supply user grammars); lipsum's module-load
`math.randomseed(os.time())` is harmless on the non-random code path
the fixture exercises but should be reverified if a future variant
routes through `math.random`.

Estimated scope: ~760 → ~980 lines.

…branch policy

Audit pass against current source. Settles every open question that
remained in the prior revision and corrects factual drift.

Reuse over rebuild
- `DriveMode::ProjectOrchestrator` now delegates to the existing
  `render_active_page_preview` helper at
  `crates/quarto-core/tests/render_page_in_project.rs:660`. No fresh
  orchestrator wiring; no `make_website_project_ctx(...)` builder.
- `DocumentAst` extraction settled on option (a): re-parse the JSON
  via `pampa::readers::json::read`. source_info round-trips but the
  hash excludes it, so no stripping pass and no production plumbing
  change is required. Earlier option (b) (typed-AST plumbing through
  `PreviewAstOutput` / `WasmPassTwoOutput`) abandoned.
- `run_orchestrator` code sample updated: real body in place of the
  prior `unimplemented!("see Open questions")` stub.

Test crate location pinned
- File: `crates/quarto-core/tests/idempotence.rs`.
- Fixtures: `crates/quarto-core/tests/fixtures/idempotence/`.
- Cargo invocation in the sub-agent prompt template updated to
  `--test idempotence`.

Long-lived branch policy made explicit
- New `## Long-lived branch policy` section at the top.
- `## Goal` clarifies that "CI-enforced" applies when the plan lands
  on `main`; until then `feature/provenance` is allowed to be red
  while the failure queue drains.
- `### Phase 5 — Failure triage` opens with the same constraint.

Factual fixes against current source
- Transform count corrected from 36 to 37; missing
  `table-bootstrap-class` added to Finalization, with a fixture
  entry in the gap audit and Phase 4 checklist.
- `Q2_PREVIEW_STAGE_EXCLUDED` corrected to list all three exclusions
  (`math-js`, `render-html-body`, `apply-template`).
- `CodeHighlightStage` user-grammar scan citation moved from
  `pipeline.rs:644-650` to
  `crates/quarto-core/src/transforms/code_highlight.rs:126-129`.
- Stale line numbers refreshed throughout (pipeline.rs 1181→1198,
  1220→1237, 379→380, 355→356, 626→627, 855→859, 663→664;
  render_page_in_project.rs 653→660; Pass2Payload::AstJson 256→254;
  stage/context.rs 220→221; ShortcodeResolveTransform::transform
  257→513 with the correct file path).
- bd-2ag1c ordering pinned: Plan 3 lands first; bd-2ag1c follows
  with Plan 3's measurements in hand.

Section rename: "Open questions for implementation" →
"Decisions (was: open questions)" + a `### CI failure policy &
sub-agent prompt template` subsection. All internal cross-refs
updated.

Estimate revised
- Scaffolding line item: ~260 → ~100 lines (reuse, not rebuild).
- `PreviewAstOutput::ast` plumbing (~20 lines) removed entirely.
- Total: ~980 → ~800 lines.
- Session count revised 2 → 2-3 with the third explicitly allocated
  to Phase 5 triage.

Adds the structural-hash infrastructure that Plan 3's q2-preview
idempotence gate (and Plan 7a's runtime user-filter check) will sit on:

- compute_meta_hash_fresh: source-info-agnostic ConfigValue hasher.
  Insertion-order Map keys (no sort, so HashMap-iteration-order bugs
  in transforms remain detectable). MergeOp participates via its
  enum discriminant. Recurses into PandocInlines/PandocBlocks via
  the existing inline/block hashers (which already exclude
  source_info).
- compute_meta_hash_fresh_excluding_rendered: same, but skips the
  top-level `rendered` map entry. The exclusion is intentionally
  not propagated into recursion: a nested `rendered` key is content.
- find_first_divergence + DivergencePoint: returns the first block
  index whose per-block fresh hash differs, or the first insertion-
  order meta key path whose subtree hash differs (with the same
  rendered.* exclusion). The plan-sketch signature took
  &DocumentAst, but quarto-ast-reconcile cannot depend on
  quarto-core; the helper takes &[Block] + &ConfigValue and the
  test driver projects from DocumentAst.
- 11 new unit tests cover: same/different content, source_info/
  key_source agnosticism, top-level rendered exclusion, nested
  rendered participation, Map insertion-order sensitivity (no-sort
  regression guard), MergeOp sensitivity; identical/Block-mismatch/
  MetaKey-path/rendered-skip divergence localization.
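The hashing rules above (insertion-order maps, `MergeOp` participating, source info ignored, only the top-level `rendered` key skipped) can be illustrated with a reduced `ConfigValue`. The types are assumptions: `Value` has only string and map variants, `source` stands in for source_info/key_source, and `MergeOp::Replace` is a placeholder second variant; only `MergeOp::default() = Concat` comes from the text. Hashers created with `DefaultHasher::new()` agree with each other within a build, which is enough for a two-runs-compare test.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// `Concat` as the default comes from the text; `Replace` is a placeholder.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, Hash)]
pub enum MergeOp {
    #[default]
    Concat,
    Replace,
}

#[derive(Debug, Clone)]
pub enum Value {
    Str(String),
    /// Insertion-ordered entries; deliberately never sorted before hashing.
    Map(Vec<(String, ConfigValue)>),
}

#[derive(Debug, Clone)]
pub struct ConfigValue {
    pub value: Value,
    pub merge_op: MergeOp,
    /// Stand-in for source_info / key_source; never hashed.
    pub source: usize,
}

fn hash_value(v: &ConfigValue, h: &mut DefaultHasher) {
    v.merge_op.hash(h); // MergeOp participates via its discriminant
    match &v.value {
        Value::Str(s) => {
            0u8.hash(h);
            s.hash(h);
        }
        Value::Map(entries) => hash_entries(entries.iter().collect(), h),
    }
}

fn hash_entries(entries: Vec<&(String, ConfigValue)>, h: &mut DefaultHasher) {
    1u8.hash(h);
    entries.len().hash(h);
    // Insertion order, no sort: a reordering between runs changes the hash.
    for (key, child) in entries {
        key.hash(h);
        hash_value(child, h);
    }
}

pub fn compute_meta_hash_fresh(meta: &ConfigValue) -> u64 {
    let mut h = DefaultHasher::new();
    hash_value(meta, &mut h);
    h.finish()
}

/// Skips only the top-level `rendered` entry; a nested `rendered` key is content.
pub fn compute_meta_hash_fresh_excluding_rendered(meta: &ConfigValue) -> u64 {
    let mut h = DefaultHasher::new();
    match &meta.value {
        Value::Map(entries) => {
            meta.merge_op.hash(&mut h);
            let kept: Vec<&(String, ConfigValue)> =
                entries.iter().filter(|(k, _)| k.as_str() != "rendered").collect();
            hash_entries(kept, &mut h);
        }
        Value::Str(_) => hash_value(meta, &mut h),
    }
    h.finish()
}
```

Sorting map keys before hashing would make the first pair of maps below hash equal and hide exactly the HashMap-iteration-order bugs the gate is meant to catch.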

Verification: `cargo nextest run --workspace` — 9321 passed, 196
skipped. `cargo xtask verify --skip-hub-build` steps 1–5 green
(lint, fmt, Rust build with -D warnings, tree-sitter, Rust tests
with -D warnings). Steps 7/10 fail with the known --skip-hub-build
artifact (`wasm-quarto-hub-client` unbuilt), unrelated to these
additive Rust changes.

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md

Adds the test driver that Phases 3-4 will hang ~25 fixtures off.
Self-contained at `crates/quarto-core/tests/idempotence.rs`.

- `DriveMode { SingleFile, ProjectOrchestrator }`. Single-file calls
  `run_pipeline` with `build_q2_preview_pipeline_stages`. Orchestrator
  drives `ProjectPipeline<RenderToPreviewAstRenderer>` via the existing
  `render_active_page_preview` body (copied inline because each
  `tests/*.rs` is its own binary).
- `Fixture { name, setup, active, modes }` + `run_fixture` runs the
  pipeline twice per (fixture, mode), hashes blocks via
  `compute_blocks_hash_fresh` and meta via
  `compute_meta_hash_fresh_excluding_rendered`, and on divergence
  panics with `find_first_divergence`'s `DivergencePoint` embedded so
  the panic message itself fills the plan's sub-agent investigation
  prompt template.
- `pandoc_to_document_ast` is the small field-shuffle that the plan
  identifies: orchestrator mode emits `Pass2Payload::AstJson`, which
  `pampa::readers::json::read` re-parses into `(Pandoc, ASTContext)`;
  the hasher only reads `ast.blocks` + `ast.meta` so the other
  `DocumentAst` fields get defaults.
- `tests/fixtures/idempotence/README.md` documents the fixture-format
  rules (no engine cells, no absolute paths, per-fixture mode mapping).
- `smoke_plain_paragraph` smoke fixture drives a single-paragraph
  document through both modes. Passing this proves the harness works
  end-to-end before Phases 3-4 land the real fixtures.

Verification: `cargo nextest run -p quarto-core --test idempotence`
runs the new smoke test (PASS). `cargo xtask verify
--skip-hub-build --skip-hub-tests` steps 1-9 green; the Phase-1
idempotence tests and this Phase-2 smoke test ran inside Step 5.
Step 10 (preview-renderer integration tests in
`ts-packages/preview-renderer/`) fails with the same WASM-import
artifact as Step 7 — both depend on `wasm-quarto-hub-client` which
`--skip-hub-build` skips. Unrelated to these Rust-only additions.

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md

Adds the existing-fixture batch the plan calls "carry-forward from
prior plan draft": one fixture per Rust transform / feature that was
already exercised in earlier idempotence drafts, scoped to single-file
document fixtures that run in both DriveMode variants.

Coverage:
- meta-single, meta-markdown — shortcode-resolve + metadata-normalize
  (string and PandocInlines branches).
- include-trivial — include-expansion stage + shortcode-resolve.
- callout-warning — CalloutTransform (callout-resolve is excluded
  from q2-preview, so the CustomNode survives).
- theorem — TheoremSugarTransform.
- figure-ref-target — FloatRefTargetSugarTransform.
- crossref-to-theorem — crossref-index + crossref-resolve.
- sectionize-multi — SectionizeTransform across nested headers.
- footnotes-mixed — FootnotesTransform on inline + reference forms.
- appendix-license — AppendixStructureTransform with license/
  copyright meta and a footnote interaction.
- combined-stress — sectionize + callouts + shortcodes interacting.

A `doc_fixture(name, content)` helper collapses each single-file
fixture to a one-liner; `include-trivial` keeps an inline closure
because it writes two files.
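A `doc_fixture` helper of the kind described might look like this. The field names `name`, `setup`, `active`, and `modes` come from the Phase-2 commit; the `Box<dyn Fn(&Path) -> io::Result<()>>` setup type, the `index.qmd` file name, and `BOTH_MODES` are assumptions for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DriveMode {
    SingleFile,
    ProjectOrchestrator,
}

/// Hypothetical default: single-file document fixtures run in both modes.
pub const BOTH_MODES: &[DriveMode] = &[DriveMode::SingleFile, DriveMode::ProjectOrchestrator];

pub struct Fixture {
    pub name: &'static str,
    /// Writes the fixture's files into a fresh project directory.
    pub setup: Box<dyn Fn(&Path) -> io::Result<()>>,
    /// Project-relative path of the page under test.
    pub active: &'static str,
    pub modes: &'static [DriveMode],
}

/// One-liner constructor for single-file fixtures (assumed to write `index.qmd`).
pub fn doc_fixture(name: &'static str, content: &'static str) -> Fixture {
    Fixture {
        name,
        setup: Box::new(move |dir: &Path| fs::write(dir.join("index.qmd"), content)),
        active: "index.qmd",
        modes: BOTH_MODES,
    }
}
```

A multi-file fixture such as include-trivial would instead supply its own `setup` closure that writes both files.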

All 12 idempotence tests (smoke + 11 new) pass:
  `cargo nextest run -p quarto-core --test idempotence` → 12 passed.

No queue entries for Phase 5 from this batch — the carry-forward
fixtures are all clean on first run.

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md

`npm install` (from repo root) and `npm run build:wasm` (from hub-client)
updated package-lock.json and crates/wasm-quarto-hub-client/Cargo.lock
on this branch. Committed so subsequent fresh checkouts of
feature/provenance can build WASM from the same dependency set.

Adds the batch of Phase-4 fixtures that need no scaffolding beyond a
single-file `setup`. Per the long-lived-integration-branch policy,
fixtures that surface non-idempotence stay in the suite as the
triage queue.

Pass on first run (both DriveModes):
- code-block-fenced — code-block-generate / -render / code-highlight.
- proof — ProofSugarTransform.
- equation-labeled — EquationLabelTransform + crossref-resolve (eq).
- toc-on — toc-generate, toc-render.
- video-filter-header — built-in Lua filter under
  `resources/extensions/quarto/video/`.
- theme-bootstrap — compile-theme-css stage.
- table-bootstrap-class — TableBootstrapClassTransform.
- lua-shortcode-version — Lua-loaded shortcode handler (returns
  `quarto.version`).

In the queue:
- **lua-shortcode-lipsum-fixed**: `SingleFile` passes; the pipeline
  itself is idempotent. `ProjectOrchestrator` panics with
  `MalformedSourceInfoPool` re-parsing the AST JSON the orchestrator
  emitted. This is a JSON writer/reader round-trip bug specific to
  lipsum-shortcode-generated inlines, not a transform-determinism
  finding. Filed as **bd-3odjm**. The test stays red per the plan's
  "do not #[ignore]" rule; the integration branch is allowed to
  carry the failure until the queue is drained.

Verification: `cargo nextest run -p quarto-core --test idempotence`
→ 20 passed, 1 failed (bd-3odjm). Phase-1 unit tests and Phase-3
fixtures all green.

Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-3odjm

Both pass on first run in both DriveMode variants.

- include-in-header writes a tiny header.html and references it
  from front matter; exercises IncludeResolveStage.
- resource-image writes a 67-byte minimal PNG and references it via
  inline image syntax; exercises ResourceCollectorTransform.

Adds a write_bytes helper for the binary stub. Per the fixtures
README rule the PNG sits at the project root and is referenced
relatively (`./local.png`).

Verification: `cargo nextest run -p quarto-core --test idempotence`
→ 22 passed, 1 failed (bd-3odjm).

Three orchestrator-only website fixtures. Two pass, one in queue.

Pass:
- website-chrome — navbar + sidebar + page-navigation + page-footer
  + favicon + bootstrap-icons + canonical-url + title-prefix. Two
  pages (index, other), tiny favicon stub.
- website-listing — listing with categories enabled and feed: true,
  two posts under posts/, each with categories. Exercises
  listing-generate / -render, categories-sidebar, listing-feed-link,
  listing-feed-stage, listing-item-info.

In the queue:
- website-links — internal cross-page `.qmd` body links. Filed as
  bd-rz2we. Block 0 hash diverges across runs while meta hash is
  stable, so the divergence is genuinely in the AST blocks (not in
  rendered chrome). Hypothesis: link-rewrite or link-resolution is
  capturing the absolute project root (or canonicalized tempdir
  path) into the AST when it should emit a path-independent
  relative URL.

Verification: `cargo nextest run -p quarto-core --test idempotence`
→ 24 passed, 2 failed (bd-3odjm, bd-rz2we).

Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-rz2we

Extends Fixture with an optional attribution_json: Option<&'static str>.
When present:
- SingleFile installs PreBuiltAttributionProvider on
  RenderContext.attribution_provider before run_pipeline.
- ProjectOrchestrator forwards the JSON via
  RenderToPreviewAstRenderer::with_attribution; the renderer
  installs the same provider type on the per-page RenderContext it
  constructs internally.

Stub JSON has one actor + one run covering bytes 0..1024 (a wider
range than the fixture body actually uses) so the attribution map
overlaps the entire document and AttributionGenerateStage +
AttributionRenderTransform have something to write into the AST.
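Why a single 0..1024 stub run is enough can be seen from a latest-wins overlap query. This is a hypothetical sketch: `Run` and this `query_byte_range` signature are illustrative, not the crate's actual types; the sketch only mirrors the max-time rule mentioned for multi-author shortcodes earlier in the PR. With one run spanning the whole body, every in-body query resolves to the one actor.

```rust
/// Hypothetical attribution run: `actor` edited bytes [start, end) at `time`.
#[derive(Debug, Clone)]
pub struct Run {
    pub actor: String,
    pub start: usize,
    pub end: usize,
    pub time: u64,
}

/// Latest-wins: among runs overlapping [start, end), return the actor of the
/// run with the greatest `time` (ties go to the later run in the slice).
pub fn query_byte_range(runs: &[Run], start: usize, end: usize) -> Option<&str> {
    runs.iter()
        .filter(|r| r.start < end && start < r.end)
        .max_by_key(|r| r.time)
        .map(|r| r.actor.as_str())
}
```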

`cargo nextest run -p quarto-core --test idempotence` → 25 passed,
2 failed (bd-3odjm, bd-rz2we — both pre-existing). attribution_basic
passes on first run in both DriveModes, so the deterministic
provider + generate + render stack is genuinely idempotent.

This completes the Phase 4 fixture set. The Plan-3 gate now covers:
- 1 smoke fixture
- 11 carry-forward (Phase 3, all green)
- 9 Phase-4a doc fixtures (8 green, 1 in queue)
- 2 Phase-4b multi-file (both green)
- 3 Phase-4c website (2 green, 1 in queue)
- 1 Phase-4d attribution (green)

Total: 27 fixtures, 25 green, 2 in queue.

Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-3odjm (Plan 5 will fix), bd-rz2we

Adds claude-notes/instructions/idempotence-contract.md — the
author-facing summary of the contract Plan 3 enforces. Covers:
- what the hash includes and excludes (source-info blind,
  insertion-order maps, merge_op participates, rendered.* excluded
  at top level only);
- what new transforms must NOT do (undefined iteration order,
  process-local state, absolute paths, engine cells);
- the fresh-Lua-state-per-run rule for Lua filters / shortcodes;
- how to add a fixture (doc_fixture for trivial, inline closure for
  multi-file, ORCHESTRATOR_ONLY for chrome, attribution_json for
  attribution exercises);
- the long-lived-integration-branch policy: don't #[ignore] a
  failing fixture without explicit user approval.

Cross-linked from:
- crates/quarto-core/tests/fixtures/idempotence/README.md
  (existing pointer expanded to point at the contract doc and the
  plan).
- claude-notes/plans/2026-05-04-q2-preview-plan-7a-user-filter-idempotence.md
  (References section — authors looking at the runtime user-filter
  check find the CI contract too).

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md

cargo nextest run --workspace: 9346/9348 pass. The 2 failures are
the documented queue items (bd-3odjm, bd-rz2we); every other
workspace test is green, including the 25 passing idempotence
fixtures.

cargo xtask verify (full WASM stack): Steps 1-4 green; Step 5
fails on the same 2 fixtures. That's the expected long-lived-
integration-branch state per the plan's §Long-lived branch policy —
the gate is allowed to be red until the queue is drained.

Plan 3 is complete as a deliverable: gate + hashing infrastructure
+ 27 fixtures + author-facing docs + filed queue. Merge to main
gated on draining the queue (bd-3odjm via Plan 5; bd-rz2we via a
follow-up).

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md

The Work-items section under Phases 1-7 was fully checked, but the
parallel "Coverage gaps to address during implementation" inventory
(per-fixture bullets, line ~560+) still showed unchecked boxes even
though every fixture in that list now ships in idempotence.rs.

Marked all 26 inventory items as landed. Annotated the two that are
in the Phase-5 triage queue (lipsum-fixed → bd-3odjm, website-links
→ bd-rz2we) so the queue state is also visible from the inventory,
not just from the Phase-5 work-items block.

Plan checklist is now fully consistent: 54 checked, 0 unchecked.

…erContext

Plan 3's website_links fixture was non-idempotent: rendered AST link
URLs captured the absolute tempdir path of the per-run TempDir,
causing block-0 hash divergence across two runs with different
tempdirs. Root cause: `ResourceResolverContext::vfs_root_mode`
played two roles via a single PathBuf — disk-write root (where
runtime.file_write puts theme CSS / copied resources) and URL
prefix (what gets embedded in HTML link/asset URLs). In production
WASM these are intentionally identical; on native they have to
diverge so writes hit a real tempdir but URLs stay path-independent.

Split the field into `{ write_root, url_root }` and add a two-arg
`vfs_root_with_url_root` constructor plus per-renderer
`with_url_root` builder. Single-arg `vfs_root(...)` constructor
preserves the WASM identity contract by construction (write_root ==
url_root). Native test helpers in tests/idempotence.rs and
tests/render_page_in_project.rs now pass
`.with_url_root("/.quarto/project-artifacts")`, so rendered URLs
embed the synthetic prefix while disk writes still land in the
tempdir.
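The split can be sketched as a struct with two paths and two constructors. `VfsRootMode`, `write_path`, and `url_for` are hypothetical names; only `write_root`, `url_root`, `vfs_root`, and `vfs_root_with_url_root` come from the commit text.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the split `vfs_root_mode` field.
#[derive(Debug, Clone, PartialEq)]
pub struct VfsRootMode {
    /// Where runtime file writes land (theme CSS, copied resources).
    pub write_root: PathBuf,
    /// Prefix embedded in HTML link and asset URLs.
    pub url_root: PathBuf,
}

impl VfsRootMode {
    /// Single-arg form: write_root == url_root by construction (the WASM contract).
    pub fn vfs_root(root: impl Into<PathBuf>) -> Self {
        let root = root.into();
        VfsRootMode { write_root: root.clone(), url_root: root }
    }

    /// Two-arg form for native runs: writes hit a real tempdir, URLs stay fixed.
    pub fn vfs_root_with_url_root(
        write_root: impl Into<PathBuf>,
        url_root: impl Into<PathBuf>,
    ) -> Self {
        VfsRootMode { write_root: write_root.into(), url_root: url_root.into() }
    }

    /// On-disk location of a project-relative artifact.
    pub fn write_path(&self, rel: &str) -> PathBuf {
        self.write_root.join(rel)
    }

    /// URL embedded in the rendered AST for the same artifact.
    pub fn url_for(&self, rel: &str) -> String {
        format!("{}/{}", self.url_root.display(), rel)
    }
}
```

Two runs with different tempdirs then write to different places but embed identical URLs, which is the property the website_links fixture checks.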

website_links now passes; 25/26 idempotence fixtures pass. The
remaining lipsum failure is bd-3odjm (FilterProvenance wire
format), owned by Plan 5 and out of scope here. Workspace nextest:
9347/9348. cargo xtask verify (Rust leg) clean for lint/fmt/build
with -D warnings.

Plan: claude-notes/plans/2026-05-21-vfs-url-write-root-split.md

Plan 4 (SourceInfo provenance types) finalized for development:
- 7-phase work-items checklist (types → constructors → accessor updates
  → Lua serde → migration → tests → verification gate)
- field renamed `anchors` → `from` (typed `SmallVec<[Anchor; 1]>` from
  day 1; serde feature required on smallvec)
- accessor semantics for `Generated` pinned: length/start_offset/
  end_offset → 0, map_offset → None, resolve_byte_range /
  remap_file_ids / extract_file_id delegate to invocation_anchor
- required-Invocation-anchor invariant on `shortcode` kind documented
  with `By::shortcode` doc-comment requirement; enforcement split
  across Plan 6 audit test and Plan 7 debug_assert
- Lua-table discriminant pinned to `t = "Generated"`
- §Test plan and Phase 6 expanded to cover every accessor + mutator
  + the `combine()` × Generated corner
- migration scope corrected (15 files, 27 occurrences); references
  and line ranges verified against the worktree source
- §Open questions section removed (no open questions remain)
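The pinned `Generated` accessor semantics can be sketched as follows. Assumptions: `from` is a `Vec` here to keep the sketch std-only (the plan uses a `SmallVec`), `Original` is reduced to a file id plus offsets, `kind` replaces the full `by` struct, and the `Original` branch of `map_offset` is a simplified stand-in.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
pub enum AnchorRole {
    Invocation,
    ValueSource,
    Other(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Anchor {
    pub role: AnchorRole,
    pub source: Arc<SourceInfo>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum SourceInfo {
    /// Simplified Original: a byte range in one file.
    Original { file_id: usize, start_offset: usize, end_offset: usize },
    /// `from` is a SmallVec in the plan; a Vec keeps this sketch std-only.
    Generated { kind: String, from: Vec<Anchor> },
}

impl SourceInfo {
    /// First anchor whose role is `Invocation`, if any.
    pub fn invocation_anchor(&self) -> Option<&SourceInfo> {
        match self {
            SourceInfo::Generated { from, .. } => from
                .iter()
                .find(|a| a.role == AnchorRole::Invocation)
                .map(|a| a.source.as_ref()),
            SourceInfo::Original { .. } => None,
        }
    }

    pub fn start_offset(&self) -> usize {
        match self {
            SourceInfo::Original { start_offset, .. } => *start_offset,
            SourceInfo::Generated { .. } => 0, // Generated owns no bytes
        }
    }

    pub fn end_offset(&self) -> usize {
        match self {
            SourceInfo::Original { end_offset, .. } => *end_offset,
            SourceInfo::Generated { .. } => 0,
        }
    }

    /// Zero for Generated, since both offsets are zero.
    pub fn length(&self) -> usize {
        self.end_offset() - self.start_offset()
    }

    /// Simplified: node-relative offset to file offset; Generated maps nothing.
    pub fn map_offset(&self, offset: usize) -> Option<usize> {
        match self {
            SourceInfo::Original { start_offset, end_offset, .. } => {
                let abs = start_offset + offset;
                (abs <= *end_offset).then_some(abs)
            }
            SourceInfo::Generated { .. } => None,
        }
    }

    /// (file_id, start, end); Generated delegates to its invocation anchor.
    pub fn resolve_byte_range(&self) -> Option<(usize, usize, usize)> {
        match self {
            SourceInfo::Original { file_id, start_offset, end_offset } => {
                Some((*file_id, *start_offset, *end_offset))
            }
            SourceInfo::Generated { .. } => self.invocation_anchor()?.resolve_byte_range(),
        }
    }

    pub fn extract_file_id(&self) -> Option<usize> {
        self.resolve_byte_range().map(|(file_id, _, _)| file_id)
    }
}
```

A `Generated` node without an `Invocation` anchor resolves to `None`, which is why the plan requires that anchor on the `shortcode` kind.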

Cross-plan `from` rename swept across Plans 3, 5, 6, 7, 8.

Plan 5 JSON wire format (option D):
- outer JSON key `anchors` → `from` (matches Rust field name)
- inner anchor pool reference `from` → `si_id` (distinctive; avoids
  the `parent_id` tree-structure mental model that fits Substring's
  chain but not anchor references)
- Reader/writer code samples updated; TS-side `SourceInfoEntry`
  shape note updated

Plan 6 + Plan 7 hand-offs for the required-anchor invariant added.
Deferred follow-ups (Dispatch anchor, ValueSource anchor) cross-
referenced as bd-36fr9 and bd-129m3 (committed separately to main).

Plan 4 work happens on top of an integration branch carrying exactly
one failing test (lua_shortcode_lipsum_fixed orchestrator mode,
filed as bd-3odjm). That test's root cause is the wire-format
code-3 collision Plan 5 owns, so Plan 4 must not try to fix it
locally.

Plan 4:
- New §"Inherited pre-existing failure (bd-3odjm)" section between
  Out of scope and Work items. Explains the test, the panic shape,
  the root cause, and that any *other* failure in the idempotence
  suite is a Plan-4 regression.
- Phase 7 verification gate updated: cargo nextest expects exactly
  one failure (bd-3odjm); cargo xtask verify trips on the same one.

Plan 5:
- New §"Inherited failure that must close on Plan 5's first reader
  change (bd-3odjm)" section. Spells out the contract: Plan 5's
  first reader change must turn lua_shortcode_lipsum_fixed green.
  If it doesn't, the Plan-5 author has an immediate signal that
  either the reader discrimination is wrong or the lipsum path
  produces a code-3 shape neither arm handles — stop and focus on
  it before moving on.
- Test plan now cites bd-3odjm as the live first-iteration smoke
  check, ahead of the hand-constructed tests.

Both plans now read consistently with the state of feature/provenance.

Plan 4 committed `from: SmallVec<[Anchor; 1]>` as the field type, but
Plan 5's reader/writer + Plan 6's stamper code samples still used the
`vec![]` macro to construct it. Those samples would not compile if
taken literally — `vec!` produces a `Vec`, not a `SmallVec`. Switch
to `smallvec![]` everywhere `Generated.from` is constructed:

- Plan 5: 4 occurrences (legacy-Transformed code-3 reader; Anchor
  dedup test description; forward-compat test description; round-
  trip test description).
- Plan 6: 14 occurrences across §"Per-transform fixes",
  §"Lua-shortcode enrichment", §"The post-walk helper",
  §"Variant semantics summary" etc.

No semantic change — same constructions, just the macro that
  actually returns the field type.

Plan 4 + Plan 5: change Generated.from's inline capacity from
SmallVec<[Anchor; 1]> to SmallVec<[Anchor; 2]> so the steady-state
post-follow-up shape (Invocation + ValueSource on meta/var; Invocation
+ Dispatch on Lua-handler shortcodes) stays heap-free. Cost is +16
bytes per empty Generated; saves a heap allocation on every
multi-anchor shortcode resolution.

Also folds in research findings that were tacit in the previous draft:

- Phase 1 smallvec line: replace the "or verify present" hedge with the
  concrete two-file Cargo.toml edit (workspace + quarto-source-map),
  noting that smallvec was verified absent.
- skip_serializing_if path: use the fully-qualified
  serde_json::Value::is_null (the short form is a frequent gotcha).
- By::raw policy: accept-all; forgery caught by Plan 6 audit + Plan 7
  debug_assert, not by constructor rejection.
- Anchor ordering: append order, stable across serde, at most one
  anchor per known role.
- extract_file_id: empty-from Generated returns None, matching
  FilterProvenance's behavior; both call sites in to_ariadne_report
  already tolerate None. Stays a private fn on DiagnosticMessage.
- Lua serde Concat recursion: legacy "FilterProvenance" inside a
  Concat piece is handled automatically; no .snap/.json fixtures
  contain the legacy tag.
- Default risk: no struct holding SourceInfo derives Default in
  quarto-pandoc-types; Default for SourceInfo itself stays unchanged.
- combine() × Generated: verified unreachable today (all 17 call
  sites combine Original/Substring shapes); the Phase 6 test
  documents intent for any future caller.
- PartialEq: no production call site compares SourceInfo today; the
  derive is required by Block/Inline but not load-bearing.

The previous "+16 bytes per Generated" note understated the cost by
~2.5x. Actual delta:

- Anchor = AnchorRole (32 bytes — String-bearing Other variant
  dominates) + Arc<SourceInfo> (8) = ~40 bytes.
- SmallVec<[Anchor; 1]> ≈ 48 bytes; SmallVec<[Anchor; 2]> ≈ 88 bytes
  on the stack — a 40-byte delta per SmallVec field.
- Since SourceInfo is an enum, its stack size is dictated by the
  largest variant, so every SourceInfo (Original/Substring/Concat
  too) grows by 40 bytes — not just Generated instances. Block/Inline
  carry SourceInfo by value, so the cost multiplies across the AST
  (tens-to-hundreds of KB on a large doc).

Plan keeps cap=2 — the trade is still defensible — but documents the
real cost honestly and notes Arc-boxing Generated as the next lever
if memory-per-node ever bites the q2-preview editor.
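The size argument above rests on a general Rust layout rule. An enum is as large as its largest variant plus any tag, and moving a variant's payload behind an `Arc` shrinks that variant to one pointer. The sketch below shows both effects with hypothetical `InlinePayload`/`BoxedPayload` enums, not the real `SourceInfo`. The `[u64; 11]` payload is an arbitrary stand-in for a Generated variant with a two-anchor inline buffer.

```rust
use std::mem::size_of;
use std::sync::Arc;

// Hypothetical: the fat variant stores ~88 bytes inline, so every value of
// the enum (including `Small`) pays for it.
#[allow(dead_code)]
pub enum InlinePayload {
    Small { file_id: u32, start: u32, end: u32 },
    Fat([u64; 11]),
}

// Hypothetical: the same payload moved behind one pointer.
#[allow(dead_code)]
pub enum BoxedPayload {
    Small { file_id: u32, start: u32, end: u32 },
    Fat(Arc<[u64; 11]>),
}
```

Boxing saves those bytes on every node. The cost is one heap allocation per boxed value and an extra indirection on access. That trade-off is why it fits the role of a fallback lever, not a default.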
Review of Plan 5 against `feature/provenance` @ 5bea4d0. The plan was
implementable but underspecified for an unsupervised pass; this commit
folds in the gaps surfaced by the review.

Plan structure
- Phase-ordered Work items checklist (Phase 0 start gate → Phase 7
  verification gate). Phase 1 lands the bd-3odjm fix on its own; the
  rest of the phases compile cleanly between steps.
- Phase 2 corrected: Plan 4's interim writer arm keeps
  `SerializableSourceMapping::FilterProvenance` and routes Generated
  through it via `by.as_filter().expect(...)`. Plan 5 Phase 2 removes
  both the variant and the interim arm together — the workspace stays
  buildable across the handoff.

Wire-format clarifications
- §"TypeScript wire-format definitions" added with explicit before/after.
  Code 4's `d` becomes `{ by; from? }`; code 5 (Synthetic/Derived churn)
  is removed entirely; code 3's `d` widens to a union for the dual-shape
  legacy reader.
- Renamed `anchor_pool_ids` → `from` in writer pseudocode so the field
  name is consistent across the user-facing, writer-internal, and wire
  layers. Three-forms-one-name callout added to Design decisions.
- Verbose-key trade-off justified explicitly.

Implementation-precision adds
- `arc_parent_ids` cache reuse for anchor dedup (same key shape as
  `Substring.parent`).
- Topological intern order pinned: anchors interned before their parent
  Generated entry (mirrors today's Concat/Substring arms; reader's
  `si_id < current_index` guard requires it).
- `r: [0, 0]` rule made explicit at the writer-side intern tuple, not
  just in JSON examples.
- Anchor dedup test clarified as hand-constructed (does not depend on
  Plan 6 shipping the resolver).
- Streaming-writer test coverage explicit (Phase 6).
- bd-3odjm reopen policy: new failure modes file fresh issues.
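Here is a minimal sketch of the two interning rules above. Anchors are deduplicated by `Arc` pointer identity. Each anchor is interned before the Generated entry that references it, so a reader can reject any `si_id` that is not smaller than the current index. `Node`, `Entry`, `Pool` and `check_topological` are hypothetical names. The real writer's pool types and the `arc_parent_ids` cache differ in detail.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical in-memory shape: a leaf span or a Generated node with anchors.
#[derive(Debug)]
pub enum Node {
    Leaf { file_id: usize, start: usize, end: usize },
    Generated { by: String, from: Vec<Arc<Node>> },
}

// Hypothetical pool entry: anchors are referenced by pool index (`si_id`).
#[derive(Debug, PartialEq)]
pub enum Entry {
    Leaf { file_id: usize, r: [usize; 2] },
    // Generated owns no bytes, so its range is always [0, 0].
    Generated { by: String, from: Vec<usize>, r: [usize; 2] },
}

#[derive(Default)]
pub struct Pool {
    pub entries: Vec<Entry>,
    // Dedup cache keyed on Arc pointer identity.
    ids: HashMap<*const Node, usize>,
}

impl Pool {
    pub fn intern(&mut self, node: &Arc<Node>) -> usize {
        let key = Arc::as_ptr(node);
        if let Some(&id) = self.ids.get(&key) {
            return id; // shared anchor: reuse the existing entry
        }
        let entry = match node.as_ref() {
            Node::Leaf { file_id, start, end } => Entry::Leaf { file_id: *file_id, r: [*start, *end] },
            Node::Generated { by, from } => {
                // Intern anchors first so every si_id is smaller than this entry's id.
                let from = from.iter().map(|a| self.intern(a)).collect();
                Entry::Generated { by: by.clone(), from, r: [0, 0] }
            }
        };
        let id = self.entries.len();
        self.entries.push(entry);
        self.ids.insert(key, id);
        id
    }
}

// Reader-side guard: every anchor reference must point strictly backwards.
pub fn check_topological(entries: &[Entry]) -> bool {
    entries.iter().enumerate().all(|(i, e)| match e {
        Entry::Leaf { .. } => true,
        Entry::Generated { from, .. } => from.iter().all(|&si_id| si_id < i),
    })
}
```

Pointer identity is a sound dedup key here only because the writer borrows the `Arc`s for the whole serialization. A freed and reused address cannot appear mid-write.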

Fixes
- Risk areas: the streaming-writer paragraph named the wrong functions
  (`write_custom_block` etc. handle CustomNodes, not the pool). Now
  names `to_json` + `stream_write_source_info_pool` correctly.
- References: line numbers refreshed against the current branch; added
  pointer to `stream_write_source_info_pool`.
- `SmallVec::<[Anchor; 1]>::new()` in reader pseudocode bumped to cap=2
  to match Plan 4's recent capacity bump.

Folds in research findings from Plan 4's pre-execution review:

* Adds Phase 3 work to introduce SourceInfo::root_file_id() and
  SourceInfo::collect_file_ids() accessors, retiring six ad-hoc
  walkers across diagnostic.rs, location.rs, pipe_table.rs,
  section.rs, apply_template.rs, engine_execution.rs. Also fixes a
  latent nested-Substring bug in pipe_table.rs / section.rs that
  silently fell back to FileId(0).

* Drops the deprecated SourceInfo::filter_provenance alias from
  Phase 5 — only 4 callers exist, all migrated inline in one PR.

* Specifies the JSON writer's transitional Generated arm so the
  writer stays exhaustive while emitting the legacy code-3 payload
  until Plan 5 takes over wire-code 4.

* Adds the test-pattern guard-form template for the empty-bind
  pattern-match sites in lua/diagnostics.rs and filter_tests.rs.

* Updates §Risk areas, §Estimated scope, §References, §Dependencies,
  and Phase 7 verification gate to reflect the consolidation.

Downstream plans (5, 6, 7, 7a, 8) reference only the published
Plan 4 surface (Generated, By, invocation_anchor, is_atomic_kind);
none touched.

Status line drops "(open questions named)": all resolved.

Folds review findings into the Plan-5 doc:
- Phase-ordered Work items checklist (Phase 0 start gate → Phase 7
  verification gate).
- TypeScript wire-format definitions §section with explicit before/after
  (code 4 narrows to `{by; from?}`, code 5 removed, code 3 widens to
  the dual-shape union).
- Phase 2 corrected to match Plan 4's interim writer arm: Plan 5 Phase 2
  removes both `SerializableSourceMapping::FilterProvenance` and Plan 4's
  `by.as_filter().expect(...)` arm together.
- Three-forms-one-name callout for `from` across user-facing /
  writer-internal / wire layers.
- arc_parent_ids cache reuse and topological intern order pinned
  explicitly in Phase 2.
- Anchor dedup test clarified as hand-constructed (no Plan 6 dependency).
- Streaming-writer test coverage made explicit.
- bd-3odjm reopen policy: new failure modes file fresh issues.
- Risk areas: fixed wrong function names (was naming CustomNode emit
  paths, not pool emit paths).
- References: line numbers refreshed against current branch.

Adds `SourceInfo::Generated { by: By, from: SmallVec<[Anchor; 2]> }` as
the unified provenance variant covering filter constructions, shortcode
resolutions, sectionize/footnotes/appendix wrappers, title-block h1, and
tree-sitter postprocess spaces. Removes the `FilterProvenance` variant.

Type surface in `quarto-source-map`:
- `By { kind: String, data: serde_json::Value }` with builders for the
  ten known kinds (`filter`, `sectionize`, `user_edit`, `shortcode`,
  `include`, `title_block`, `footnotes`, `appendix`,
  `tree_sitter_postprocess`, `raw`) plus `is_atomic_kind`, `is_kind`,
  `as_filter`.
- `Anchor { role: AnchorRole, source_info: Arc<SourceInfo> }` with
  `Invocation` / `ValueSource` / `Other(String)` roles.
- `SourceInfo` accessors: `generated`, `invocation_anchor`,
  `value_source_anchor`, `anchors_with_role`, `append_anchor`,
  `root_file_id`, `collect_file_ids`.
- `resolve_byte_range` for `Generated` delegates to the first
  `Invocation` anchor and recurses; `remap_file_ids` walks every
  anchor's source_info via `Arc::make_mut`.

File-id walker consolidation:
- Six ad-hoc walkers retired (`diagnostic.rs`, `location.rs`,
  `pipe_table.rs`, `section.rs`, `apply_template.rs` test,
  `engine_execution.rs` test) onto the two new `SourceInfo` methods.
  Fixes a latent nested-Substring `FileId(0)` fall-through in
  `pipe_table.rs` and `section.rs` along the way.
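The nested-Substring fix comes down to recursing through every `Substring` parent instead of inspecting one level and falling back. The sketch below uses simplified, hypothetical types. A `Generated` holds an optional Invocation anchor plus other anchors. Concat resolves its root through its first piece, which is an assumption, not something the commit states.

```rust
use std::collections::BTreeSet;
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct FileId(pub usize);

// Hypothetical simplified shapes.
pub enum SourceInfo {
    Original { file_id: FileId },
    Substring { parent: Arc<SourceInfo> },
    Concat { pieces: Vec<SourceInfo> },
    Generated { invocation: Option<Arc<SourceInfo>>, others: Vec<Arc<SourceInfo>> },
}

impl SourceInfo {
    // Recurse through any depth of Substring rather than defaulting to FileId(0).
    pub fn root_file_id(&self) -> Option<FileId> {
        match self {
            SourceInfo::Original { file_id } => Some(*file_id),
            SourceInfo::Substring { parent } => parent.root_file_id(),
            SourceInfo::Concat { pieces } => pieces.first()?.root_file_id(),
            SourceInfo::Generated { invocation, .. } => invocation.as_ref()?.root_file_id(),
        }
    }

    // Every file this span touches, including all anchors.
    pub fn collect_file_ids(&self, out: &mut BTreeSet<FileId>) {
        match self {
            SourceInfo::Original { file_id } => {
                out.insert(*file_id);
            }
            SourceInfo::Substring { parent } => parent.collect_file_ids(out),
            SourceInfo::Concat { pieces } => pieces.iter().for_each(|p| p.collect_file_ids(out)),
            SourceInfo::Generated { invocation, others } => invocation
                .iter()
                .chain(others.iter())
                .for_each(|a| a.collect_file_ids(out)),
        }
    }
}
```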

Lua serde extension:
- `source_info_to_lua_table` / `_from_lua_table` gain a `Generated`
  arm with `t = "Generated"`, `by` sub-table (`data` JSON-encoded for
  Lua transit) and `from` array. The legacy `"FilterProvenance"` tag
  is still accepted by the reader and folds to `Generated { by: filter,
  from: [] }`; writers never emit it.
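The read/write asymmetry for the legacy tag can be sketched with a flattened `HashMap` standing in for the Lua table. The keys `by_kind`, `by_data` and `filter_data` are hypothetical. The real table nests a `by` sub-table and carries a `from` array, both omitted here. The point is that the writer only emits `"Generated"`, while the reader also accepts `"FilterProvenance"` and folds it to a filter-kind value.

```rust
use std::collections::HashMap;

// Hypothetical flattened stand-in for the Lua table.
pub type LuaTable = HashMap<&'static str, String>;

#[derive(Debug, PartialEq)]
pub struct Generated {
    pub kind: String,
    pub data_json: String, // `by.data`, JSON-encoded for Lua transit
}

pub fn to_lua_table(g: &Generated) -> LuaTable {
    // Writers only ever emit the new tag.
    HashMap::from([
        ("t", "Generated".to_string()),
        ("by_kind", g.kind.clone()),
        ("by_data", g.data_json.clone()),
    ])
}

pub fn from_lua_table(t: &LuaTable) -> Result<Generated, String> {
    let field = |k: &str| t.get(k).cloned().ok_or_else(|| format!("missing field {k}"));
    match t.get("t").map(String::as_str) {
        Some("Generated") => Ok(Generated { kind: field("by_kind")?, data_json: field("by_data")? }),
        // Legacy tag: still accepted on read, folded to a filter-kind Generated.
        Some("FilterProvenance") => Ok(Generated {
            kind: "filter".to_string(),
            data_json: field("filter_data")?,
        }),
        other => Err(format!("unknown source-info tag {other:?}")),
    }
}
```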

Migration:
- All 27 `SourceInfo::FilterProvenance` pattern sites and 4 non-source-map
  `filter_provenance(...)` constructor callers migrated. Renamed
  `test_filter_provenance_tracking` → `test_filter_generated_tracking`
  with updated assertions.
- JSON writer emits legacy code-3 from filter-kind `Generated` so
  bd-3odjm's expected failure mode is preserved until Plan 5 ships
  wire-code 4.

Verification:
- `cargo build --workspace`: clean.
- `cargo nextest run --workspace --no-fail-fast`: 9370 passed, 1 failed
  (`quarto-core::idempotence::lua_shortcode_lipsum_fixed` =
  bd-3odjm Plan-5 baseline). No other regressions.
- `cargo xtask verify --skip-rust-tests`: all 12 steps green
  (Rust build + hub-client npm install/build/WASM/tests +
  q2-preview SPA build).
- Grep gates green: zero hits for `SourceInfo::FilterProvenance`,
  `SourceInfo::filter_provenance`, or the retired walker functions.
  `"FilterProvenance"` string appears only in the legacy Lua reader
  arm.

Dependencies:
- Adds `smallvec = "1.13"` with `serde` feature to the workspace,
  consumed by `quarto-source-map` and `pampa`.
- Adds `serde_json` to `quarto-source-map`'s regular deps (was
  dev-only).

Adds ~30 unit tests in `quarto-source-map/src/source_info.rs` covering
the new By/Anchor surface, every `Generated` accessor arm, JSON
round-trips, and the `combine() × Generated` structural case. Adds a
back-compat regression test in `pampa/src/lua/diagnostics.rs` for the
legacy `"FilterProvenance"` Lua tag.

Plan: claude-notes/plans/2026-05-04-q2-preview-plan-4-source-info-types.md

Adds an "Implementation surprises" section to Plan 4 capturing the
six divergences from plan-as-written that came up during landing:

- `gen` is a reserved keyword in the Rust 2024 edition, so it cannot be
  used as a binding name (this affects Plan 7's `preimage_in` pseudocode
  too; flagged for amendment)
- Phase 1's "compiles cleanly" applies to quarto-source-map only;
  workspace stays red until Phase 5
- `extract_filename_index` was tests-only — deleted entirely rather
  than kept as a shim
- `anchors_with_role` had to use `Box<dyn Iterator>` instead of
  `impl Iterator` (mismatched concrete iterator types per arm)
- `cargo xtask verify` also dirties `crates/wasm-quarto-hub-client/Cargo.lock`
- bd-3odjm behaved exactly as the plan predicted (non-surprise worth
  recording as a positive datapoint for plan accuracy)
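`impl Iterator` fails here because each match arm produces a different concrete iterator type, and an `impl Trait` return must resolve to exactly one type. Boxing erases that type. The sketch below uses hypothetical simplified types, with anchors reduced to `(role, id)` pairs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AnchorRole {
    Invocation,
    ValueSource,
}

// Hypothetical simplified shapes: only Generated carries anchors.
pub enum SourceInfo {
    Original,
    Generated { from: Vec<(AnchorRole, u32)> },
}

impl SourceInfo {
    // `-> impl Iterator` would not compile: `Empty<u32>` and the
    // `Map<Filter<..>>` chain are different concrete types.
    pub fn anchors_with_role(&self, role: AnchorRole) -> Box<dyn Iterator<Item = u32> + '_> {
        match self {
            SourceInfo::Original => Box::new(std::iter::empty::<u32>()),
            SourceInfo::Generated { from } => Box::new(
                from.iter()
                    .filter(move |(r, _)| *r == role)
                    .map(|(_, id)| *id),
            ),
        }
    }
}
```

One alternative that keeps `impl Iterator` is to normalize every arm to a slice of anchors first, using an empty slice for non-Generated variants, and build a single iterator chain over it. Whether that suits the real accessor depends on its other arms.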
Replace SourceInfo::default() at each enumerated synthesizer site with
the appropriate Generated { by: By::<kind>(), from: [] } shape (or
threaded Original for theorem/proof title-attr).

- title_block.rs:create_title_header — Generated { by: title_block() }
  on both the Header and inner Str.
- pampa sectionize.rs — Generated { by: sectionize() } on the Section
  Div (both close-on-stack and end-of-input sites).
- footnotes.rs:create_footnotes_section — Generated { by: footnotes() }
  on the container Div, HorizontalRule, and OrderedList.
- appendix.rs — Generated { by: appendix() } on the container Div plus
  the four structurally-identical helpers (wrap_bibliography,
  create_license_section, create_copyright_section,
  create_citation_section) that the plan body didn't enumerate but
  which are mechanically the same fix.
- theorem.rs / proof.rs:extract_name_attr — thread &div.attr_source
  through; index before kvs.remove("name"); use
  attr_source.attributes[idx].1. Plan-suggested debug_assert_eq! is
  too strict (fires on the common AttrSourceInfo::empty() test
  pattern with non-empty kvs); relaxed to "empty OR equal" so empty
  AttrSourceInfo signals "no provenance" rather than a bug.
- pampa postprocess.rs:1348 synthetic Space —
  Generated { by: tree_sitter_postprocess() }.
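The index-before-remove ordering and the relaxed assertion can be sketched as follows. The sketch assumes a simplified parallel layout: `kvs` holds key/value pairs, and `attr_sources` holds one `(start, end)` span per attribute, standing in for `attr_source.attributes[idx].1`. The function name mirrors the commit, but the signature is hypothetical.

```rust
// Hypothetical span type standing in for a per-attribute SourceInfo.
pub type Span = (usize, usize);

pub fn extract_name_attr(
    kvs: &mut Vec<(String, String)>,
    attr_sources: &[Span],
) -> Option<(String, Option<Span>)> {
    // Relaxed invariant: either no provenance at all, or one span per attribute.
    debug_assert!(attr_sources.is_empty() || attr_sources.len() == kvs.len());
    // Look up the index *before* removing, so it still lines up with attr_sources.
    let idx = kvs.iter().position(|(k, _)| k == "name")?;
    let source = attr_sources.get(idx).copied();
    let (_, value) = kvs.remove(idx);
    Some((value, source))
}
```

With empty `attr_sources`, the name is still extracted and simply carries no provenance. That is the "empty signals no provenance" reading the commit adopts.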

All 9448 workspace tests pass.
…h anchor

Two new research plans extending the provenance epic past Plan 8.
Adopt the `provenance-plan-N-<slug>.md` naming convention (drop the
`q2-preview-` prefix); the epic has outgrown the original framing.

Plan 9 — ValueSource threading for metadata-derived content (~860 LOC)
- Phase 1: `config_value_to_inlines_with_provenance` + `DocumentProfile.title_source_info` + `AppendixSection` enum
- Phase 2: meta/var shortcode two-anchor shape (closes bd-129m3)
- Phase 3: DocumentProfile.title → nav-text ValueSource (closes bd-8pmq3)
- Phase 4: appendix per-section sub-Div ValueSource (option A; `By::appendix(AppendixSection)` typed enum)
- Phase 5: Plan-7 deferred invariant tests (preimage_in role-asymmetry; appendix-license e2e round-trip)
- Phase 6: Plan 7 §asymmetry wording cleanup

Plan 10 — Dispatch anchor + Lua source registration (~1100 LOC)
- AnchorRole::Dispatch (diagnostic-only; inherits Plan 9's `AnchorRole::Other` policy)
- SourceContext extension for Lua filter / handler files (FileIds + content)
- Lua engine bridge: thread FileId into closure context; resolve `debug.getinfo()` to typed source_info
- `By::filter` signature shrinks to nullary; path/line move to Dispatch anchor
- Lua-handler shortcode gets `from: [Invocation, Dispatch]`
- Wire-format migration with dual-reader window
- Cache-key extension; coordinates with Plan 7a's filter_sources_hash
- Closes bd-36fr9

Both marked as research plans pending API-surface finalization;
subsequent review pass converts to development plans with checklisted
phases.

No code changes.
Adds 12 new tests covering Plan 6's invariants:

Per-transform shape tests (one per fixed synthesizer):
- sectionize: synthesized Div carries Generated{by:sectionize, from:[]}
  on both close-paths; wrapped Header keeps its original source_info.
- title_block: synthesized Header + inner Str both carry
  Generated{by:title-block}.
- footnotes: synthesized container Div (and its HorizontalRule chrome)
  carry Generated{by:footnotes}.
- appendix: synthesized container Div carries Generated{by:appendix}.

Shortcode tests (shortcode_resolve.rs):
- shortcode_resolution_has_generated_with_invocation_anchor: resolved
  Str shape, including by.data.name and Invocation source.
- multi_inline_shortcode_resolution_shares_invocation_source: multi-
  inline + nested Strong[Str] all share the same Invocation source_info.
- escaped_shortcode_keeps_original_source_info: token's Original
  source_info, not Generated.
- unknown_shortcode_error_uses_token_source_info: both Strong + inner
  Str carry the token's Original.
- shortcode_resolution_required_anchor_invariant: no Generated{by:
  shortcode, from: []} survives.
- shortcode_resolution_is_deterministic: two runs produce ==-equal
  ASTs (every Generated.by, every Generated.from[], every Original).
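The required-anchor invariant can be audited with a tree walk. The sketch below uses a hypothetical `Node`/`Prov` AST in which anchors are reduced to spans. It collects every shortcode-kind Generated with an empty `from`, and a test would assert that the result is empty. Other kinds, such as `filter` here, may legitimately have no anchors.

```rust
// Hypothetical simplified AST with provenance on every node.
#[derive(Debug)]
pub enum Prov {
    Original { start: usize, end: usize },
    Generated { kind: &'static str, from: Vec<(usize, usize)> },
}

pub struct Node {
    pub prov: Prov,
    pub children: Vec<Node>,
}

// Collect every shortcode-kind Generated that lost its Invocation anchor.
pub fn missing_invocation<'a>(node: &'a Node, out: &mut Vec<&'a Prov>) {
    if let Prov::Generated { kind: "shortcode", from } = &node.prov {
        if from.is_empty() {
            out.push(&node.prov);
        }
    }
    for child in &node.children {
        missing_invocation(child, out);
    }
}
```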

Lua enrichment test (lua_integration sub-module, native-only):
- lua_shortcode_typed_return_enriched_to_shortcode_kind: a typed
  pandoc.Str return that started life as Generated{by:filter, ...} via
  filter_source_info gets promoted to Generated{by:shortcode, data:
  {name, lua_path, lua_line}, from:[Invocation]}.

Deferred tests documented in the plan checklist (attribution, pipeline-
level audit, include composition, writer round-trip — owned by Plans
7/8 or the e2e test crate). All 9460 workspace tests pass.
…pl checklist

Thorough rewrite incorporating all decisions from the Plan 7 review
session. Removes inconsistencies left by multiple revisions; presents
the plan as a single coherent design. Adds an 83-item implementation
checklist across 9 phases.

Key changes from the previous version:

API decomposition
- Writer is pipeline-agnostic; caller supplies baseline AST.
- `incremental_write_qmd` signature becomes
  `(original_qmd, baseline_ast_json, new_ast_json) → { qmd, warnings }`.
- No `pipeline_kind` parameter. Pipeline tier is implicit in
  whichever baseline AST the caller passes.
- Decomposition framed as parse / transform / reconcile / write
  primitives; writer is just the last one.

Coarsen pseudo-code fixes
- KeepBefore catch-all falls through to Rewrite, not Omit
  (data-loss-shaped fallback removed).
- Inline-level soft-drop substitutes via `before_idx` from the
  alignment, not anchor-matching (which would fail for user-edit
  inlines that lack Invocation anchors).
- Multi-inline dedupe equality criterion stated as PartialEq on
  the Invocation anchor's source_info.

Unified editability predicate
- `is_editable_inside(node, target)` consulted by Plan 2A
  (React-side read-only check) AND the writer's coarsen
  (soft-drop logic). Three reasons content is uneditable:
  atomic CustomNode, atomic-kind Generated, or no preimage in
  target file.
- Non-atomic synthesized containers (sectionize, footnotes,
  appendix) gain read-only treatment via the no-preimage clause.

Soft-drop catalog expansion
- Adds two new soft-drop cases for no-preimage Generated
  containers: RecurseIntoContainer and UseAfter both substitute
  Omit + Q-3-43 warning. Let-user-win stays for atomic CustomNodes
  only (they have menu-driven replacement affordance + preimage).
- Q-3-43 widens to "Generated content edit dropped" — three
  emission paths (include, metadata-derived RecurseIntoContainer,
  metadata-derived UseAfter), one diagnostic code, structured
  body that names the source.

Migration plan (new section)
- WASM signature, TS wrapper, sync-client interface, three
  consumer sites enumerated with before/after.
- q2-demos require no state changes: sync-client's `astCache`
  already maintains both `source` and `ast`; baseline supplied
  from `cached.ast`.
- All in one PR; no back-compat shim (3 in-repo consumers).

SPA integration
- Content-match echo-prevention specified: hash emitted qmd,
  compare in onFileContent, suppress local echoes without
  losing unrelated file updates.
- `pipelineKindForFormat` moves from hub-client to
  `ts-packages/preview-runtime/src/pipelineKind.ts` (the SPA
  needs it for the display path).
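The content-match echo check can be sketched as follows. The SPA itself is TypeScript, so this Rust version is only an illustration, and `EchoFilter`, `record_local_write` and `on_file_content` are hypothetical names. The filter remembers a hash of each qmd this client emitted and suppresses one matching incoming update. Everything else passes through, including updates to unrelated files.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

fn content_hash(qmd: &str) -> u64 {
    let mut h = DefaultHasher::new();
    qmd.hash(&mut h);
    h.finish()
}

// Hypothetical echo filter keyed on (file path, content hash).
#[derive(Default)]
pub struct EchoFilter {
    pending: HashSet<(String, u64)>,
}

impl EchoFilter {
    // Call when this client writes qmd back to the shared document.
    pub fn record_local_write(&mut self, path: &str, qmd: &str) {
        self.pending.insert((path.to_string(), content_hash(qmd)));
    }

    // Returns true if the update should be applied, false if it is our own echo.
    pub fn on_file_content(&mut self, path: &str, qmd: &str) -> bool {
        !self.pending.remove(&(path.to_string(), content_hash(qmd)))
    }
}
```

Because suppression requires an exact content match, any update whose text differs from what this client wrote is never swallowed, unlike a time-window or per-file mute.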

Tests deferred to Plan 9
- `preimage_in` role-asymmetry end-to-end test and
  appendix-license round-trip test land in Plan 9 Phase 5 (need
  a real ValueSource consumer).
- Plan 7 keeps unit-level `preimage_in` tests that don't depend
  on Plan 9's stamping.

`AnchorRole::Other` policy stated explicitly: future anchor
roles default to non-walked by the writer. Inherited by
Plans 9 and 10.

`AtomicViolation` removed entirely (soft-drop replaces it).

Stale line references corrected
(`block_source_span` 447→448; `inline_source_span` named at 800).

Earlier-draft historical framing edited out; the plan now reads
as a single coherent design.

Implementation checklist
- Phase 1: foundation primitives (quarto-source-map, quarto-core)
- Phase 2: writer internals (pampa::writers::incremental)
- Phase 3: diagnostic catalog (Q-3-42, Q-3-43)
- Phase 4: WASM bridge signature change
- Phase 5: TypeScript wrapper + sync-client interface
- Phase 6: consumer migrations (ReactPreview + 2 demos)
- Phase 7: q2-preview SPA integration
- Phase 8: end-to-end tests
- Phase 9: verification + cleanup

Estimated scope: ~1390 LOC (up from 1310; consumer migration and
SPA echo-prevention added a bit despite pipeline-plumbing
removal).

No code changes.

- cargo xtask verify: all 12 steps green (build, tests, lint, fmt,
  WASM, hub-client, q2-preview-spa). 9460 workspace tests pass.
- End-to-end exercise: target/debug/q2 render on a fixture exercising
  title-block / sectionize / footnotes / appendix / meta shortcode.
  Generated HTML inspected — synthesizers and shortcode resolver
  produce correct output. Recorded in the plan body.
- crates/wasm-quarto-hub-client/Cargo.lock picks up the new smallvec
  dep on quarto-core (transitive through the WASM build).
…rch plans

Plan 7 rewrite incorporates all decisions from the review session:

- API decomposition: writer is pipeline-agnostic; caller supplies
  baseline AST. `incremental_write_qmd` signature becomes
  `(original_qmd, baseline_ast_json, new_ast_json) → { qmd, warnings }`.
  No `pipeline_kind` parameter.
- Unified `is_editable_inside` predicate consulted by both Plan 2A
  (React-side read-only check) and the writer's coarsen (soft-drop).
  Non-atomic synthesized containers gain read-only treatment via the
  no-preimage clause.
- Soft-drop catalog expanded: no-preimage Generated containers
  soft-drop on both RecurseIntoContainer and UseAfter; Q-3-43 widens
  to "Generated content edit dropped" with three emission paths.
- Coarsen KeepBefore catch-all falls to Rewrite (not Omit) —
  data-loss-shaped fallback removed.
- Multi-inline dedupe via PartialEq on Invocation anchor source_info.
- Inline-level soft-drop substitutes via `before_idx` from the
  alignment (not anchor-matching, which would fail for user-edit
  inlines).
- AtomicViolation variant removed entirely.
- SPA integration: content-match echo-prevention; `DiagnosticStrip`
  component; `pipelineKindForFormat` moves to ts-packages/preview-runtime.
- Migration plan covers all 3 in-repo consumers (ReactPreview, kanban,
  hub-react-todo) + sync-client interface. q2-demos require no state
  changes — sync-client's astCache already holds the baseline.
- 83-item implementation checklist across 9 phases.

Plans 9 and 10 (new research plans, `provenance-plan-N-<slug>.md`
naming convention):

- Plan 9 — ValueSource threading for metadata-derived content;
  closes bd-129m3, bd-8pmq3, and the unowned appendix-license
  obligation. Owns Plan 7's deferred role-asymmetry e2e test.
- Plan 10 — Dispatch anchor + Lua source registration; closes
  bd-36fr9. Inherits Plan 9's `AnchorRole::Other` policy. `By::filter`
  signature shrinks (path/line move to Dispatch anchor).

Both Plans 9 and 10 marked as research plans; API surface settled,
implementation order to be pinned in subsequent review pass.

No code changes.

Five refinements surfaced in the post-merge "any further thoughts"
review and now folded back into the plans.

Plan 7 (q2-preview-plan-7-incremental-writer.md):

- **Scope clarification — first-demo UX**: lifting the coarse
  read-only guard exposes the writer's soft-drop warnings as the
  primary safeguard. A fine-grained React-side editability gate
  (per-region greying via a TS `is_editable_inside`-equivalent) is
  deferred to a future frontend pass. For the first demo, "you can
  type, but it doesn't take, and you see a warning" is the
  deliverable. Plan 2A's existing atomic-CustomNode gate continues
  to prevent the most surprising cases without further work.
- **Q-3-43 catalog mechanics verified**: catalog entries carry
  one static `message_template`; per-call-site body text uses the
  existing `DiagnosticMessageBuilder` builder pattern. No
  template-able-body infrastructure needed. Phase 3 ships one
  catalog entry per code + three builder helper functions.
- **Phase 9 `q2 preview` WASM rebuild chain**: explicit three-step
  refresh (`npm run build:wasm` → `cargo xtask build-q2-preview-spa`
  → `cargo build --bin q2`) added as sub-bullets. References the
  2026-05-20 stale-WASM incident and the canonical instructions in
  `CLAUDE.md` §"Verifying Rust changes in `q2 preview`".
- **Coordination posture**: the 83-item checklist is sized for
  serial implementation in a single fresh 1M-context session. No
  beads-per-phase split needed; follow-ups only for surprises.

Plan 10 (provenance-plan-10-dispatch-anchor.md):

- **Wire format: clean break, not dual reader**. Per "emphasis on
  clean design" guidance — the codebase is workspace-internal Rust
  with no on-disk artifacts holding the old shape. Phase 6 becomes
  a one-PR break (writers emit the new shape; old shape removed
  entirely). Same rationale as `By::appendix` (Plan 9) and
  `By::filter` (Phase 4 above). Removed the related open question
  about legacy-shape-decoder behavior. Estimated scope for Phase 6
  drops from ~150 to ~80 LOC.
- **Phase 7 reuses Plan 7a's `filter_sources_hash`**: Plan 7a
  lands first per user direction. Plan 10's Phase 7 reduces to a
  smoke test confirming the existing hash field invalidates on Lua
  filter file changes. Estimated scope drops from ~80 to ~30 LOC.
  Plan-7a-vs-Plan-10 coordination friction is now resolved (in
  the risk-areas section).
- **Total Plan 10 estimate drops from ~1100 to ~980 LOC** as a
  result of these two simplifications.

Plan 7a (q2-preview-plan-7a-user-filter-idempotence.md):

- **Plan 10 cross-reference refreshed**: the "Plan 4 / Plan 6's
  Dispatch follow-up" reference becomes "Plan 10
  (`claude-notes/plans/2026-05-22-provenance-plan-10-dispatch-anchor.md`)".
  Acknowledges Plan 10 now exists as a numbered plan.
- **Coordination note added**: `filter_sources_hash` is Plan 7a's
  field; Plan 10 reuses it. Migration to Dispatch-anchor-based
  per-Lua-line attribution is purely additive when Plan 10 lands.
- **Structural independence noted**: Plan 7a reads filter path
  from `FilterMetadata.spec`, not from any Generated node's
  `by.data`, so it's structurally independent of `By`'s data shape.
  Plan 10's clean break on `By::filter` doesn't affect Plan 7a's
  diagnostic emission.

No code changes.
…y, editability gate)

Phase 1 lands the writer-side primitives Plan 7's coarsen will consume.
All purely additive: no existing API changes; no Generated nodes
produced anywhere yet — coarsen still emits the pre-Plan-7 shape.

`SourceInfo::preimage_in(target: FileId) -> Option<Range<usize>>`
- Original: Some(range) iff file matches target.
- Substring: recurse parent; offsets compose additively.
- Concat: every piece must resolve into target AND be byte-contiguous;
  gappy or mixed-file Concats return None.
- Generated: walk Invocation anchor only via invocation_anchor().
  ValueSource (Plan 9), future Dispatch (Plan 10), and Other roles are
  diagnostic-only — never consulted by the writer's byte-copying path.
  Documented on both preimage_in and AnchorRole::Other doc-comments.
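The `preimage_in` rules above translate fairly directly into a recursive match. The sketch below uses hypothetical simplified types: Substring ranges are relative to the parent's preimage, and anchors are `(Role, Arc<SourceInfo>)` pairs. It treats an empty Concat as having no preimage, which is an assumption the commit does not state.

```rust
use std::ops::Range;
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileId(pub usize);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Invocation,
    ValueSource,
}

// Hypothetical simplified SourceInfo.
pub enum SourceInfo {
    Original { file: FileId, range: Range<usize> },
    // `range` is relative to the parent's preimage.
    Substring { parent: Arc<SourceInfo>, range: Range<usize> },
    Concat { pieces: Vec<SourceInfo> },
    Generated { from: Vec<(Role, Arc<SourceInfo>)> },
}

impl SourceInfo {
    pub fn preimage_in(&self, target: FileId) -> Option<Range<usize>> {
        match self {
            SourceInfo::Original { file, range } => (*file == target).then(|| range.clone()),
            // Offsets compose additively with the parent's start.
            SourceInfo::Substring { parent, range } => {
                let base = parent.preimage_in(target)?.start;
                Some(base + range.start..base + range.end)
            }
            SourceInfo::Concat { pieces } => {
                let mut out: Option<Range<usize>> = None;
                for piece in pieces {
                    let r = piece.preimage_in(target)?; // any piece outside target: None
                    out = match out {
                        None => Some(r),
                        // Pieces must abut exactly: no gaps, no overlaps.
                        Some(acc) if acc.end == r.start => Some(acc.start..r.end),
                        Some(_) => return None,
                    };
                }
                out
            }
            // Only the Invocation anchor is walked; other roles are diagnostic-only.
            SourceInfo::Generated { from } => from
                .iter()
                .find(|(role, _)| *role == Role::Invocation)?
                .1
                .preimage_in(target),
        }
    }
}
```

A ValueSource-only Generated therefore has no preimage even when its value came from the target file. That is the role asymmetry the doc-comments describe.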

Atomic CustomNode registry — moved to `quarto-pandoc-types`
- Plan 7 originally placed `ATOMIC_CUSTOM_NODES` / `is_atomic_custom_node`
  in `quarto-core`, but `quarto-core` depends on `pampa` and the writer
  is the consumer — that direction would cycle. Moved down to
  `quarto-pandoc-types` (the home of `CustomNode` itself).
- Lockstep cross-check test in `quarto-core::crossref` pins
  `CROSSREF_RESOLVED_REF` against the registry literal.

Editability gate — `pampa::writers::incremental`
- `is_editable_inside_block(block, target_file_id) -> bool`
- `is_editable_inside_inline(inline, target_file_id) -> bool`
- Shared private `is_editable_inside_source_info` core.
- Three uneditable reasons: atomic CustomNode, atomic-kind Generated,
  no-preimage Generated (covers ValueSource-only Generated as a
  consequence of preimage_in's Invocation-only walk).
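Here is a sketch of the shared predicate's decision logic. It assumes the preimage and atomic-kind checks have already been computed into booleans. The real `is_editable_inside_source_info` derives them from `preimage_in` and the atomic-kind check. `HypotheticalAtomic` is a placeholder, not an actual registry entry.

```rust
// Hypothetical registry contents.
const ATOMIC_CUSTOM_NODES: &[&str] = &["HypotheticalAtomic"];

// Hypothetical, pre-digested provenance facts.
pub enum Prov {
    Original,
    Generated { atomic_kind: bool, has_preimage_in_target: bool },
}

pub struct Node {
    pub custom_node_type: Option<&'static str>,
    pub prov: Prov,
}

pub fn is_editable_inside(node: &Node) -> bool {
    // Reason 1: atomic CustomNode.
    if node.custom_node_type.is_some_and(|t| ATOMIC_CUSTOM_NODES.contains(&t)) {
        return false;
    }
    match node.prov {
        Prov::Original => true,
        // Reason 2: atomic-kind Generated. Reason 3: no preimage in the target file.
        Prov::Generated { atomic_kind, has_preimage_in_target } => {
            !atomic_kind && has_preimage_in_target
        }
    }
}
```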

Reconciler source-info-blindness — extended `quarto-ast-reconcile`
- Five new tests covering the Plan-4/6 Generated shapes:
  Generated-with-different-By, Generated-with-different-anchor-lists,
  mixed Invocation/ValueSource, CustomNode wrapper Generated-vs-Original,
  CustomNode slot-child Generated-vs-Original. Existing Original-only
  blindness tests already covered the pre-Plan-6 cases.

Tests added (all green):
- 16 preimage_in tests in quarto-source-map
- 3 atomic-registry tests in quarto-pandoc-types + 1 lockstep in quarto-core
- 12 editability tests in pampa::writers::incremental
- 5 reconciler-blindness tests in quarto-ast-reconcile

Verification: `cargo nextest run --workspace` (9509 tests) and
`cargo xtask verify` (full 12-step chain including WASM build +
hub-client tests) both green.

Plan refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-7-incremental-writer.md
  §"`preimage_in` semantics", §"Unified editability predicate",
  §"`is_atomic_custom_node` registry", Phase 1 checklist.

Snapshot test changes: none.
…lti-inline dedupe

Coarsen's classification rewires per Plan 7's cascade and gains soft-drop
substitutions. Bad edits no longer abort the write — they substitute a
safe alignment and push a Q-3-42 / Q-3-43 warning.

CoarsenedEntry variants (Phase 2a)
- `Transparent { child_entries }` — non-atomic Generated wrappers with
  source-bearing children. Wrapper contributes nothing; children emit
  through a recursive `emit_entries` helper that shares `prev_entry`
  state across the wrapper boundary so separators compose naturally.
- `Omit` — atomic-kind Generated with no Invocation anchor (filter
  constructions, title-block synthesis, tree-sitter postprocess space),
  and soft-drop substitutions for no-preimage Generated containers.
- `Verbatim::orig_idx` / `InlineSplice::orig_idx` widened to
  `Option<usize>` so Transparent's children opt out of the original-gap
  separator optimization (they aren't top-level blocks).

Signature changes (Phase 2b)
- `incremental_write` and `compute_incremental_edits` return
  `Result<(String, Vec<DiagnosticMessage>), Vec<DiagnosticMessage>>`.
  Warnings on Ok (soft-drop); Err reserved for genuine structural
  failures from `assemble_inline_splice`.
- `coarsen` accepts `&mut Vec<DiagnosticMessage>` warning sink.
- WASM bridge `incremental_write_qmd` threads warnings into the existing
  `AstResponse.warnings` channel via `diagnostics_to_json`.

KeepBefore cascade (Phase 2c)
- `preimage_in(target)` present → Verbatim (covers Original, Substring,
  contiguous Concat, Generated-via-Invocation).
- Atomic-kind Generated with no Invocation → Omit + debug_assert against
  shortcode-with-empty-from (Plan 6 stamper invariant).
- Non-atomic Generated with source-bearing children → Transparent
  recurse over the wrapper's children.
- Catch-all → Rewrite. Cross-file Original, gappy Concat, Generated
  wrapper without source-bearing children.
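The KeepBefore cascade is an ordered series of checks with `Rewrite` as the fallback. The sketch below reduces each block to the facts the cascade consults. `Block`, `GenInfo` and `Coarsened` are hypothetical simplifications; the real `Transparent` carries `child_entries`, which are omitted here.

```rust
use std::ops::Range;

// Hypothetical, reduced view of a KeepBefore block.
pub struct Block {
    pub preimage: Option<Range<usize>>, // result of preimage_in(target)
    pub generated: Option<GenInfo>,     // None when the source info is not Generated
    pub has_source_bearing_children: bool,
}

pub struct GenInfo {
    pub atomic_kind: bool,
    pub is_shortcode: bool,
    pub has_invocation: bool,
}

#[derive(Debug, PartialEq)]
pub enum Coarsened {
    Verbatim(Range<usize>),
    Omit,
    Transparent,
    Rewrite,
}

pub fn classify_keep_before(b: &Block) -> Coarsened {
    // 1. Any usable preimage wins: copy the original bytes.
    if let Some(r) = &b.preimage {
        return Coarsened::Verbatim(r.clone());
    }
    if let Some(g) = &b.generated {
        // 2. Atomic synthesis with nothing to copy: drop it; it regenerates next run.
        if g.atomic_kind && !g.has_invocation {
            debug_assert!(!g.is_shortcode, "shortcode Generated must carry an Invocation anchor");
            return Coarsened::Omit;
        }
        // 3. Non-atomic wrapper: emit its source-bearing children instead.
        if !g.atomic_kind && b.has_source_bearing_children {
            return Coarsened::Transparent;
        }
    }
    // 4. Catch-all: regenerate from the AST rather than silently dropping content.
    Coarsened::Rewrite
}
```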

UseAfter + RecurseIntoContainer soft-drops (Phase 2d)
- UseAfter on atomic CustomNode: let-user-win Rewrite (no warning) —
  the qmd writer's CustomNode arm reads `plain_data`.
- UseAfter on no-preimage Generated: Omit + Q-3-43 — no source position
  to anchor a Rewrite at; original container regenerates next run.
- RecurseIntoContainer with `!is_editable_inside_block` → Verbatim
  wrapper bytes (if preimage) or Omit (no preimage) + Q-3-43.
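The RecurseIntoContainer soft-drop reduces to a small decision table. In this sketch, `is_editable_inside` and `has_preimage` are assumed to be precomputed booleans standing in for `is_editable_inside_block` and `preimage_in(target).is_some()`. `ContainerOutcome` and `recurse_into_container` are hypothetical names. A later commit on this branch adds a Transparent-recursion step before the last-resort Omit for non-atomic Generated wrappers; that step is not modeled here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContainerOutcome {
    /// Descend and coarsen the children normally.
    Recurse,
    /// Keep the wrapper's original bytes.
    Verbatim,
    /// Emit nothing; the generated container regenerates on the next run.
    Omit,
}

/// Returns the outcome and whether a Q-3-43 warning should be pushed.
pub fn recurse_into_container(is_editable_inside: bool, has_preimage: bool) -> (ContainerOutcome, bool) {
    if is_editable_inside {
        return (ContainerOutcome::Recurse, false);
    }
    // Not editable inside: soft-drop, preferring the original bytes if any.
    if has_preimage {
        (ContainerOutcome::Verbatim, true)
    } else {
        (ContainerOutcome::Omit, true)
    }
}
```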

Inline-level soft-drop + multi-inline dedupe (Phase 2e)
- Two-phase `assemble_inline_content`: first applies soft-drop
  substitutions (UseAfter / RecurseIntoContainer targeting a
  non-editable original inline → KeepBefore at the positional proxy
  + Q-3-42); second emits, with consecutive KeepBefore entries whose
  Invocation anchors are PartialEq-equal collapsing to a single
  emission of the anchor's preimage bytes.
- Singleton-KeepBefore inline emit path updated to use
  `preimage_in(target_file_id)` (with `inline_source_span` fallback).
  Original-SI inlines are byte-identical to the old behavior;
  Generated-SI inlines now emit the Invocation anchor's preimage
  instead of an empty range — fixes a latent zero-length bug in the
  pre-Plan-7 inline-splice path.
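The dedupe rule in the second phase is easiest to see on a single shortcode whose resolution spans several inlines that all share one Invocation anchor. The following minimal sketch assumes anchors are plain byte ranges and that inline entries are either KeepBefore(anchor) or freshly written text. `Anchor`, `InlineEntry`, and `emit_inlines` are hypothetical names.

```rust
/// Hypothetical byte range of an Invocation anchor (the shortcode token).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Anchor {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub enum InlineEntry {
    /// Keep the original: emit the anchor's preimage bytes.
    KeepBefore(Anchor),
    /// Freshly written inline text.
    Emit(String),
}

/// Emits entries, collapsing runs of consecutive KeepBefore entries whose
/// anchors are equal into a single copy of the anchor's bytes.
pub fn emit_inlines(entries: &[InlineEntry], source: &str) -> String {
    let mut out = String::new();
    let mut last_anchor: Option<Anchor> = None;
    for entry in entries {
        match entry {
            InlineEntry::KeepBefore(a) => {
                if last_anchor != Some(*a) {
                    out.push_str(&source[a.start..a.end]);
                }
                last_anchor = Some(*a);
            }
            InlineEntry::Emit(text) => {
                out.push_str(text);
                last_anchor = None; // any other entry breaks the run
            }
        }
    }
    out
}
```

Without the collapse, a shortcode that resolves to N inlines would have its token copied N times.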

Diagnostic catalog (Phase 3a)
- `Q-3-42` "Shortcode edit dropped" — inline-level soft-drop on
  atomic-Generated content. Source location: the Invocation anchor's
  source_info (the token bytes).
- `Q-3-43` "Generated content edit dropped" — block-level soft-drop.
  Single catalog `message_template`; the three emission paths supply
  distinct body text via the builder API (per Plan 7 §"Catalog
  mechanics").
- Builder helpers `diagnostic_q3_42_inline(inline)` and
  `diagnostic_q3_43_block(block)` live in `pampa::writers::incremental`
  (not `quarto-error-reporting`, which doesn't depend on
  `quarto-pandoc-types`).

Tests added
- 12 Plan-7-specific coarsen unit tests in `coarsen_plan7_tests`:
  KeepBefore cascade variants (Verbatim, Omit, Transparent, Rewrite
  catch-all), UseAfter let-user-win + soft-drop, RecurseIntoContainer
  soft-drop variants, multi-inline dedupe positive / negative /
  ValueSource cross-talk, inline UseAfter Q-3-42 path.

Test-caller migrations
- All `.expect("incremental_write failed")` and `.unwrap()` callsites
  in `inline_splice_*` and `incremental_writer_tests.rs` updated to
  destructure the new `(qmd, warnings)` tuple.

Verification
- `cargo nextest run --workspace`: 9535 passed.
- `cargo xtask verify`: all 12 steps green (lint, fmt, build, Rust
  tests, hub-client WASM build + tests, trace-viewer, q2-preview-spa,
  shared packages).

Deferred follow-ups
- Writer-lossless baseline test for each Plan-6/Plan-7 Generated shape
  (needs crafted fixtures; Plan 7 checklist item).
- Soft-drop interaction test (shortcode + non-atomic edit in same Para).
- Filter-construction soft-drop-on-edit test.

Snapshot test changes: none.

Plan refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-7-incremental-writer.md
  §"Coarsen pseudo-code", §"Inline-level soft-drop", §"Multi-inline
  dedupe", §"Diagnostic codes", §"Catalog mechanics".

Repo-level facts that bit during the Phase 1-3 implementation, surfaced
in the Phase 4 handoff prompt and folded back into the plan so future
readers don't re-discover them:

- Phase 4 §Repo facts: wasm-quarto-hub-client is NOT in the cargo
  workspace; build via `cd hub-client && npm run build:wasm` or via
  `cargo xtask verify` step 6. `AstResponse.warnings` is
  `Option<Vec<JsonDiagnostic>>`; convert via `diagnostics_to_json`
  taking `&SourceContext` — access `ASTContext.source_context`.
- Phase 2 §Repo facts for test fixtures: `AttrSourceInfo` doesn't
  implement `Default` (use `AttrSourceInfo::empty()`). `gen` is a
  Rust 2024 reserved keyword; don't name a `SourceInfo::Generated`
  variable `gen`.

These are workspace-internal facts that survive context boundaries
and help any future implementer in this codebase.

The hub-client-e2e.yml `paths:` filter only fires the workflow when a
commit touches `hub-client/**` or the workflow file itself. It does not
follow transitive Rust deps, so PRs that modify upstream crates the WASM
bundle depends on — `quarto-core`, `quarto-pandoc-types`, `quarto-source-map`,
`pampa`, `quarto-ast-reconcile`, `wasm-quarto-hub-client`, etc. — silently
skip e2e.

Two recent misses:

- f96f56d (Carlos, 5/22): WASM-incompatible `Instant::now()` and
  `pollster::block_on` introduced in `quarto-core` broke 8 hub-client
  WASM tests on main. e2e never ran because the change was under
  `crates/`, not `hub-client/`.
- PR #231 (feature/provenance, this branch): 57 files modified across
  `crates/` and `ts-packages/`, zero under `hub-client/`. e2e silently
  skipped on every push despite the PR materially changing the WASM
  bundle's behavior.

Fix: drop the `paths:` filter outright and match the trigger shape of
the sibling heavy workflows (`test-suite.yml`, `ts-test-suite.yml`).
Also adds a `concurrency:` block (lifted from `test-suite.yml`) so
superseded runs on a PR get cancelled in flight — keeps the runner
cost from compounding.

Closes bd-izh3. The original ask there was to add a PR trigger with a
*broader* path filter; that approach still wouldn't catch the upstream-
crate case, so we go the coarser route the issue's spirit calls for.
The runner-sizing open question in bd-izh3 is also resolved — ae8274a
confirmed `ubuntu-latest` (2 cores, 2 Playwright workers) handles the
full suite in 5.3-8.1 min.

`kyoto` deliberately omitted from the branch list: `origin/kyoto` last
moved 2026-02-02 and is 825 commits behind main; the sibling workflows
still reference it but that's cargo-cult.

Phase 4 — WASM bridge (`incremental_write_qmd`):
- Signature now `(original_qmd, baseline_ast_json, new_ast_json)`.
- Internal `qmd_to_pandoc` re-parse removed; baseline is deserialized
  via `pampa::readers::json::read`, preserving any host-side
  provenance the caller attached (e.g. `preimage_in` from a prior
  incremental edit).
- `AstResponse.warnings` populated via
  `diagnostics_to_json(&warnings, &baseline_context.source_context)`.

Phase 5 — TS wrappers + sync-client:
- `wasmRenderer.ts` `incrementalWriteQmd` returns
  `{ qmd, warnings? }`; baseline accepted as parsed AST or
  pre-serialized JSON string.
- `.d.ts` files updated (preview-runtime + hub-client); add
  `warnings?` to `AstResponse`.
- `quarto-sync-client` `astOptions.incrementalWriteQmd` widened;
  `client.ts` passes `cached.ast` as baseline and reads `result.qmd`.
- `pipelineKind.ts` (+ test) moved to `ts-packages/preview-runtime/`
  and re-exported; `ReactPreview.tsx` imports it from the package.
  SPA does not import it yet (Phase 7).

Phase 6 — Consumers:
- `ReactPreview.handleSetAst`: read-only guard removed; passes the
  displayed `ast` state as baseline; soft-drop warnings (Q-3-42 /
  Q-3-43) are stashed in a ref and merged into the next
  `onDiagnosticsChange` push so they surface in the existing
  diagnostics panel.
- Both demos (`kanban`, `hub-react-todo`): `wasm.ts`,
  `useSyncedAst.ts`, and the local `.d.ts` shim updated to the new
  signature/return shape.

Verification: `cargo xtask verify` green — all 12 steps including
WASM rebuild and hub-client tests; 9535 Rust tests pass.

Plan 7 (`claude-notes/plans/2026-05-04-q2-preview-plan-7-incremental-writer.md`)
phases 4-6 checkboxes flipped to done.

…trip

`PreviewApp.tsx`:
- `noopSetAst` replaced with `handleSetAst` that calls
  `incrementalWriteQmd(originalQmd, baselineJson, newAst)` using the
  active page's qmd + the current `astJson` as the baseline. Stable
  callback identity via `activeFileRef` / `astJsonRef` so the
  iframe's postMessage listener doesn't re-bind on every render.
- Content-match echo-prevention: hash the emitted qmd with FNV-1a,
  stash `(path, hash)` in `lastEmittedRef`, and the next
  `onFileContent` for that path whose content hashes equal is
  silently dropped (consumes the ref). Avoids the SPA re-rendering
  off its own write and racing follow-up edits. Hash-algorithm
  rationale is in the `fnv1aHex` docstring.
- Soft-drop warnings (Q-3-42 / Q-3-43) accumulate into
  `writeWarnings` state and surface via `<DiagnosticStrip>`.
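The echo-prevention step can be sketched in Rust. The commit does not say which FNV-1a width `fnv1aHex` uses, so this sketch assumes the 32-bit variant (offset basis `0x811c9dc5`, prime `0x01000193`) rendered as eight hex digits. `EchoGuard` is a hypothetical analogue of `lastEmittedRef`. The sketch also assumes that a non-matching update leaves the stored record in place.

```rust
/// 32-bit FNV-1a over the UTF-8 bytes, as eight lowercase hex digits.
pub fn fnv1a_hex(s: &str) -> String {
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    for b in s.bytes() {
        hash ^= b as u32;
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    format!("{:08x}", hash)
}

/// Hypothetical Rust analogue of the SPA's `lastEmittedRef` logic.
pub struct EchoGuard {
    last_emitted: Option<(String, String)>, // (path, hash of emitted qmd)
}

impl EchoGuard {
    pub fn new() -> Self {
        EchoGuard { last_emitted: None }
    }

    /// Called right after our own write produces `qmd` for `path`.
    pub fn record_emit(&mut self, path: &str, qmd: &str) {
        self.last_emitted = Some((path.to_string(), fnv1a_hex(qmd)));
    }

    /// Called on incoming file content. Returns true (and consumes the
    /// record) when it is the echo of our own write and should be dropped.
    pub fn is_own_echo(&mut self, path: &str, content: &str) -> bool {
        let hash = fnv1a_hex(content);
        let matched = matches!(&self.last_emitted, Some((p, h)) if p == path && *h == hash);
        if matched {
            self.last_emitted = None;
        }
        matched
    }
}
```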

`components/DiagnosticStrip.tsx` (new):
- Inline-styled fixed strip in the bottom-right of the preview pane;
  matches the existing component convention (no separate CSS file).
- `suppressAfterThree` helper caps each `(code, source-range)` group
  at 3 entries per Plan 7 §"Autosave-context spam mitigation" so
  every-keystroke renders don't flood the surface.
- Catalog title + problem text are rendered verbatim — Phase 3's
  catalog entries are already imperative ("edit the invocation token
  in source instead").
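The following minimal sketch shows the capping rule. It assumes that a warning's source range is a `(start, end)` byte pair and that the first three occurrences in each `(code, range)` group are the ones kept. `Warning` and `suppress_after_three` are hypothetical Rust counterparts of the TS helper.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Warning {
    pub code: String,
    pub start: usize,
    pub end: usize,
}

/// Keeps at most three warnings per `(code, start, end)` group,
/// preserving the original order of the ones kept.
pub fn suppress_after_three(warnings: &[Warning]) -> Vec<Warning> {
    let mut counts: HashMap<(String, usize, usize), usize> = HashMap::new();
    let mut kept = Vec::new();
    for w in warnings {
        let seen = counts.entry((w.code.clone(), w.start, w.end)).or_insert(0);
        if *seen < 3 {
            kept.push(w.clone());
        }
        *seen += 1;
    }
    kept
}
```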

Verification: `cargo xtask verify --skip-rust-tests` green — all 12
steps including hub-client build/test and q2-preview-spa build.

Plan 7 Phase 7 checkboxes flipped to done.

Phase 8 subset (landed now):
- `hub-client/src/services/incrementalWrite.wasm.test.ts` — vitest
  test against the real WASM bridge, pinning the 3-arg signature
  (`original_qmd, baseline_ast_json, new_ast_json`), the
  `{ qmd, warnings? }` return shape, identity round-trip
  byte-equality, paragraph-edit text propagation with surrounding
  structure preserved, and structured-error reporting on malformed
  baseline JSON. 3/3 passing under `npm run test:wasm`.
- Plan 3's idempotence test re-run as part of `cargo xtask verify`
  (9535/9535 Rust tests).

Phase 9 (verification):
- WASM chain refreshed end-to-end:
    `npm run build:wasm` → `cargo xtask build-q2-preview-spa` →
    `cargo build --bin q2`
- `q2 preview /tmp/plan7-smoke` boot smoke: server came up, SPA
  rendered the fixture, confirmed in the user's browser (session
  2026-05-24).
- Full `cargo xtask verify` already green from Phase 4-6 / 7 commits.

Deferred to `bd-3izo3`:
- The Playwright e2e scenario matrix (sectionized round-trip, single
  & multi-inline shortcode preservation, Q-3-42 byte-equal-no-op,
  Q-3-43 footnotes regeneration, SPA edit-paragraph in project +
  single-file modes, SPA DiagnosticStrip on shortcode edit, mixed
  atomic + non-atomic, content-match echo-prevention fixture).
- The Rust-side soft-drop matrix is already covered in
  `crates/pampa/src/writers/incremental.rs`; the deferred work is
  end-to-end *delivery* coverage, not new correctness coverage.

Plan 9's Phase 5 (deferred Plan-7 invariant tests) gets a status
note recording that Plan 7 landed on `feature/provenance` 2026-05-24,
so when Plan 9 picks up, the dependency is visibly unblocked. Plan 7
closes its corresponding Phase-9 checkbox; the test reference is
already in Plan 9 Phase 5, and both plans cross-link correctly.

Plan 7's session left four code-side test items deferred (three
Rust unit tests where the original plan author hedged about
fixture construction, one Playwright e2e scenario matrix scoped
out for context budget). On reassessment all four are mechanical
follow-throughs, not research.

Plan 7b consolidates them into one deliberate test pass with four
phases:
  1. Rust unit tests (writer-lossless baseline, soft-drop
     interaction, filter-construction UseAfter)
  2. Hub-client Playwright specs (5 scenarios)
  3. SPA Playwright specs (5 scenarios)
  4. Cleanup (close bd-3izo3, flip Plan 7's deferred checkboxes)

Intended to run before Plan 7a so the writer's round-trip contract
is fully pinned before runtime idempotence detection layers on top.

Plan 7 gets forward pointers from each deferred checkbox to its
new home in Plan 7b.

Adds claude-notes/designs/provenance-contract.md as the reference doc
for transform authors emitting SourceInfo. Captures the Plan-6
conventions that survived implementation:
- a four-branch decision tree for picking Original / Generated{from:[]} /
  Generated{from:[Invocation]} / leave-alone;
- the By:: constructor catalog with atomicity flags;
- the enrichment-via-post-walk pattern (kind promotion, by.data migration,
  anchor appending), with stamp_shortcode_anchors as the reference
  implementation;
- the AttrSourceInfo positional-alignment threading recipe, with the
  relaxed debug_assert! footgun called out;
- atomic-kind consumer impact;
- required-anchor invariants;
- the call-site-threading outlier pattern (make_error_inline /
  shortcode_to_literal);
- a do-not list.

Mirrors the structure and tone of
claude-notes/designs/document-profile-contract.md. Links out to
Plans 4-8 and the Plan-6 audit report rather than re-explaining their
content; names follow-up beads (bd-129m3 ValueSource, bd-36fr9
Dispatch, bd-12vrr callout, bd-1inj0 codeblock chrome, bd-3aolj /
bd-1e6a5 parser alignment) without designing them.

Plan 7c closes four correctness/coverage items where the post-review
Plan-7 doc and the actual landed implementation diverged because the
implementation agent ran before the review-pass merge:

1. Q-3-41 "Edit dropped — render not ready yet" — catalog entry +
   TS-side emission from both ReactPreview and the SPA so the
   first-edit-before-render case no longer drops silently.

2. TS-side hasPreimageIn + isEditableInside — the predicate pair
   that closes Plan 2A's framework gate so edits that the writer
   would soft-drop are intercepted at the DOM. The atomicity-only
   gate at framework/dispatch.tsx:404-411 is updated to consult
   the unified editability predicate; preimage_in walks Invocation
   only, per the writer contract.

3. cfg(debug_assertions) #[should_panic] test for the shortcode-
   Generated-with-empty-from debug-assert at incremental.rs:448.

4. Per-kind soft-drop test symmetry: explicit Omit + inline-UseAfter
   tests for filter / title-block / tree-sitter-postprocess kinds;
   multi-inline dedupe filter case.

Six phases, ~530 LOC. No new design surface; the plan is
disjoint from Plan 7b's test-o-rama scope and reuses existing
diagnostic / context infrastructure throughout.

…ppers in RecurseIntoContainer

Editing inside a `q2 preview` document silently failed with
`Incremental write failed: undefined` whenever the post-render
pipeline wrapped the whole document in a top-level `Generated{by:
sectionize}` Div. The reconciler aligned 1 Div : 1 Div as
RecurseIntoContainer; `coarsen`'s Plan 7 soft-drop guard then fired
on the wrapper (`is_editable_inside_block` is false for a Generated
with no preimage) and emitted `Omit` for the entire document,
producing an empty qmd + one Q-3-43 warning.

Mirror the existing Transparent path that
`coarsen_keep_before_block` uses for unchanged wrappers: when the
orig container is a non-atomic Generated with source-bearing
children AND `block_container_plans` has a nested plan for this
index, recurse `coarsen_blocks` on the children and wrap in
`Transparent`. Children carry `orig_idx: None` (children-relative,
not top-level) via a new normalizer.

Refactor: split `coarsen` into a thin Pandoc-aware entry that
derives `target_file_id` and a slice-based `coarsen_blocks` that
the new `coarsen_children` reuses. Existing soft-drop tests
(no-recursable-children case) still hit the last-resort `Omit` +
Q-3-43 path.

Tests:
- `crates/pampa/tests/incremental_writer_tests.rs`:
  `sectionize_wrapper_with_inner_para_edit_produces_nonempty_output`
- `hub-client/e2e/q2-preview-render-components-write.spec.ts` drives
  the real user flow (q2 preview + comment.tsx +react picker) and
  is the deterministic browser repro the diagnosis was built on.

Bonus: `ts-packages/preview-runtime/src/wasmRenderer.ts` throw site
now distinguishes empty-qmd from real-error and surfaces the warning
count instead of literal "undefined".

…_file_id (Plan 7c Phase 8)

Closes Plan 7c Phase 8.

`coarsen` derived `target_file_id` from
`original_ast.blocks.first().and_then(|b| b.source_info().root_file_id())`,
falling back to `FileId(0)` when the first block was a synthesized
container (title-block, sectionize wrapper, footnotes / appendix
container) — all of which carry `SourceInfo::Generated` with no
`Invocation` anchor, so `root_file_id()` returns `None`. On
single-file fixtures the qmd happens to live at `FileId(0)` and the
fallback was coincidentally correct, masking the bug.

Replace the call with a `derive_target_file_id` helper that walks
`blocks` depth-first, descending through `block_block_children`
(Div / BlockQuote / Figure / NoteDefinitionFencedBlock), returning
the first `Some(root_file_id())` it sees. Descent matters for the
sole-top-level-sectionize-wrapper shape too — without it,
`preimage_in(FileId(0))` would return `None` for every real block
inside the wrapper, all editability checks would fail, and the
RecurseIntoContainer path would soft-drop the user's edit with a
Q-3-43 even when the qmd is genuinely a single file.
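The descent can be sketched as a depth-first search that returns the first block carrying a root file id. This sketch assumes a hypothetical `Block` reduced to an optional `FileId` plus its block children. In pampa, the id comes from `source_info().root_file_id()` and the children come from `block_block_children`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileId(pub usize);

/// Hypothetical minimal block: `root_file_id` is `None` for synthesized
/// Generated blocks with no Invocation anchor.
pub struct Block {
    pub root_file_id: Option<FileId>,
    pub children: Vec<Block>, // block_block_children; empty for leaves
}

fn first_root_file_id(blocks: &[Block]) -> Option<FileId> {
    for block in blocks {
        if let Some(id) = block.root_file_id {
            return Some(id);
        }
        // Descend through wrappers (title-block, sectionize, footnotes, ...).
        if let Some(id) = first_root_file_id(&block.children) {
            return Some(id);
        }
    }
    None
}

pub fn derive_target_file_id(blocks: &[Block]) -> FileId {
    // FileId(0) only when no block anywhere is anchored to a file.
    first_root_file_id(blocks).unwrap_or(FileId(0))
}
```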

Tests:
- `target_file_id_skips_synthesized_first_block` — synthesized
  title-block at `blocks[0]`, real Para at `blocks[1]` carrying
  `FileId(7)`. Pre-fix: identity reconcile with an inline edit on
  the real Para fires Q-3-43 (Para is gated non-editable because
  `preimage_in(FileId(0))` on a `FileId(7)`-Original returns
  `None`). Post-fix: no warning.
- `target_file_id_defaults_to_zero_for_empty_document` — pins the
  `FileId(0)` fallback for the genuinely-empty AST.

…[0] is a synthesized wrapper

`emit_metadata_prefix` read `original_ast.blocks[0].source_info().start_offset()`
to locate the boundary between the YAML frontmatter region and the
first user block. For the post-q2-preview-pipeline AST whose
`blocks[0]` is a synthesized sectionize Div (Generated, no
Invocation anchor), that offset is 0, so the function concluded
"no metadata region" and silently deleted the entire frontmatter
from the output. The fix in bdcfdc5 (Transparent recursion into
the wrapper for `RecurseIntoContainer`) unmasked this second-order
bug: edits now round-trip, but the frontmatter vanishes.

Add a `first_target_anchored_start_in` helper that walks `blocks`
depth-first, descending through `block_block_children` for blocks
that have no preimage of their own, returning the first start
offset that DOES have preimage in the target file. Sole-block
sectionize wrappers (sectionize, footnotes, appendix) yield their
children's first real start instead of `0`. Use it from
`emit_metadata_prefix`, paired with the `derive_target_file_id`
helper from b9f64b5 so a non-FileId(0) qmd is handled too.

Regression test: `sectionize_wrapper_preserves_frontmatter_after_inner_edit`
— wraps the user-reported repro shape (frontmatter +
sectionize-wrapped Header + Para with an EditComment append) and
asserts both the frontmatter and the spliced reaction land in the
output.

…tern + cross-link plans

The three sectionize-wrapper bugs of 2026-05-25 (`bdcfdc53` /
`b9f64b56` / `2bf92664`) were three rediscoveries of the same
fact: code that asks `original_ast.blocks[0]` for source-position
information assumes a flat AST, but the post-q2-preview-pipeline
AST wraps everything in a top-level synthesized container. Each
fix grew its own ad-hoc descent helper.

This commit names the pattern (*transparent wrapper*) and lifts
the descent into one reusable walker, so the next caller doesn't
have to rediscover it.

Code changes:

- New `first_in_user_tree<T>(blocks, extract)` — walks blocks
  depth-first, descending through `block_block_children` when
  `extract` returns `None`. This is the descent primitive both
  earlier helpers were re-implementing.
- New `is_transparent_wrapper(block, target)` predicate —
  structurally checkable per-block (Generated, no Invocation
  anchor, block-container shape, has source-bearing descendants).
  No registration / opt-in: a Lua filter that wraps user content
  in a Div with the right shape is automatically transparent.
- `derive_target_file_id` and `first_target_anchored_start_in`
  reduce to one-liners on top of `first_in_user_tree`. Net code
  decrease.
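The following minimal sketch shows the walker and the predicate. It assumes a simplified `Source` enum in place of `SourceInfo` and an `is_container` flag standing in for the block-container shape check. The names `first_in_user_tree` and `is_transparent_wrapper` come from the commit, but the signatures here are assumptions: a generic `F: Fn(&Block) -> Option<T>` and a `usize` file id.

```rust
/// Hypothetical, simplified provenance for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Original { file: usize, start: usize },
    Generated { has_invocation_anchor: bool },
}

pub struct Block {
    pub source: Source,
    pub is_container: bool, // Div / BlockQuote / Figure / NoteDefinitionFencedBlock
    pub children: Vec<Block>,
}

/// Depth-first: ask `extract` about each block, and descend into its
/// children only when `extract` answers `None`.
pub fn first_in_user_tree<T, F>(blocks: &[Block], extract: &F) -> Option<T>
where
    F: Fn(&Block) -> Option<T>,
{
    for block in blocks {
        if let Some(found) = extract(block) {
            return Some(found);
        }
        if let Some(found) = first_in_user_tree(&block.children, extract) {
            return Some(found);
        }
    }
    None
}

fn has_source_in(block: &Block, target: usize) -> bool {
    matches!(block.source, Source::Original { file, .. } if file == target)
        || block.children.iter().any(|c| has_source_in(c, target))
}

/// Purely structural; no registration or opt-in.
pub fn is_transparent_wrapper(block: &Block, target: usize) -> bool {
    matches!(block.source, Source::Generated { has_invocation_anchor: false })
        && block.is_container
        && block.children.iter().any(|c| has_source_in(c, target))
}

pub fn first_target_anchored_start_in(blocks: &[Block], target: usize) -> Option<usize> {
    first_in_user_tree(blocks, &|b: &Block| match b.source {
        Source::Original { file, start } if file == target => Some(start),
        _ => None,
    })
}
```

With the walker in place, `first_target_anchored_start_in` becomes a single call. The extractor answers only for blocks anchored in the target file, and the walker steps through any wrappers in between.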

Design doc `claude-notes/designs/transparent-wrappers.md` pins
the contract: the three structural conditions, the known
synthesizers, the "where to use which" table, the "where the
code lives + when to promote it" rule, an anti-pattern catalog,
and a history table for the three originating bugs.

Plan cross-references (annotation only, no scope change):

- Plan 9 (`title_source_info`) — invariant note: extractor runs
  pre-sugar today; if moved past sectionize or extended with a
  "first H1" fallback, must use `first_in_user_tree`.
- Plan 8 (IncludeExpansion) — note that the wrapper's `Original`
  source_info is what keeps it from being a transparent wrapper;
  debug-assert recommended if a future variant emits `Generated`.
- Plan 10 (Dispatch / Lua filters) — new sub-section "Lua filters
  that wrap user content" describes the implicit editing contract
  (emit the right shape, the visual editor sees through it; no
  registration needed).
- Plan 7a (filter idempotence) — Q-3-44 hint can detect
  walked-into-the-wrapper authoring errors via
  `is_transparent_wrapper(blocks[0])`.
- Plan 7b (test-o-rama) — gap noted: existing writer-lossless
  fixtures assume a flat AST; add a sectionize-wrapper-at-top
  variant.
- Project-replay engine — annotation that the flat walk is only
  safe because the splice runs pre-sugar.
- Plan 7c — reference link to the new design doc.
- `provenance-contract.md` §8 — sibling cross-link to the new
  doc; producer-side catalog of wrapper kinds is here, consumer-
  side descent rule is there.

Tests: all 1570 pampa tests still pass; the Playwright e2e (the
deterministic browser repro for the first bug) still passes
against the rebuilt WASM.

…when the rewrite produces byte-identical output

The Plan 7 soft-drop diagnostic surface (Q-3-42 / Q-3-43 warnings
from the incremental writer) relied on one delivery path:
`handleSetAst` stored warnings in `pendingWriteWarningsRef`, then
the next render's `doRenderWithStateManagement` drained the ref
into its merged diagnostics push.

That path silently dropped warnings in the common soft-drop case.
When the writer faithfully preserves the original bytes (the
correct behaviour for an edit it had to reject — e.g. typing
inside a `{{< lipsum 3 >}}` resolution), `incrementalWriteQmd`
returns warnings AND byte-identical output. `handleContentRewrite`
in `useAutomergeSync.ts` then computes `diffToMonacoEdits(old, new)`,
gets an empty edit list, and skips `executeEdits`. Monaco's
`onChange` never fires; automerge doesn't update; no re-render
happens; `pendingWriteWarningsRef.current` stays full forever.
The user sees no signal that their edit was declined.

Add an *immediate* push path alongside the ride-along:

1. `handleSetAst`, on warnings, now calls `onDiagnosticsChange`
   directly — merged with the most recent render-side diagnostics
   (tracked in a new `lastRenderDiagnosticsRef`, updated wherever
   `doRenderWithStateManagement` calls `onDiagnosticsChange`).
2. The existing `pendingWriteWarningsRef` ride-along stays as a
   safety net for the rare case where a re-render *does* fire
   after a write that produced warnings (typically when the
   writer chose Rewrite over soft-drop). It's a no-op for the
   byte-identical path.

After this fix, clicking +react on a paragraph inside
`{{< lipsum 3 >}}` produces a visible Q-3-43 in the diagnostic
panel: "Generated content edit dropped — This content has no
editable source position in this file; edit its upstream
definition (an include, a metadata key, or other source) instead."

Verified by manual repro; the existing e2e (the deterministic
browser test for the earlier empty-qmd bug) is unaffected.

…ed + close atomic-Generated UseAfter soft-drop gap

The architectural change: every CoarsenedEntry variant now carries
everything it needs to produce its bytes. Rewrite previously held
`new_idx: usize` — an index into new_ast.blocks that was only
correct at the top level. When coarsen_blocks ran inside the
Transparent recursion added in bdcfdc5 for the changed-wrapper
case, the index pointed at a child-relative position while
emit_entries looked it up against new_ast.blocks top-level. The
result was an index-out-of-bounds panic at incremental.rs:890
on +react edits inside shortcode-resolved content when the
framework's atomic-aware NOOP gate was bypassed for UX testing
("the len is 1 but the index is N").

Rewrite now carries `block_text: String`, pre-computed at coarsen
time via the same write_block_to_string call emit_entries used to
make. The call is referentially transparent (verified: no global
state in writers/qmd.rs; fresh QmdWriterContext per invocation;
no I/O, no clock), so this is a shape change, not a semantics
change — every existing test stays byte-identical. The shape
matches InlineSplice, which has carried block_text since
ab10f37. All four producer sites updated; coarsen_keep_before_block
became fallible (`Result<CoarsenedEntry, Vec<DiagnosticMessage>>`)
since it now calls the writer; both call sites use `?`.
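The shape change can be sketched as follows. `NewBlock`, `Entry`, `coarsen_rewrite`, and the trivial stand-in for `write_block_to_string` are hypothetical. The sketch shows two things: the index is resolved against whichever slice coarsen is walking, and emit never touches an AST.

```rust
/// Hypothetical new-AST block; `text` stands in for what the qmd writer
/// would produce for it.
pub struct NewBlock {
    pub text: String,
}

/// Stand-in for the real `write_block_to_string` (fallible, pure).
fn write_block_to_string(block: &NewBlock) -> Result<String, Vec<String>> {
    Ok(block.text.clone())
}

pub enum Entry {
    Verbatim { text: String },
    /// Before: `Rewrite { new_idx: usize }`, looked up at emit time.
    /// After: the rendered bytes travel with the entry.
    Rewrite { block_text: String },
}

/// Coarsen time: `new_blocks` is the slice being coarsened (top level or a
/// wrapper's children), so `idx` is always read against the right slice.
pub fn coarsen_rewrite(new_blocks: &[NewBlock], idx: usize) -> Result<Entry, Vec<String>> {
    Ok(Entry::Rewrite { block_text: write_block_to_string(&new_blocks[idx])? })
}

/// Emit time: no AST lookups at all.
pub fn emit(entries: &[Entry]) -> String {
    entries
        .iter()
        .map(|e| match e {
            Entry::Verbatim { text } | Entry::Rewrite { block_text: text } => text.as_str(),
        })
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

Because the entry carries its own bytes, a Rewrite produced while coarsening a wrapper's children emits correctly even when the top-level block list is shorter than the child index.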

A second bug, masked by the panic, surfaced during Phase 3
verification when the user tested the lipsum +react flow with the
new fix in place. The BlockAlignment::UseAfter arm filtered
atomic-CustomNode and no-preimage-Generated but had no branch for
atomic-Generated-WITH-preimage. When the reconciler split the
inline edit on an atomic-shortcode paragraph into KeepBefore
(Header) + UseAfter (new lipsum) — implicit deletion of the
original — the new block's source_info still carried the token's
Invocation anchor, but UseAfter fell through to let-user-win
Rewrite. With the architectural fix that no longer panics, it
instead silently wrote the resolved bytes (the resolved Lorem
ipsum + the user's reactji) back into the source qmd, poisoning
the user's source. The new UseAfter branch detects atomic-Generated
with preimage and emits Verbatim of the token range + Q-3-43,
mirroring the soft-drop cascade already in RecurseIntoContainer.
The general pattern: when an entry's *new* block looks like an
attempt to edit content the user can't actually edit, refuse the
edit at the writer regardless of what the reconciler's alignment
said.
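After this change, the UseAfter arm can be sketched as a match over the new block's kind and its anchor's preimage. `Kind`, `Outcome`, and `use_after` are hypothetical names, and the preimage is modeled as a `(start, end)` byte range in the target file.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    AtomicCustomNode,
    Generated { atomic: bool },
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Rewrite,
    Omit,
    Verbatim { start: usize, end: usize },
}

/// `preimage` is the Invocation anchor's byte range in the target file, if
/// any. Returns the outcome and whether a Q-3-43 warning is pushed.
pub fn use_after(kind: Kind, preimage: Option<(usize, usize)>) -> (Outcome, bool) {
    match (kind, preimage) {
        // let-user-win: the qmd writer's CustomNode arm reads `plain_data`.
        (Kind::AtomicCustomNode, _) => (Outcome::Rewrite, false),
        // No source position to anchor a Rewrite at.
        (Kind::Generated { .. }, None) => (Outcome::Omit, true),
        // New arm: atomic Generated that still points at its token. Writing
        // the resolved bytes would poison the source, so keep the token.
        (Kind::Generated { atomic: true }, Some((start, end))) => {
            (Outcome::Verbatim { start, end }, true)
        }
        // Everything else keeps the let-user-win Rewrite.
        _ => (Outcome::Rewrite, false),
    }
}
```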

Tests pin both shapes:
- sectionize_wrapper_with_shortcode_child_edit_does_not_panic
  — the architectural Rewrite shape, asserts no panic.
- sectionize_wrapper_shortcode_child_edit_soft_drops
  — the UseAfter soft-drop, asserts on output bytes
  (token preserved, reactji NOT emitted) and Q-3-43 fired.

Verification: cargo nextest -p pampa (3902/3902); cargo xtask
verify --skip-hub-build --skip-hub-tests (9656/9656 Rust);
hub-client npm run build:wasm + VITE_E2E=1 build; Playwright
q2-preview-render-components-write (1 passed); ts-packages/
preview-renderer integration tests (9 files, 165 tests). User
confirmed lipsum-paragraph +react flow in browser: no panic,
Q-3-43 surfaced, source qmd preserved.

Note: the diagnostic surfacing itself depends on a companion fix
to quarto-error-reporting (separate commit), which makes
render_ariadne_source_context gracefully degrade when file reads
aren't supported (WASM).

See claude-notes/plans/2026-05-25-coarsened-entry-self-contained.md
for the plan; claude-notes/designs/incremental-writer-internals.md
for the contract this work pins ("every variant carries enough
information to produce its emit bytes without further context").

…t when file read fails (WASM)

render_ariadne_source_context panicked with "Failed to read file
'…': operation not supported on this platform" whenever it tried
to fetch source bytes for a file-backed location in WASM. The
existing code unwrapped std::fs::read_to_string with a panic
message, which works on native (the read genuinely succeeds) but
crashes in WASM (no real filesystem; the call returns an Err
unconditionally).

Until now the panic was unreachable in practice — soft-drop
diagnostics carrying file-backed Generated locations either
weren't reaching the renderer (an upstream panic short-circuited
the path) or weren't being surfaced in WASM contexts. The Q-3-43
soft-drop emitted from the UseAfter arm of the incremental
writer (companion commit on this branch) now reliably surfaces,
so the disk-read path actually runs inside q2-preview's iframe.

Change: when std::fs::read_to_string fails, return None instead
of panicking. The diagnostic's code, message, and hints still
surface — only the Ariadne visual snippet is dropped.
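The following minimal sketch shows the degrade-gracefully pattern. It assumes a hypothetical `render_diagnostic` that prints a code and a title, plus the file's first line as the snippet when the file is readable. The real change lives in `render_ariadne_source_context`; only the idea of turning a failed `read_to_string` into `None` is taken from it.

```rust
use std::fs;

/// Before: `fs::read_to_string(path)` unwrapped with a panic message,
/// which crashes in WASM. After: a failed read yields `None`.
fn read_source_for_snippet(path: &str) -> Option<String> {
    fs::read_to_string(path).ok()
}

/// Hypothetical renderer: the code and title always survive; only the
/// source snippet (here, just the first line) depends on the read.
pub fn render_diagnostic(path: &str, code: &str, title: &str) -> String {
    match read_source_for_snippet(path) {
        Some(src) => format!("{code}: {title}\n{}", src.lines().next().unwrap_or("")),
        None => format!("{code}: {title}"),
    }
}
```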

Verification: cargo nextest -p quarto-error-reporting (70/70).
User confirmed Q-3-43 surfaces in the browser without panic
after the WASM rebuild.