pampa: migrate FileId scheme from sequential to hash-based (bd-ky14a) by cscheid · Pull Request #235 · quarto-dev/q2

cscheid · 2026-05-22T17:17:08Z

Summary

Pampa's ASTContext now derives FileIds via quarto_yaml::file_id_for_filename(name) (hashing the filename string) instead of using sequential FileId(0).
Eliminates the cross-parser FileId asymmetry the bd-1pwy8 work had to wrap around with (FileId, &Path) candidate pairs.
Drops the quarto_ast_reconcile::remap_file_ids(...) workarounds in include_expansion.rs and engine_execution.rs, and the parallel-SourceContext lockstep in include-expansion.

Why review carefully

There's a parallel source-location workstream in flight; this PR should be reviewed against it before merging. Specifically check:

Does the parallel work also touch ASTContext / SourceContext shape (struct fields)?
Does it introduce new FileId(0) literals that would need the same migration?
Does anything in the parallel work depend on FileId(0) as a stable serialization key (e.g. JSON wire format)?

The JSON wire format does change in a user-visible way: FileIds in sourceInfoPool.d go from small integers (0, 1, ...) to 64-bit hashes. The 62 pampa/snapshots/json/*.snap updates in this PR all reflect this. The hub-client SPA + q2-preview consumers were checked and don't depend on FileId values being small.

Contract tests (TDD anchor)

Five tests that fail on `main` with the expected `FileId(0)` vs `FileId(hash)` mismatch and turn green after the migration:

`pampa::pandoc::ast_context::tests::bd_ky14a_with_filename_uses_quarto_yaml_file_id`
`pampa::pandoc::ast_context::tests::bd_ky14a_source_context_indexed_by_hash_file_id`
`quarto_core::bd_ky14a_file_id_contract::bd_ky14a_pampa_qmd_read_uses_quarto_yaml_file_id`
`quarto_core::bd_ky14a_file_id_contract::bd_ky14a_sub_document_file_id_is_hash_based`
`quarto_core::bd_ky14a_file_id_contract::bd_ky14a_fresh_source_context_renders_pampa_source_info`

Snapshot changes

62 `pampa/snapshots/json/*.snap` files updated. Diffs are exclusively `"d": 0` → `"d": ` substitutions in `sourceInfoPool` entries with `"t":0` (the Original-SourceInfo tag). Pool index values (other small integers in non-`d` positions) are unchanged. No structural/semantic changes.

Test plan

`cargo nextest run --workspace` — 9418 passed, 0 failed (5 net new contract tests)
`cargo xtask verify` — full pass (Rust + WASM + hub-client + tests)
Reviewer-side check that the JSON FileId format change is acceptable for parallel source-location work
Reviewer-side coordination on `ASTContext` / `SourceContext` struct shape if the parallel work also touches them

Plan document

`claude-notes/plans/2026-05-22-pampa-hash-fileids.md` — full motivation, design rationale, and work-items checklist (now all checked).

🤖 Generated with Claude Code

Pampa's ASTContext now derives FileIds via `quarto_yaml::file_id_for_filename(name)` (hashing the filename string) instead of using sequential `FileId(0)` for the document's primary file. This makes pampa's `SourceInfo`s globally interchangeable with `SourceInfo`s produced by quarto_yaml (and any other parser using the same recipe), eliminating the out-of-band-binding requirement that bd-1pwy8 had to work around with `(FileId, &Path)` candidate pairs. ## What changed ### Core: ASTContext gains a primary_file_id field `crates/pampa/src/pandoc/ast_context.rs` - New private field `primary_file_id: FileId`, cached at construction so `current_file_id()` doesn't re-hash on every call. - `new()` and `anonymous()` now delegate to `with_filename(name)` with the `<unknown>` / `<anonymous>` placeholder names. - `with_filename` computes the FileId via `quarto_yaml::file_id_for_filename` and registers the SourceContext entry via `add_file_with_id`. - New `from_parts(filenames, source_context, primary_file_id)` constructor for deserialization paths (the JSON reader uses this). ### Production sites updated to use hash-based FileIds - `crates/pampa/src/readers/qmd.rs` — registers the input under `current_file_id()` (= hash) instead of the sequential 0. - `crates/pampa/src/readers/json.rs` — when deserializing ASTContext, computes each file's hash FileId and uses the new `SourceContext::add_file_with_id_and_info` helper to preserve the line-break index. - `crates/quarto-core/src/stage/stages/parse_document.rs` — the SourceContext built in the parse stage now uses `add_file_with_id(file_id_for_filename(source_name), …)` so it matches what pampa puts on the AST's SourceInfos. - `crates/quarto-core/src/pipeline.rs` (q2-preview path) — same treatment. - `crates/perf-harness/src/bin/parse_qmd_to_ast.rs` — replaces struct-literal ASTContext construction with `from_parts`. - `crates/wasm-quarto-hub-client/src/lib.rs` — same. - `crates/quarto-lsp-core/src/document.rs` (`Document::create_source_context*`) — registers the document under the hash FileId so analyze_document's SourceInfos resolve. - `crates/quarto-core/src/transforms/attribution_render.rs` — the v1 "skip non-primary file" gate previously compared `file_id != 0`; now it compares against the AST-derived primary file_id. ### Simplifications enabled by the new scheme - `crates/quarto-core/src/stage/stages/include_expansion.rs` — dropped the `quarto_ast_reconcile::remap_file_ids(FileId(0) → new_sequential_id)` workaround and the parallel-SourceContext lockstep with the `debug_assert_eq!` desync check. Included sub-documents now land at `FileId(hash(filename))` natively; registration in the parent's contexts is idempotent on FileId. - `crates/quarto-core/src/stage/stages/engine_execution.rs` — same: dropped the `remap_file_ids(id.0 + 1)` shift; the intermediate `.rmarkdown` already has its hash FileId. ### Source-map helper - `crates/quarto-source-map/src/context.rs` — new `add_file_with_id_and_info` method, combining the existing `add_file_with_id` (specific FileId, can panic if duplicate) with `add_file_with_info` (pre-computed FileInformation). Needed by the JSON deserializer's roundtrip path. ## Contract tests (the TDD anchor) Five new tests across the workspace. Each one fails on `main` with the expected `FileId(0)` vs `FileId(hash)` mismatch and turns green after the migration: 1. `pampa::pandoc::ast_context::tests::bd_ky14a_with_filename_uses_quarto_yaml_file_id` 2. `pampa::pandoc::ast_context::tests::bd_ky14a_source_context_indexed_by_hash_file_id` 3. `quarto_core::bd_ky14a_file_id_contract::bd_ky14a_pampa_qmd_read_uses_quarto_yaml_file_id` 4. `quarto_core::bd_ky14a_file_id_contract::bd_ky14a_sub_document_file_id_is_hash_based` 5. `quarto_core::bd_ky14a_file_id_contract::bd_ky14a_fresh_source_context_renders_pampa_source_info` ## Pre-existing tests touched - `pampa::tests::json_location_test::test_json_location_enabled` — assertion `location["f"] == 0` updated to `file_id_for_filename`. - `pampa::tests::qmd_writer_source_info::round_trip_code_block_offset_accuracy` — test fixture's `SourceContext` now uses `add_file_with_id` with the hash FileId, matching the SourceInfos that come out of the parser. - `quarto-core::stage::stages::engine_execution::tests::test_engine_execution_remaps_new_blocks_to_intermediate` — expected FileIds updated to the hashes of the two filenames involved. - `quarto-core::stage::stages::include_expansion::tests::included_blocks_have_correct_file_id` — same. - `pampa::pandoc::ast_context::tests::test_primary_file_id` / `test_current_file_id` — assert against the hash, not literal `FileId(0)`. ## Snapshot test changes **62 snapshot files updated** in `crates/pampa/snapshots/json/`. The diffs are exclusively `"d": 0` → `"d": <hash>` substitutions in the `sourceInfoPool` entries for `Original` SourceInfos (entries with `"t":0`). Pool *index* values (small integers in non-`d` positions) are unchanged. No semantic changes — just the FileIds that pampa now emits. Affected snapshot files (sample, see git diff for the full set): - `001.snap`, `002.snap`, `003.snap` - `anchor-shorthand-{01..06}-*.snap` (6 files) - `header-autoid-*.snap` (3 files) - `horizontal-rules*.snap` (2 files) - `html-comment-{01..48}-*.snap` (48 files) - `math-with-attr.snap`, `table-alignment.snap`, `table-caption-attr.snap`, `yaml-tags.snap` ## Verification - `cargo nextest run --workspace` → 9418 passed, 0 failed. (Net new: 5 contract tests; otherwise the count matches the bd-1pwy8 baseline.) - `cargo xtask verify` (full pass — Rust + WASM + hub-client + tests) → green. ## Workflow Per the plan, this PR should be **reviewed by the parallel source-location workstream** before merging. Push as a feature branch; do **not** merge to main locally. Plan: claude-notes/plans/2026-05-22-pampa-hash-fileids.md Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pampa: migrate FileId scheme from sequential to hash-based (bd-ky14a)#235

pampa: migrate FileId scheme from sequential to hash-based (bd-ky14a)#235
cscheid wants to merge 1 commit into
mainfrom
feature/bd-ky14a-pampa-hash-fileids

cscheid commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cscheid commented May 22, 2026

Summary

Why review carefully

Contract tests (TDD anchor)

Snapshot changes

Test plan

Plan document

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant