pampa: migrate FileId scheme from sequential to hash-based (bd-ky14a)#235
Open
cscheid wants to merge 1 commit into
Open
pampa: migrate FileId scheme from sequential to hash-based (bd-ky14a)#235cscheid wants to merge 1 commit into
cscheid wants to merge 1 commit into
Conversation
Pampa's ASTContext now derives FileIds via
`quarto_yaml::file_id_for_filename(name)` (hashing the filename
string) instead of using sequential `FileId(0)` for the document's
primary file. This makes pampa's `SourceInfo`s globally
interchangeable with `SourceInfo`s produced by quarto_yaml (and any
other parser using the same recipe), eliminating the
out-of-band-binding requirement that bd-1pwy8 had to work around
with `(FileId, &Path)` candidate pairs.
## What changed
### Core: ASTContext gains a primary_file_id field
`crates/pampa/src/pandoc/ast_context.rs`
- New private field `primary_file_id: FileId`, cached at
construction so `current_file_id()` doesn't re-hash on every
call.
- `new()` and `anonymous()` now delegate to `with_filename(name)`
with the `<unknown>` / `<anonymous>` placeholder names.
- `with_filename` computes the FileId via
`quarto_yaml::file_id_for_filename` and registers the
SourceContext entry via `add_file_with_id`.
- New `from_parts(filenames, source_context, primary_file_id)`
constructor for deserialization paths (the JSON reader uses
this).
### Production sites updated to use hash-based FileIds
- `crates/pampa/src/readers/qmd.rs` — registers the input under
`current_file_id()` (= hash) instead of the sequential 0.
- `crates/pampa/src/readers/json.rs` — when deserializing
ASTContext, computes each file's hash FileId and uses the new
`SourceContext::add_file_with_id_and_info` helper to preserve
the line-break index.
- `crates/quarto-core/src/stage/stages/parse_document.rs` — the
SourceContext built in the parse stage now uses
`add_file_with_id(file_id_for_filename(source_name), …)` so
it matches what pampa puts on the AST's SourceInfos.
- `crates/quarto-core/src/pipeline.rs` (q2-preview path) — same
treatment.
- `crates/perf-harness/src/bin/parse_qmd_to_ast.rs` — replaces
struct-literal ASTContext construction with `from_parts`.
- `crates/wasm-quarto-hub-client/src/lib.rs` — same.
- `crates/quarto-lsp-core/src/document.rs`
(`Document::create_source_context*`) — registers the document
under the hash FileId so analyze_document's SourceInfos resolve.
- `crates/quarto-core/src/transforms/attribution_render.rs` — the
v1 "skip non-primary file" gate previously compared `file_id != 0`;
now it compares against the AST-derived primary file_id.
### Simplifications enabled by the new scheme
- `crates/quarto-core/src/stage/stages/include_expansion.rs` —
dropped the `quarto_ast_reconcile::remap_file_ids(FileId(0) →
new_sequential_id)` workaround and the parallel-SourceContext
lockstep with the `debug_assert_eq!` desync check. Included
sub-documents now land at `FileId(hash(filename))` natively;
registration in the parent's contexts is idempotent on FileId.
- `crates/quarto-core/src/stage/stages/engine_execution.rs` —
same: dropped the `remap_file_ids(id.0 + 1)` shift; the
intermediate `.rmarkdown` already has its hash FileId.
### Source-map helper
- `crates/quarto-source-map/src/context.rs` — new
`add_file_with_id_and_info` method, combining the existing
`add_file_with_id` (specific FileId, can panic if duplicate)
with `add_file_with_info` (pre-computed FileInformation).
Needed by the JSON deserializer's roundtrip path.
## Contract tests (the TDD anchor)
Five new tests across the workspace. Each one fails on `main` with
the expected `FileId(0)` vs `FileId(hash)` mismatch and turns
green after the migration:
1. `pampa::pandoc::ast_context::tests::bd_ky14a_with_filename_uses_quarto_yaml_file_id`
2. `pampa::pandoc::ast_context::tests::bd_ky14a_source_context_indexed_by_hash_file_id`
3. `quarto_core::bd_ky14a_file_id_contract::bd_ky14a_pampa_qmd_read_uses_quarto_yaml_file_id`
4. `quarto_core::bd_ky14a_file_id_contract::bd_ky14a_sub_document_file_id_is_hash_based`
5. `quarto_core::bd_ky14a_file_id_contract::bd_ky14a_fresh_source_context_renders_pampa_source_info`
## Pre-existing tests touched
- `pampa::tests::json_location_test::test_json_location_enabled`
— assertion `location["f"] == 0` updated to `file_id_for_filename`.
- `pampa::tests::qmd_writer_source_info::round_trip_code_block_offset_accuracy`
— test fixture's `SourceContext` now uses `add_file_with_id`
with the hash FileId, matching the SourceInfos that come out
of the parser.
- `quarto-core::stage::stages::engine_execution::tests::test_engine_execution_remaps_new_blocks_to_intermediate`
— expected FileIds updated to the hashes of the two filenames
involved.
- `quarto-core::stage::stages::include_expansion::tests::included_blocks_have_correct_file_id`
— same.
- `pampa::pandoc::ast_context::tests::test_primary_file_id` /
`test_current_file_id` — assert against the hash, not literal
`FileId(0)`.
## Snapshot test changes
**62 snapshot files updated** in `crates/pampa/snapshots/json/`.
The diffs are exclusively `"d": 0` → `"d": <hash>` substitutions
in the `sourceInfoPool` entries for `Original` SourceInfos (entries
with `"t":0`). Pool *index* values (small integers in non-`d`
positions) are unchanged. No semantic changes — just the FileIds
that pampa now emits.
Affected snapshot files (sample, see git diff for the full set):
- `001.snap`, `002.snap`, `003.snap`
- `anchor-shorthand-{01..06}-*.snap` (6 files)
- `header-autoid-*.snap` (3 files)
- `horizontal-rules*.snap` (2 files)
- `html-comment-{01..48}-*.snap` (48 files)
- `math-with-attr.snap`, `table-alignment.snap`,
`table-caption-attr.snap`, `yaml-tags.snap`
## Verification
- `cargo nextest run --workspace` → 9418 passed, 0 failed.
(Net new: 5 contract tests; otherwise the count matches the
bd-1pwy8 baseline.)
- `cargo xtask verify` (full pass — Rust + WASM + hub-client +
tests) → green.
## Workflow
Per the plan, this PR should be **reviewed by the parallel
source-location workstream** before merging. Push as a feature
branch; do **not** merge to main locally.
Plan: claude-notes/plans/2026-05-22-pampa-hash-fileids.md
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ASTContextnow derives FileIds viaquarto_yaml::file_id_for_filename(name)(hashing the filename string) instead of using sequentialFileId(0).(FileId, &Path)candidate pairs.quarto_ast_reconcile::remap_file_ids(...)workarounds ininclude_expansion.rsandengine_execution.rs, and the parallel-SourceContextlockstep in include-expansion.Why review carefully
There's a parallel source-location workstream in flight; this PR should be reviewed against it before merging. Specifically check:
ASTContext/SourceContextshape (struct fields)?FileId(0)literals that would need the same migration?FileId(0)as a stable serialization key (e.g. JSON wire format)?The JSON wire format does change in a user-visible way: FileIds in
sourceInfoPool.dgo from small integers (0, 1, ...) to 64-bit hashes. The 62pampa/snapshots/json/*.snapupdates in this PR all reflect this. The hub-client SPA + q2-preview consumers were checked and don't depend on FileId values being small.Contract tests (TDD anchor)
Five tests that fail on `main` with the expected `FileId(0)` vs `FileId(hash)` mismatch and turn green after the migration:
Snapshot changes
62 `pampa/snapshots/json/*.snap` files updated. Diffs are exclusively `"d": 0` → `"d": ` substitutions in `sourceInfoPool` entries with `"t":0` (the Original-SourceInfo tag). Pool index values (other small integers in non-`d` positions) are unchanged. No structural/semantic changes.
Test plan
Plan document
`claude-notes/plans/2026-05-22-pampa-hash-fileids.md` — full motivation, design rationale, and work-items checklist (now all checked).
🤖 Generated with Claude Code