Skip to content

[phase-31 2/4] Compaction metadata types#6258

Open
g-talbot wants to merge 1 commit intogtt/phase-31-sort-schemafrom
gtt/phase-31-compaction-metadata
Open

[phase-31 2/4] Compaction metadata types#6258
g-talbot wants to merge 1 commit intogtt/phase-31-sort-schemafrom
gtt/phase-31-compaction-metadata

Conversation

@g-talbot
Copy link
Copy Markdown

Summary

Extend MetricsSplitMetadata with 6 compaction fields and update PostgreSQL model types (Phase 31 Metadata Foundation, PR 2 of 4).

Stacks on gtt/phase-31-sort-schema (PR #6242).

What's included

split/metadata.rs:

  • 6 new fields on MetricsSplitMetadata: window_start, window_duration_secs, sort_fields, num_merge_ops, row_keys_proto, zonemap_regexes
  • parquet_files: Vec<String> field (replaces single-file assumption)
  • Builder methods for all new fields
  • TW-1 and TW-2 debug_assert! invariants from ADR-003
  • Backward-compat serde defaults for pre-Phase-31 JSON

split/postgres.rs:

  • Compaction columns added to InsertableMetricsSplit and PgMetricsSplit
  • from_metadata() populates new SQL columns
  • SS-5 debug assertions for JSON/SQL column consistency

schema/fields.rs:

  • ParquetField::from_name() for name-based field lookup (used by sort field resolver)

Verification

  • cargo build -p quickwit-parquet-engine
  • cargo test -p quickwit-parquet-engine -- split::metadata split::postgres ✅ (11 tests)
  • cargo clippy -p quickwit-parquet-engine --all-features --tests

Test plan

  • Pre-Phase-31 JSON backward compatibility test
  • Round-trip test with all compaction fields
  • skip_serializing_if test for optional fields
  • Postgres model round-trip test with compaction columns
  • Clippy clean

(Replaces #6243 which was accidentally auto-merged during a rebase)

🤖 Generated with Claude Code

@g-talbot g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from 598de1a to c95095b Compare March 31, 2026 21:50
@g-talbot g-talbot force-pushed the gtt/phase-31-sort-schema branch 2 times, most recently from 3696bb4 to a90ca2f Compare April 1, 2026 10:48
@g-talbot g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from c95095b to 5b1c080 Compare April 1, 2026 10:48
@g-talbot g-talbot force-pushed the gtt/phase-31-sort-schema branch from a90ca2f to d60973e Compare April 1, 2026 11:02
@g-talbot g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from 5b1c080 to 6ebf40b Compare April 1, 2026 11:02
@g-talbot g-talbot force-pushed the gtt/phase-31-sort-schema branch from d60973e to cf9484d Compare April 1, 2026 11:30
@g-talbot g-talbot force-pushed the gtt/phase-31-compaction-metadata branch 2 times, most recently from 295f59c to f3c03dc Compare April 1, 2026 12:25
@g-talbot g-talbot force-pushed the gtt/phase-31-sort-schema branch from cf9484d to bb42b42 Compare April 1, 2026 12:25
@g-talbot g-talbot force-pushed the gtt/phase-31-compaction-metadata branch 2 times, most recently from acb5d28 to a8dc7a2 Compare April 1, 2026 16:25
@g-talbot g-talbot force-pushed the gtt/phase-31-sort-schema branch from bb42b42 to 5b2304c Compare April 1, 2026 16:59
@g-talbot g-talbot force-pushed the gtt/phase-31-compaction-metadata branch 2 times, most recently from 4ce16a3 to b89e965 Compare April 1, 2026 19:24
@g-talbot g-talbot force-pushed the gtt/phase-31-sort-schema branch from 39f875f to 4d42fd9 Compare April 1, 2026 20:17
@g-talbot g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from b89e965 to b9566a6 Compare April 1, 2026 20:18
… model, field lookup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@g-talbot g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from b9566a6 to b6eb595 Compare April 1, 2026 20:50
@mattmkim
Copy link
Copy Markdown
Contributor

mattmkim commented Apr 6, 2026

@codex review

Copy link
Copy Markdown
Contributor

@mattmkim mattmkim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's see if codex finds anything

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b6eb5958d7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

low_cardinality_tags: HashMap<String, HashSet<String>>,
high_cardinality_tag_keys: HashSet<String>,
created_at: SystemTime,
parquet_file: String,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Add serde default for parquet_file in legacy metadata

MetricsSplitMetadataSerde now requires parquet_file, but metadata JSON produced before this commit did not contain that field, so deserialization will fail for existing rows during upgrades. PgMetricsSplit::to_metadata() parses split_metadata_json, and metastore.rs currently drops rows on parse failure via to_metadata().ok()?, which makes legacy splits disappear from list/query results instead of remaining readable.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mean, true if we were in production already, but i don't think this matters.

low_cardinality_tags: self.low_cardinality_tags,
high_cardinality_tag_keys: self.high_cardinality_tag_keys,
created_at: SystemTime::now(),
parquet_file: self.parquet_file,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Derive parquet_file when builder callers omit it

The builder stores parquet_file as a default-initialized String and build() writes it through unchanged, so callers that do not explicitly call .parquet_file(...) now emit metadata with an empty parquet path. This silently records invalid split metadata for current construction paths and loses the file location that was previously derivable from split_id.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants