[metrics] Support DDSketch in the parquet pipeline #6257
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 727f085864
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
quickwit/quickwit-metastore/src/metastore/postgres/metastore.rs
I cannot review this PR due to lack of context. I know what DDSketches are, but I do not know what they are used for in the context of the metrics ingestion pipeline, why they are stored in different files, etc.
💡 Codex Review
Reviewed commit: 6e7d6a90ae
quickwit/quickwit-metastore/src/metastore/postgres/metastore.rs
let keys_inner = keys_builder.values();
for &k in &dp.keys {
    keys_inner.append_value(k);
Where do we store the length of keys/counts? I.e. how do we recover the number of elements per data_point from _inner arrays?
They are stored implicitly by Arrow: it keeps an array of offsets, so the length of row i is offsets[i+1] - offsets[i].
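To make the offsets encoding concrete, here is a minimal Rust sketch (no Arrow dependency; the real layout lives in arrow-rs `ListArray`/`ListBuilder`) of how a flat child buffer plus an offsets array recovers per-row lengths:

```rust
fn main() {
    // Three list rows: [1, 2], [3], [4, 5, 6], stored as one flat
    // values buffer plus an offsets array of length rows + 1.
    let values: Vec<i32> = vec![1, 2, 3, 4, 5, 6];
    let offsets: Vec<usize> = vec![0, 2, 3, 6];

    for i in 0..offsets.len() - 1 {
        // The length of row i is offsets[i + 1] - offsets[i].
        let row = &values[offsets[i]..offsets[i + 1]];
        println!("row {i}: len={} values={row:?}", row.len());
    }
}
```

The same scheme applies to both the `keys` and `counts` child arrays, so no explicit length column is needed.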
💡 Codex Review
Reviewed commit: 37f4298f66
quickwit/quickwit-metastore/migrations/postgresql/28_create-sketch-splits.up.sql
💡 Codex Review
Reviewed commit: 9a38c47b7c
- split_metadata: Option<&MetricsSplitMetadata>,
+ split_metadata: Option<&ParquetSplitMetadata>,
  ) -> Result<(RecordBatch, WriterProperties), ParquetWriteError> {
      validate_required_fields(&batch.schema())
Validate sketch batches with sketch schema before writing
prepare_write always calls validate_required_fields, which enforces the metrics schema (metric_type/value) even when the split metadata is Sketches. In the new sketch pipeline, sketch batches reach this writer with fields like count/sum/min/max/keys/counts, so writes fail with SchemaValidation and sketch splits cannot be produced or uploaded.
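One possible shape of a fix (a hypothetical sketch only; `SplitKind`, `required_fields`, and the field lists are illustrative names inferred from the review comment, not quickwit's actual types) is to dispatch validation on the split kind recorded in the metadata instead of always enforcing the metrics schema:

```rust
// Illustrative only: branch schema validation on the split kind so that
// sketch batches (count/sum/min/max/keys/counts) are not rejected by the
// metrics-schema check (metric_type/value).
#[derive(Clone, Copy, PartialEq)]
enum SplitKind {
    Metrics,
    Sketches,
}

fn required_fields(kind: SplitKind) -> &'static [&'static str] {
    match kind {
        SplitKind::Metrics => &["metric_type", "value"],
        SplitKind::Sketches => &["count", "sum", "min", "max", "keys", "counts"],
    }
}

fn validate_required_fields(schema_fields: &[&str], kind: SplitKind) -> Result<(), String> {
    for required in required_fields(kind) {
        if !schema_fields.contains(required) {
            return Err(format!("missing required field `{required}`"));
        }
    }
    Ok(())
}

fn main() {
    let sketch_fields = ["count", "sum", "min", "max", "keys", "counts"];
    // A sketch batch passes the sketch check but fails the metrics check.
    assert!(validate_required_fields(&sketch_fields, SplitKind::Sketches).is_ok());
    assert!(validate_required_fields(&sketch_fields, SplitKind::Metrics).is_err());
    println!("validation dispatch ok");
}
```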
  const DELETE_SPLITS_QUERY: &str = r#"
  WITH input_splits AS (
-     SELECT input_splits.split_id, metrics_splits.split_state
+     SELECT input_splits.split_id, {table_name}.split_state
Interpolate parquet table name in delete query
delete_parquet_splits_impl constructs DELETE_SPLITS_QUERY as a raw string containing {table_name} placeholders but executes it directly, so Postgres sees literal {table_name} tokens instead of metrics_splits/sketch_splits. This makes parquet split deletion fail at runtime for both delete RPCs on Postgres.
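A minimal fix sketch (hypothetical helper name and abbreviated SQL; quickwit's real query is longer) is to substitute the placeholder before executing, validating the table name against an allowlist since a table name cannot be passed as a bind parameter:

```rust
// Illustrative only: the {table_name} placeholder must be replaced before
// the query string reaches Postgres, otherwise Postgres sees the literal
// token. The allowlist keeps the interpolation injection-safe.
const DELETE_SPLITS_QUERY: &str = r#"
    WITH input_splits AS (
        SELECT input_splits.split_id, {table_name}.split_state
        FROM {table_name}
    )
    DELETE FROM {table_name} USING input_splits
"#;

fn build_delete_query(table_name: &str) -> Result<String, String> {
    match table_name {
        "metrics_splits" | "sketch_splits" => {
            Ok(DELETE_SPLITS_QUERY.replace("{table_name}", table_name))
        }
        other => Err(format!("unexpected parquet split table: {other}")),
    }
}

fn main() {
    let query = build_delete_query("sketch_splits").unwrap();
    assert!(query.contains("sketch_splits.split_state"));
    assert!(!query.contains("{table_name}"));
    println!("query ok");
}
```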
Description
This PR can be reviewed commit by commit.
This PR updates the parquet pipeline to process DDSketches. See https://datadoghq.atlassian.net/wiki/spaces/QKHS/pages/6291357728/DDSketch+in+Parquet for more information about the DDSketch spec.
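For reviewers unfamiliar with DDSketch: it is a quantile sketch that buckets values by a logarithmic key, which is why a data point serializes naturally as parallel `keys`/`counts` arrays. An illustrative Rust sketch of the standard DDSketch key mapping (not quickwit's implementation):

```rust
// Illustrative DDSketch key mapping: with relative accuracy `alpha`,
// gamma = (1 + alpha) / (1 - alpha), and a positive value x falls into
// bucket key = ceil(ln(x) / ln(gamma)). The sketch stores a sparse map
// of bucket keys to counts, which flattens to parallel keys/counts
// arrays when written to parquet.
fn ddsketch_key(x: f64, alpha: f64) -> i64 {
    let gamma = (1.0 + alpha) / (1.0 - alpha);
    (x.ln() / gamma.ln()).ceil() as i64
}

fn main() {
    let alpha = 0.01;
    // Values within one bucket differ by at most a factor of gamma,
    // which bounds the relative error of quantile estimates by alpha.
    assert_eq!(ddsketch_key(1.0, alpha), 0);
    assert!(ddsketch_key(2.0, alpha) < ddsketch_key(200.0, alpha));
    println!("key(100.0) = {}", ddsketch_key(100.0, alpha));
}
```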
How was this PR tested?