Remove Empty Chunks in Cloudfetch concatenation#814

Open
jprakash-db wants to merge 2 commits into
mainfrom
jprakash-db/rm-mt-chunk

Contributor

jprakash-db commented May 27, 2026

What type of PR is this?

Description

ThriftResultSet.fetchmany_arrow and fetchall_arrow previously appended every chunk
returned by the underlying queue — including the 0-row placeholder that
CloudFetchQueue._create_empty_table() emits when self.table is None — into
partial_result_chunks, and then handed the list to concat_table_chunks.

The placeholder's schema can differ from that of the real downloaded chunks, because it
is built from schema_bytes that may be stale, or it is schemaless when schema_bytes is
None. When it does, pyarrow.concat_tables(..., promote_options="default") silently
introduces phantom columns filled with NULLs or, for type mismatches, raises an error.

Fix: hold any 0-row chunk aside in a local zero_row_table instead of appending it.
The concat list now only ever contains real chunks, which share a consistent schema.
If every chunk turned out to be 0-row (genuinely empty result set), fall back to
appending the held-aside placeholder so the method still returns a valid pyarrow.Table.
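
The hold-aside logic described above can be sketched roughly as follows. This is an illustration of the control flow only, not the code in this PR: FakeTable and collect_chunks are hypothetical names, FakeTable stands in for pyarrow.Table so the sketch runs without pyarrow, and keeping the first placeholder seen (rather than the last) is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class FakeTable:
    """Hypothetical stand-in for pyarrow.Table: column names and a row count."""
    columns: Tuple[str, ...]
    num_rows: int


def collect_chunks(chunks: Iterable[FakeTable]) -> List[FakeTable]:
    """Build the list to concatenate, keeping 0-row chunks out of it."""
    partial_result_chunks: List[FakeTable] = []
    zero_row_table: Optional[FakeTable] = None

    for chunk in chunks:
        if chunk.num_rows == 0:
            # Hold the placeholder aside instead of appending it
            # (assumption: the first one seen is the one kept).
            if zero_row_table is None:
                zero_row_table = chunk
            continue
        partial_result_chunks.append(chunk)

    # Genuinely empty result set: fall back to the held-aside placeholder
    # so the caller still has one table to return.
    if not partial_result_chunks and zero_row_table is not None:
        partial_result_chunks.append(zero_row_table)
    return partial_result_chunks


placeholder = FakeTable(("stale_col",), 0)
real = FakeTable(("col0",), 3)
print(collect_chunks([placeholder, real]))  # only the real chunk
print(collect_chunks([placeholder]))        # the placeholder, via the fallback
```

Because the placeholder only enters the list when nothing else does, its schema can never be merged with that of a real chunk.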

CloudFetchQueue itself is unchanged. The columnar fetch paths are unchanged
(they only run with ColumnQueue, whose empty slices always carry the right schema).
The SEA arrow methods are unchanged (single queue call per invocation, no concat).

How is this tested?

Added 4 regression tests in tests/unit/test_fetches.py:

  • test_fetchall_arrow_drops_mismatched_empty_placeholder — first queue returns a
    0-row placeholder with column stale_col; second returns real data with col0.
    Asserts the result has only col0. Verified to fail against the pre-fix code with
    ['stale_col', 'col0'] != ['col0'].
  • test_fetchmany_arrow_drops_mismatched_empty_placeholder — same for fetchmany_arrow.
  • test_fetchall_arrow_all_empty_returns_zero_row_table — every chunk is 0-row;
    asserts a pyarrow.Table with num_rows == 0 is returned (the held-aside fallback fires).
  • test_fetchmany_arrow_all_empty_returns_zero_row_table — same for fetchmany_arrow.

Full unit suite (pytest tests/unit -x): 765 passed, 4 skipped.

Integration / e2e to be run separately.

jprakash-db deployed to azure-prod May 27, 2026 18:14 with GitHub Actions
jprakash-db requested a review from gopalldb May 27, 2026 18:21