feat(data-branch): Branch Protect Snapshot to guard LCA history from GC #24313
Conversation
Thanks for the thorough review @aunjgr. All four blocking findings are addressed in the latest fixes:
1. Concurrent-sibling race — `SELECT ... FOR UPDATE` added on both the compile and frontend reclaim paths.
2. Quota — branch snapshots excluded from the per-account snapshot quota.
3. SHOW RECOVERY WINDOW leak — branch rows filtered out.
4. Snapshot-name-lookup bypass — centralised reject in `getSnapshotByName`.

Nit: added a WARN log. Test updates included.
CI regression on unrelated drops, from PR matrixorigin#24313 commit 05ffa75: the blocking-issue-#1 fix added `FOR UPDATE` to every `reclaimBranchProtectSnapshots` call, which runs synchronously after every plain DROP TABLE. That meant even drops of tables that have nothing to do with data branch took a pessimistic lock on the whole `mo_branch_metadata` table.

Under the docker-compose PESSIMISTIC CI job, the `restore account` flow drops/recreates many tables back-to-back. The serialised lock created enough contention that an in-flight view restore observed its base table as `no such table` (restore-view race vs. concurrent drop cleanup), causing `restore account sys{snapshot=sp02}` to fail in sys_restore_view_to_sys_account.sql (324 total, 4 failed).

Fix: short-circuit the reclaim path. Before taking the `FOR UPDATE` lock, run a cheap SELECT probe asking whether any of the dead tids actually participate in `mo_branch_metadata` (as a child, or as a parent referenced by a child). 99%+ of drops hit the fast path and skip the full reclaim scan entirely. The frontend path (`dataBranchDeleteTable`/`Database`) is not affected because those entry points are branch-specific by construction — every dead tid is guaranteed to be in `mo_branch_metadata`, so a probe would be wasted work.

Correctness unchanged: when the probe confirms branch involvement, the original `FOR UPDATE`-backed reclaim still runs and still serialises concurrent sibling drops, so blocking issue #1 remains fixed.

Co-authored-by: Copilot <[email protected]>
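The shape of the fix is roughly the following minimal Go sketch. The helper names (`runSQLFunc`, `reclaimBranchProtectSnapshots` as written here), column names, and exact SQL are illustrative assumptions; only the probe-then-`FOR UPDATE` structure comes from the commit:

```go
package databranch

import (
	"fmt"
	"strconv"
	"strings"
)

// runSQLFunc stands in for the txn-scoped runner the compile layer already
// owns (runSqlWithResult over the statement's own txn operator).
type runSQLFunc func(sql string) (rows [][]string, err error)

// reclaimBranchProtectSnapshots sketches the probe-then-lock flow.
func reclaimBranchProtectSnapshots(runSQL runSQLFunc, deadTIDs []uint64) error {
	parts := make([]string, len(deadTIDs))
	for i, tid := range deadTIDs {
		parts[i] = strconv.FormatUint(tid, 10)
	}
	ids := strings.Join(parts, ",")

	// Fast path: a plain SELECT, no FOR UPDATE, so unrelated DROP TABLEs
	// never lock mo_branch_metadata.
	probe := fmt.Sprintf(
		"select 1 from mo_catalog.mo_branch_metadata"+
			" where table_id in (%s) or parent_table_id in (%s) limit 1", ids, ids)
	rows, err := runSQL(probe)
	if err != nil {
		return err
	}
	if len(rows) == 0 {
		return nil // no branch lineage involves these tids: nothing to reclaim
	}

	// Slow path: branch involvement confirmed. Take the row locks that
	// serialise concurrent sibling drops, then run the original reclaim.
	scan := fmt.Sprintf(
		"select parent_table_id, table_id, clone_ts from mo_catalog.mo_branch_metadata"+
			" where table_id in (%s) or parent_table_id in (%s) for update", ids, ids)
	if _, err = runSQL(scan); err != nil {
		return err
	}
	// ... original FOR UPDATE-backed reclaim continues here ...
	return nil
}
```

Under this shape the probe-vs-scan races stay benign, matching the review's note below: a redundant pass is an idempotent no-op, and a negative probe can only cover tables outside any branch lineage.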
Root cause of the PESSIMISTIC docker-compose CI failure: the blocking-issue-#1 fix in `(*Compile).reclaimBranchProtectSnapshots` takes a pessimistic lock on `mo_branch_metadata` for every plain DROP TABLE. In the PESSIMISTIC CI job this serialised the `restore account` drop/recreate flow. Fix in `pkg/sql/compile/ddl.go`: probe before locking, as described above. Verified locally; pushed; CI re-running.
aunjgr left a comment
Re-reviewed commits 24ee591b, 05ffa754, c390c13c. All four of my previously flagged blocking findings are fully resolved:
1. Concurrent-sibling race — fixed with SELECT ... FOR UPDATE on mo_branch_metadata inside the same reclaim txn, in both the compile path (pkg/sql/compile/ddl.go:2192-2195 via runSqlWithResult sharing c.proc.GetTxnOperator()) and the frontend path (pkg/frontend/data_branch_snapshot.go:887-890 in loadBranchDAGWithBH). The second sibling's table_deleted=true flip blocks on the first txn's row locks; after commit it sees fresh state and reclaims. Traced A → {B,C} and root → A → {B,C} cascades — no leak. Idempotent DELETEs make duplicate passes safe.
The fast-path probe at ddl.go:2163-2182 (no FOR UPDATE) is safe: it returns early only when no row matches any dead-tid, meaning the table isn't in any branch lineage — no snapshot exists to reclaim.
2. Snapshot quota — pkg/frontend/feature_limit.go:109-111 now includes `and kind != 'branch'`. Column is NOT NULL DEFAULT 'user' with a tenant-upgrade backfill for legacy rows, so no NULL-semantics hole.
3. SHOW RECOVERY WINDOW — pkg/frontend/show_recovery_window.go:192 filters `and kind != 'branch'`.
4. Central choke point for name lookups — getSnapshotByName rejects kind='branch' post-fetch (covers RESTORE, DROP, get_ddl, and every caller routed through it); a sketch of the check follows this list. GetSnapshotInfoByName and ResolveSnapshotWithSnapshotNameWithoutSession also add the filter. doDropSnapshot has a belt-and-suspenders pre-check via getSnapshotKindByName.
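For intuition, the choke-point check amounts to something like this sketch. `snapshotRecord`, the `fetch` callback, and the error text are assumptions; only the post-fetch `kind='branch'` rejection in `getSnapshotByName` comes from the review:

```go
package databranch

import "fmt"

// snapshotRecord is a placeholder for the row the lookup fetches; only the
// kind='branch' rejection rule comes from the PR.
type snapshotRecord struct {
	Name string
	Kind string // 'user' by default; 'branch' for branch-protect snapshots
}

// getSnapshotByName sketches the post-fetch choke-point check: every caller
// (RESTORE, DROP SNAPSHOT, get_ddl, ...) routed through the lookup is
// rejected when the row is branch-managed.
func getSnapshotByName(fetch func(name string) (*snapshotRecord, error), name string) (*snapshotRecord, error) {
	rec, err := fetch(name)
	if err != nil || rec == nil {
		return nil, err
	}
	if rec.Kind == "branch" {
		return nil, fmt.Errorf("snapshot %s is managed by data branch", rec.Name)
	}
	return rec, nil
}
```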
No new correctness bugs introduced. Minor note: dataBranchDeleteTable / dataBranchDeleteDatabase invoke reclaimBranchSnapshotsWithBH after the compile path has already reclaimed, making a second idempotent pass. Wasted work, not a bug — worth a follow-up cleanup if tracked.
LGTM.
XuPeng-SH left a comment
LGTM. Well-structured feature with strong test coverage (unit, engine, BVT) and a clear design doc.
Reviewed:
- Lifecycle correctness: snapshot create is atomic with CLONE + `mo_branch_metadata` insert (same BackgroundExec txn). Reclaim fires on both frontend (DATA BRANCH DELETE) and compile-layer (DROP TABLE) paths via the shared `ReclaimBranchSnapshotsCore` pipeline.
- Subtree-all-deleted algorithm: cycle-safe via `visited` set, O(N) amortised via `memo`. Dangling metadata (unknown nodes) treated as deleted — correct for the catalog-corruption case.
- Concurrency: `FOR UPDATE` on `mo_branch_metadata` serialises sibling reclaim paths. Fast-path probe (`select 1 ... limit 1` without FOR UPDATE) avoids the lock for the 99% of DROP TABLE ops that don't involve branches. The benign races between probe and FOR UPDATE scan are both safe (redundant no-op or already handled).
- Security surface: `BuildBranchSnapshotDeleteSQL` only accepts `__mo_branch_<uint64>` snames — no injection risk (see the sketch after this list). The INSERT in `createBranchProtectSnapshot` interpolates parser-validated identifiers and catalog-fetched account names — acceptable given upstream guarantees.
- User surface: SHOW SNAPSHOTS filters `kind != 'branch'`; DROP SNAPSHOT on a branch-managed row is explicitly rejected with a clear error; `doResolveSnapshotWithSnapshotName` also rejects branch-kind rows from restore/select paths.
- No schema migration needed: reuses the existing `mo_snapshots.kind` column (defaults to 'user'). Clean.
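To make the injection argument concrete, the sname gate can be pictured as a strict-format check like the sketch below. The regexp and function body are assumptions; only the `BuildBranchSnapshotDeleteSQL` name and the `__mo_branch_<uint64>` format come from the review:

```go
package databranch

import (
	"fmt"
	"regexp"
)

// branchSnamePattern matches exactly the system-generated snapshot names,
// __mo_branch_<uint64>; anything else is refused before any SQL is built.
var branchSnamePattern = regexp.MustCompile(`^__mo_branch_[0-9]{1,20}$`)

// BuildBranchSnapshotDeleteSQL sketches the guarded SQL construction: because
// the name is format-checked first, interpolating it cannot inject SQL.
func BuildBranchSnapshotDeleteSQL(sname string) (string, error) {
	if !branchSnamePattern.MatchString(sname) {
		return "", fmt.Errorf("invalid branch snapshot name: %q", sname)
	}
	return fmt.Sprintf(
		"delete from mo_catalog.mo_snapshots where sname = '%s' and kind = 'branch'",
		sname), nil
}
```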
CI all green. 4 prior approvals. Merge state is BEHIND but MERGEABLE — just needs a merge of main (no conflicts).
Merge Queue Status
This pull request spent 25 minutes 42 seconds in the queue, with no time running CI.
Reason: The merge conditions cannot be satisfied due to failing checks.
Hint: You may have to fix your CI before adding the pull request to the queue again.
Merge Queue Status
This pull request spent 47 seconds in the queue, including 4 seconds running CI.
What type of PR is this?
Feature
Which issue(s) this PR fixes:
issue #23751
What this PR does / why we need it:
Introduces Branch Protect Snapshot, a system-managed `kind='branch'` row in `mo_catalog.mo_snapshots` that pins parent-side history for the exact duration a branch subtree is alive.
Motivation: after a flush + global checkpoint + disk-cleaner GC cycle, the LCA probe in `pkg/frontend/data_branch_hashdiff.go` (which time-travels to `clone_ts(child)` on the parent) can silently downgrade an `UPDATE` into an `INSERT` once GC reclaims the pre-branch object of the probed row. The reproduction is captured in `test/distributed/cases/git4data/branch/diff/diff_9.sql` Case 3.
The new snapshot feeds the branch timestamp into the TAE GC retention engine (`pkg/vm/engine/tae/logtail/snapshot.go`), so the backing objects stay on disk as long as any branch descendant needs them.
Lifecycle
- `DATA BRANCH CREATE TABLE/DATABASE` → `updateBranchMetaTable` + `createBranchProtectSnapshot` (atomic with the CLONE DDL and `mo_branch_metadata` insert).
- `DATA BRANCH DELETE TABLE/DATABASE` → `reclaimBranchSnapshotsWithBH`.
- `DROP TABLE` / `DROP DATABASE` cascade → `(*Compile).reclaimBranchProtectSnapshots` using a `runSQL` closure over `mo_branch_metadata`; all reclaim paths converge on `databranchutils.ReclaimBranchSnapshotsCore`.

An edge `(parent, child, clone_ts)` is reclaimed iff the entire subtree rooted at `child` is deleted (sketched below) — sibling and grand-descendant branches keep the snapshot alive as long as any of them references it.
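Read as pseudocode over the branch DAG, the reclaim rule is a memoised DFS. In this sketch `branchNode` and the map layout are assumptions; the `visited`/`memo` roles and the dangling-node rule come from the review notes:

```go
package databranch

// branchNode is one mo_branch_metadata row viewed as a DAG node; field and
// type names are assumptions, only the algorithm shape comes from the PR.
type branchNode struct {
	deleted  bool
	children []uint64 // table ids of direct child branches
}

// subtreeAllDeleted reports whether every node in the subtree rooted at tid
// is deleted. visited makes the walk cycle-safe; memo keeps repeated edge
// checks O(N) amortised. Unknown (dangling) ids count as deleted, matching
// the catalog-corruption behaviour called out in review.
func subtreeAllDeleted(dag map[uint64]*branchNode, tid uint64, visited, memo map[uint64]bool) bool {
	if done, ok := memo[tid]; ok {
		return done
	}
	if visited[tid] {
		return true // back-edge on the current path: don't let a cycle veto
	}
	visited[tid] = true

	n, ok := dag[tid]
	if !ok {
		memo[tid] = true // dangling metadata: treated as deleted
		return true
	}
	all := n.deleted
	if all {
		for _, child := range n.children {
			if !subtreeAllDeleted(dag, child, visited, memo) {
				all = false
				break
			}
		}
	}
	memo[tid] = all
	return all
}
```

With this shape, the reclaim core would drop the `(parent, child, clone_ts)` edge only when `subtreeAllDeleted(dag, child, ...)` returns true, which is exactly why sibling and grand-descendant branches keep the snapshot alive.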
User surface
- `SHOW SNAPSHOTS` filters out `kind = 'branch'` rows.
- `DROP SNAPSHOT __mo_branch_<tid>` is rejected with "managed by data branch".
- `DATA BRANCH CREATE ... TO ACCOUNT <b>` anchors the snapshot on the parent's account, so GC retention applies where the parent's objects actually live; reclaim runs as sys to reach across accounts.
No schema migration
Reuses the existing `mo_snapshots.kind` column (defaults to `'user'`). No backfill for pre-existing branches — their LCA history has already been GC'd, and a snapshot at the original `clone_ts` would not resurrect it. The document advises `DROP` and recreate for affected branches.

Test coverage
Unit tests (9/9 pass)
`pkg/frontend/data_branch_snapshot_test.go` — sname format, DAG builder, `subtreeAllDeleted` on linear / branching DAGs, reclaim core drop-list, ancestor walk, dangling metadata handling, `doDropSnapshot` branch-kind rejection, `buildShowSnapShots` filter.

Engine tests (7/7 pass)
`pkg/vm/engine/test/branch_protect_snapshot_test.go` — create, reclaim on data branch delete, reclaim on plain `DROP TABLE`, cascaded subtree rule, cross-account create, cross-account drop-source-first, create-failed-rolls-back.

BVT (10 cases, 194 queries, 100% pass in 3 consecutive runs)
`test/distributed/cases/git4data/branch/protect/protect_1..10.{sql,result}` — `DATA BRANCH DELETE TABLE`, `DROP TABLE`, `DATA BRANCH CREATE/DELETE DATABASE`, batch create+reclaim, `DROP DATABASE` cascade reclaim, `TO ACCOUNT <b>` round-trip.

GC → diff regression (3/3 pass, 59 queries)
`test/distributed/cases/git4data/branch/diff/diff_9.sql` tightened with a new assertion that an update on a pre-branch PK (the exact shape of the bug) is still classified as `t1 UPDATE` after GC — this is the end-to-end verification that the feature actually fixes the reported bug.
Special notes for your reviewer:
- `cloneReceipt` grows three fields (`srcTableID`, `dstTableID`, `srcAccountName`) so the snapshot insert can reuse the IDs that `updateBranchMetaTable` already resolves; a sketch follows below.
- `(*Compile).reclaimBranchProtectSnapshots` is the single hook point added in `pkg/sql/compile/ddl.go`; it runs synchronously after the `UPDATE mo_branch_metadata SET table_deleted = true` statement.
- Design doc: `docs/design/data_branch_protect_snapshot.md`.